0-OS auto-update

zaibon commented 5 years ago

The goal is to be able to update 0-core without a reboot. The idea for getting there was:

- Make 0-core NOT be PID 1 of the system.
- Create a new PID 1 that is only responsible for keeping 0-core running (@maxux started working on https://github.com/maxux/0-init).
- Design an upgrade procedure where 0-OS just downloads the new version of 0-core and restarts it.

muhamadazmy commented 5 years ago

I think this approach is gonna be a bit more complex than this. One of the important responsibilities of PID 1 is to consume the exit codes of orphan processes. This means that on restart of the core0 process, PID 1 will become the direct parent of all the system services (redis, libvirtd, etc.), so it will also need to make sure these processes are working properly, handle their logs, and so on.
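To make the "consume exit codes" responsibility concrete, here is a minimal Python sketch of the reaping loop an init process typically runs. It is only an illustration, not code from 0-core or 0-init. The names `reap_children` and `spawn_and_reap` are hypothetical, and a real PID 1 would usually trigger the loop from a SIGCHLD handler rather than by polling.

```python
import os
import time


def reap_children():
    """Collect (pid, exit_code) for every child that has already exited."""
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        if os.WIFEXITED(status):
            code = os.WEXITSTATUS(status)
        else:
            code = -os.WTERMSIG(status)  # child was killed by a signal
        reaped.append((pid, code))
    return reaped


def spawn_and_reap(exit_code=3, timeout=5.0):
    """Fork a child that exits immediately, then poll until it is reaped."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit without running parent cleanup
    deadline = time.monotonic() + timeout
    reaped = []
    while time.monotonic() < deadline:
        reaped += reap_children()
        if any(p == pid for p, _ in reaped):
            break
        time.sleep(0.01)
    return pid, reaped
```

Whoever ends up as PID 1 has to run something like this for every process it inherits, otherwise exited services stay around as zombies.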

So, as a slight change to your suggestion, I suggest we support 2 modes for core0:

- core0 init mode: a very lightweight layer whose sole responsibility is to bootstrap the machine and run (and maintain) system services. It doesn't provide an API and you can't talk to it.
- core0 api mode: the one that processes commands and maintains all the subsystems (containers, kvm, access, etc.).

If an update is available, the core0 binary is gonna be redownloaded, and the core0 api process is restarted to run new api code.

We will still have a problem with containers though, which we need to figure out. The communication between core0 and the coreX processes is done via pipes (man 2 pipe) that exist only in the kernel, so if one end of the pipe is closed (the core0 end), coreX will fail with a (Broken Pipe) error. We can try named pipes, although this might not be possible since coreX runs in a different mount namespace, so I will need to experiment a little bit with this.
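The broken pipe behaviour is easy to reproduce in isolation. The sketch below is a stand-alone Python illustration, not 0-core code: closing the read end stands in for core0 going away, and the write stands in for coreX sending output. The function name is hypothetical.

```python
import os


def write_after_reader_closed():
    """Write to a pipe whose read end is gone and report what happens."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # simulates the core0 side of the pipe disappearing
    try:
        # CPython ignores SIGPIPE at startup, so the failed write surfaces
        # as an exception (errno EPIPE) instead of killing the process.
        os.write(write_fd, b"message for core0")
        return "ok"
    except BrokenPipeError:
        return "EPIPE"
    finally:
        os.close(write_fd)
```

In the other direction, a process reading from a pipe whose writer has exited simply sees end-of-file, which coreX would likewise have to treat as losing core0.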

delandtj commented 5 years ago

containerd runs a shim in the docker-ce versions that maintains the container sockets through which they can communicate, and as such you can restart dockerd (core0 in our pov) without losing the containers in the process

delandtj commented 5 years ago

nvm... i'm full of crap... restarting docker stops the containers too

muhamadazmy commented 5 years ago

There are so many points that we need to discuss regarding this issue. To be honest, I thought the update would be a v2 feature, hence the work on this for v1.5 was discontinued.

Anyway, we need to discuss how this should really be done in v1.5. We need to take care of the following points:

- Parent-child relationship of processes. In the current model, core0 is the parent of all services (including long-running services). This allows it to take action if a service crashes, and also to read its output streams (via pipes) to process messages in zos format.
- On core0 restart, those services will get inherited by PID 1, and there is no way we can get their streams back (not that I know of, except maybe reading the fd from /proc). We also won't get signaled if a process exits, hence no way to take action or return the exit result!
- The same applies to containers, except that for containers it's more complicated, since we have 2 more pipes attached to the coreX process to forward and receive commands. So there is no way we can restart core0 without killing all the containers.
- We don't have the same issues with KVM, since kvms are managed by libvirt, so restarting core0 will not kill the machines. But we will have to find a way to maintain some state we have in memory (shouldn't be a problem; in fact we might find a way to drop that state altogether).
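The "won't get signaled if a process exits" point follows from how waiting works: a process can only wait on its own children. A small Python sketch of that restriction (the names `can_wait_on` and `demo` are hypothetical, and PID 1 is used here simply as an example of a process that is not our child):

```python
import os


def can_wait_on(pid):
    """True if waitpid() accepts `pid`, i.e. it is a child of this process."""
    try:
        os.waitpid(pid, os.WNOHANG)
        return True
    except ChildProcessError:  # ECHILD: the process is not our child
        return False


def demo():
    child = os.fork()
    if child == 0:
        os._exit(0)  # child exits right away
    own = can_wait_on(child)   # our own child: allowed
    foreign = can_wait_on(1)   # PID 1 is never our child: refused
    try:
        os.waitpid(child, 0)   # reap the child if the first call was too early
    except ChildProcessError:
        pass                   # it was already reaped above
    return own, foreign
```

A restarted core0 is in the `foreign` position for every service started by the previous instance, so it gets neither SIGCHLD nor an exit status for them.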

Suggested solutions to some of the problems

- Processes: instead of being attached directly to core0, they can be piped to a helper (a lightweight helper for each process) that processes the data and then pipes the streams directly to redis. core0 can then be just a reader on a redis queue to process the streams and receive the messages. This way child processes or services don't even have to be children of core0; they can just be direct children of PID 1, and core0 can restart independently.
- Containers should be managed by a separate daemon (like libvirt for kvm). This way it will be possible to restart the API part of core0 without losing the containers. It also means updating this daemon will still require losing all the containers!
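The helper idea for processes could look roughly like the Python sketch below. It rests on loud assumptions: `queue.Queue` stands in for a redis list so the example runs without a redis server, the helper is a thread here whereas the proposal needs a separate process that survives a core0 restart, and the message layout (`job`, `type`, `data`) is invented for illustration and is not the zos stream format.

```python
import queue
import subprocess
import sys
import threading


def forward_output(job_id, argv, sink):
    """Helper: run argv, push each stdout line to sink, then the exit code."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:
        sink.put({"job": job_id, "type": "stdout", "data": line.rstrip("\n")})
    # The exit code travels through the queue too, so the reader does not
    # need to be the parent of the process to learn how it ended.
    sink.put({"job": job_id, "type": "exit", "data": proc.wait()})


def drain(sink):
    """Reader side (core0 in the proposal): take everything queued so far."""
    messages = []
    while True:
        try:
            messages.append(sink.get_nowait())
        except queue.Empty:
            return messages


def demo():
    sink = queue.Queue()  # assumption: stand-in for a redis queue
    argv = [sys.executable, "-c", "print('one'); print('two')"]
    helper = threading.Thread(target=forward_output, args=("job-1", argv, sink))
    helper.start()
    helper.join()
    return drain(sink)
```

Because the reader only touches the queue, it can be stopped and started again without the service ever noticing, which is the property the proposal is after.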

Conclusion: Splitting core0 like that is not gonna be a trivial task. I think it would be better if we start on v2 immediately, since lots of parts are gonna be rewritten anyway!

maxux commented 5 years ago

> One of the important responsibilities of PID 1 is to consume the exit codes of orphan processes. This means that on restart of the core0 process, PID 1 will become the direct parent of all the system services (redis, libvirtd, etc.), so it will also need to make sure these processes are working properly, handle their logs, and so on.

I don't see a problem with that. As long as PID 1 knows what to do, it could take care of management, which is already the case for core0 itself btw (ensuring core0 restarts if it crashes).

> We will still have a problem with containers though, which we need to figure out. The communication between core0 and the coreX processes is done via pipes (man 2 pipe) that exist only in the kernel, so if one end of the pipe is closed (the core0 end), coreX will fail with a (Broken Pipe) error. We can try named pipes, although this might not be possible since coreX runs in a different mount namespace, so I will need to experiment a little bit with this.

Maybe we can use a unix socket instead of pipes for that?

> Processes: instead of being attached directly to core0, they can be piped to a helper (a lightweight helper for each process) that processes the data and then pipes the streams directly to redis. core0 can then be just a reader on a redis queue to process the streams and receive the messages. This way child processes or services don't even have to be children of core0; they can just be direct children of PID 1, and core0 can restart independently.

I don't think starting a helper (read: a process) per new process will be good from a resources point of view :/

muhamadazmy commented 5 years ago

@maxux

> I don't see a problem with that. As long as PID 1 knows what to do, it could take care of management, which is already the case for core0 itself btw (ensuring core0 restarts if it crashes).

Yeah, I am not saying there is a problem, I am just clarifying that PID 1 needs to do what core0 does now to bootstrap the system, start system services, and provide monitoring for those services. It means we will still run core0 as PID 1 in an (init) mode, and then start another service to handle connections and API calls.

> Maybe we can use a unix socket instead of pipes for that?

Unix sockets were used in the very early version, but I dropped them in favor of pipes because coreX (container) starts in a different mount namespace, so I had to bind-mount the unix socket inside the container mount namespace. It also meant other processes inside the container could see and connect to this socket. Pipes, on the other hand, go directly to the process and can't be intercepted.

If we're gonna use unix sockets again, we have to make sure it's secured so that malicious processes inside the container can't use it.
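One possible mitigation, which nobody in this thread proposed and which is offered here only as an assumption, is to check who is on the other end of the socket. On Linux, `SO_PEERCRED` returns the pid, uid and gid of the peer as recorded by the kernel. The Python sketch below uses a `socketpair` so it runs without a filesystem path; `is_trusted` and its policy (match an expected pid and the caller's uid) are hypothetical, and PID namespaces would need extra care in a real container setup.

```python
import os
import socket
import struct


def peer_credentials(sock):
    """Return (pid, uid, gid) of the peer of a unix socket (Linux only)."""
    fmt = "iII"  # struct ucred: pid_t, uid_t, gid_t
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    return struct.unpack(fmt, raw)


def is_trusted(sock, expected_pid):
    """Hypothetical policy: accept only the expected pid under our own uid."""
    pid, uid, _gid = peer_credentials(sock)
    return pid == expected_pid and uid == os.getuid()


def demo():
    # Both ends belong to this process, so the peer pid is our own pid.
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        return is_trusted(a, os.getpid()), is_trusted(a, os.getpid() + 1)
    finally:
        a.close()
        b.close()
```

Since core0 starts coreX itself, it would know which pid to expect, and a connection from any other process in the container could be dropped immediately.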

> I don't think starting a helper (read: a process) per new process will be good from a resources point of view :/

Yes, I totally agree on that, but we need to make sure process outputs are processed regardless of whether you are holding the process streams or not. We can put more thought into it. I am trying to avoid writing this to a file. Maybe we can use syslogd? Or something similar.

muhamadazmy commented 5 years ago

In PID1 (core0)

The following items are probably impossible to update at runtime:

The process manager code, this includes:
- Extensions (plugins) interface
- System api call (to execute binaries)
- Communication protocol (commands, return, streams, etc...)
- Jobs interface
- Stats aggregation
- Logging

Parts of the API that can be updated

These parts of the API don't keep in-memory state, so they can be reloaded if needed:

bridge
btrfs
disk
fs
info
ip
job (the api part only for listing and signaling, the job struct is built in the process manager)
monitor (the query api)
power (interface to process manager, built in shutdown)
pprof
process
web
cgroup
nft
zfs

Parts that can't be updated with current code

These components keep in-memory state; dropping that state or moving it to disk requires work (not a quick task):

containers
kvm
socat (reserved ports)
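As a rough idea of what "moving it to disk" could mean for something like the reserved ports table, here is a minimal Python sketch. The file name, the state layout and the function names are all hypothetical and not taken from 0-core. The write goes through a temporary file and `os.replace` so that a restart in the middle of a save never leaves a half-written file.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before renaming
    os.replace(tmp, path)


def load_state(path):
    """Read the saved state, or start empty if nothing was saved yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "reserved-ports.json")  # hypothetical file name
        before = load_state(path)
        save_state(path, {"reserved": {"8080": 1}})  # port -> container id
        return before, load_state(path)
```

A restarted API process would call `load_state` on startup instead of rebuilding the table from memory it no longer has.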

zaibon commented 5 years ago

won't fix in this version

threefoldtecharchive / 0-core

0-OS auto-update #149
