threefoldtecharchive / 0-core

Multi Node OS which requires no install.
Apache License 2.0

0-OS auto-update #149

Closed (zaibon closed this 5 years ago)

zaibon commented 5 years ago

The goal is to be able to update 0-core without needing a reboot. The idea for how to get there was:

muhamadazmy commented 5 years ago

I think this approach is going to be a bit more complex than that. One of the important responsibilities of PID 1 is to consume the exit codes of orphan processes. This means that when the core0 process restarts, PID 1 will be the direct parent of all the system services (redis, libvirtd, etc.), so it will also need to make sure these processes are working properly, handle their logs, etc.
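To make the PID 1 responsibility above concrete, here is a minimal Go sketch of reaping exited children with `wait4(2)` and `WNOHANG` after a `SIGCHLD`. It is an illustration under assumptions (Linux, a short-lived `true` process standing in for a system service), not core0's actual code; `reapZombies` and `startAndReap` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// reapZombies collects the exit status of every child that has already
// exited, without blocking, and returns the PIDs it reaped.
func reapZombies() []int {
	var reaped []int
	for {
		var status syscall.WaitStatus
		pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
		if err != nil || pid <= 0 {
			// ECHILD: no children at all; pid == 0: children still running.
			return reaped
		}
		reaped = append(reaped, pid)
	}
}

// startAndReap starts one short-lived child, waits for SIGCHLD and reports
// whether reapZombies collected that child's PID.
func startAndReap() (bool, error) {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGCHLD)
	defer signal.Stop(sigs)

	cmd := exec.Command("true") // stand-in for a system service
	if err := cmd.Start(); err != nil {
		return false, err
	}
	<-sigs // the kernel reports a child state change
	for _, pid := range reapZombies() {
		if pid == cmd.Process.Pid {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	ok, err := startAndReap()
	fmt.Println("child reaped:", ok, "error:", err)
}
```

A real init would run `reapZombies` every time `SIGCHLD` fires, for children it started itself and for orphans re-parented to it.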

So, as a slight change to your suggestion, I suggest we support two modes for core0:

If an update is available, the core0 binary will be re-downloaded, and the core0 API process restarted to run the new API code.

We will still have a problem with containers, though, that we need to figure out. The communication between core0 and the coreX processes is done via pipes (man 2 pipe), which exist only in the kernel; if one end of the pipe is closed (core0), coreX will fail with a broken pipe (EPIPE) error. We can try named pipes, although this might not be possible since coreX runs in a different mount namespace, so I will need to experiment a little bit with this.

delandtj commented 5 years ago

In the docker-ce versions, containerd runs a shim that maintains the container sockets through which they communicate, and as such you can restart dockerd (core0 from our point of view) without losing the containers in the process.

delandtj commented 5 years ago

nvm... I'm full of crap... restarting Docker stops the containers too.

muhamadazmy commented 5 years ago

There are many points that we need to discuss regarding this issue. To be honest, I thought the update would be a v2 feature, hence the work on this for v1.5 was discontinued.

Anyway, we need to discuss how this should really be done in v1.5; we need to take care of the following points:

Suggested solutions to some of the problems

Conclusion: splitting core0 like that is not going to be a trivial task. I think it would be better to start on v2 right now, since lots of parts are going to be rewritten anyway!

maxux commented 5 years ago

> One of the important responsibilities of PID 1 is to consume the exit codes of orphan processes. This means that when the core0 process restarts, PID 1 will be the direct parent of all the system services (redis, libvirtd, etc.), so it will also need to make sure these processes are working properly, handle their logs, etc.

I don't see a problem with that: as long as PID 1 knows what to do, it can take care of management, which is already the case for core0 itself, btw (it makes sure core0 restarts if it crashes).

> We will still have a problem with containers, though, that we need to figure out. The communication between core0 and the coreX processes is done via pipes (man 2 pipe), which exist only in the kernel; if one end of the pipe is closed (core0), coreX will fail with a broken pipe (EPIPE) error. We can try named pipes, although this might not be possible since coreX runs in a different mount namespace, so I will need to experiment a little bit with this.

Maybe we can use Unix sockets instead of pipes for that?

> Processes, instead of being attached directly to core0, can be piped to a helper (a lightweight helper for each process) that processes the data and then pipes the streams directly to redis; core0 can then just be a reader on a redis queue that processes the streams and receives the messages. This way child processes, or services, don't even have to be children of core0; they can just be direct children of PID 1, and core0 can restart independently.

I don't think starting a helper (I read: a process) per new process will be good from a resource point of view :/

muhamadazmy commented 5 years ago

@maxux

> I don't see a problem with that: as long as PID 1 knows what to do, it can take care of management, which is already the case for core0 itself, btw (it makes sure core0 restarts if it crashes).

Yeah, I am not saying there is a problem; I am just clarifying that PID 1 needs to do what core0 does now: bootstrap the system, start the system services, and monitor those services. It means we will still run core0 as PID 1 in an (init) mode, and then start another service to handle connections and API calls.

> Maybe we can use Unix sockets instead of pipes for that?

Unix sockets were used in the very early version, but I dropped them in favor of pipes because coreX (the container) starts in a different mount namespace, so I had to bind-mount the Unix socket inside the container's mount namespace, which also meant other processes inside the container could see and connect to this socket. Pipes, on the other hand, are direct to the process and can't be intercepted.

If we are going to use Unix sockets again, we have to make sure they are secured so that malicious processes inside the container can't use them.

> I don't think starting a helper (I read: a process) per new process will be good from a resource point of view :/

Yes, I totally agree on that, but we need to make sure process outputs are processed regardless of whether we are holding the process streams or not. We can put more thought into it. I am trying to avoid writing this to a file. Maybe we can use syslogd, or something similar?

muhamadazmy commented 5 years ago

In PID 1 (core0)

The following items are probably impossible to update at runtime:

Parts of the API that can be updated

These parts of the API don't keep in-memory state, so they can be reloaded if needed.

Parts that can't be updated with current code

These components keep in-memory state; it would take work (not a quick task) to drop that state or move it to disk.

zaibon commented 5 years ago

won't fix in this version