google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

Checkpoint/Restore with multi-container #1956

Open fvoznika opened 4 years ago

fvoznika commented 4 years ago

Checkpoint/Restore (aka Save/Restore) is only supported for a single container.

A few things are required to enable multi-container support, off the top of my head:

zkoopmans commented 4 years ago

IIRC, our partners at ANT were working on this, but I can't recall who. @tanjianfeng?

aaronlu commented 4 years ago

Jianfeng is not working on this right now; some others and I will be. The problem is that I'm new to both Go and gVisor, so any help is appreciated. From my limited understanding, I agree with items 1 and 4, i.e. save/restore has to happen for the entire sandbox (pod). I don't quite understand items 2 and 3 yet; I'll need to learn more about how gVisor works.

aaronlu commented 4 years ago

Hi, good news: we have more or less gotten multi-container restore working, but the patches are at a preliminary stage and we are still testing and cleaning things up. We will send them out once we have more confidence in them, but let me know if you are interested in an early-stage review. Thanks.

fvoznika commented 4 years ago

That's very cool!! I'm interested in an early-stage review if you can point me in the right direction.

aaronlu commented 4 years ago

Bear with me while I rebase/refactor. Expect a git branch ready for early-stage review sometime next week. Thanks!

aaronlu commented 4 years ago

The branch is ready at https://github.com/aaronlu/gvisor (branch: multicontainer). Feel free to let me know what you think. Thanks.

It passed a simple test: starting 3 bash containers (one root container + two child containers) and then restoring them. After the restore, all three containers can accept commands, and runsc can send a signal to PID 1 of each child container; the child containers are destroyed after the signal.

twavv commented 3 years ago

@fvoznika Did you ever get a chance to look at @aaronlu's branch? It's a few (...thousand) commits behind master now, but this use case is super interesting to me, so I could potentially revive it if it looks like a halfway decent approach (though I'm not sure a single PR makes sense, since it's so big).