robur-coop / albatross

Albatross: orchestrate and manage MirageOS unikernels with Solo5
ISC License
142 stars 17 forks source link

Implement killall waiting #50

Closed reynir closed 3 years ago

hannesm commented 3 years ago

this should fix #37 (or am I misguided?)

reynir commented 3 years ago

Yes, this should ensure Vmm_vmmd.handle_shutdown is called which calls Vmm_unix.free_system_resources.

reynir commented 3 years ago

One concern I have is Vmm_vmmd.register will overwrite any existing waiter. It's unclear to me if registering in Vmm_vmmd.killall may overwrite something important, or if there's a situation where something could overwrite our waiter after the Vmm_vmmd.killall call.

reynir commented 3 years ago

I did a test on my machine creating 92 unikernels and then shutting down albatrossd using this branch. It took over 18 seconds to shut down. After the shutdown there were no vmmtaps lingering.

$ time sudo systemctl stop albatross_daemon.service 

real    0m18.202s
user    0m0.003s
sys 0m0.060s

I think it's reasonable to say it fixes #37.

(my router was not too happy getting 90 dhcp clients all of a sudden)

hannesm commented 3 years ago

One concern I have is Vmm_vmmd.register will overwrite any existing waiter. It's unclear to me if registering in Vmm_vmmd.killall may overwrite something important,

The only waiter to be registered is the restart-on-failure one. So indeed before destroying all unikernels we should overwrite all waiters :) (otherwise the restart-on-failure task may jump in and restart a unikernel).

or if there's a situation where something could overwrite our waiter after the Vmm_vmmd.killall call.

This should not happen, as the only thing registering is in albatrossd ("if Unikernel.restart_handler config then Vmm_vmmd.register_restart ...") just after a unikernel was successfully started.

hannesm commented 3 years ago

thanks, I pushed a commit to first shut down the server socket, and then to destroy the unikernels.... I think this is good to go (let's wait for CI though) -- and please comment if you see any problems with the revised shutdown order..

reynir commented 3 years ago

I think it makes more sense to close the socket first, yes.