Closed by reynir 9 months ago
So, what are the overall expectations?
But there is no persistence of dropped platforms (should there be?). So after a restart of the builder-server, a connecting builder-worker will execute such jobs again. Is this fine, or do we want to persist the list of dropped platforms? (I think I'm in favour of persisting it.) Also, I'd like find_job to figure out that the platform is dropped (and close the worker connection).
It's not clear to me what a good way forward is. It is not nice to leave workers hanging. Closing the connection or terminating the worker would be nice, except that we usually run our workers in an infinite loop that involves installing a lot of packages, so that is not desirable.
Maybe persist dropped platforms and refuse to drop a platform that still has workers connected, thereby forcing a flow where the workers are shut down first. Then we still have the corner case where a worker is "rebooting" while the platform is dropped. Hmm.
Oh, another path is server-only: persist the dropped platforms, and just don't ever give such a worker a job. Thus the worker will be there and alive, but not receiving anything. IMHO that would be fine.
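To make the server-only idea concrete, here is a minimal sketch in Python. It is not the builder code: the `DroppedPlatforms` class, the JSON state file, and the shape of `queues` are all assumptions made for illustration, and only the name `find_job` is borrowed from the discussion above.

```python
import json
import os
import tempfile
from pathlib import Path


class DroppedPlatforms:
    """Set of dropped platform names, persisted as a JSON list.

    Hypothetical on-disk format, chosen only for this sketch.
    """

    def __init__(self, path):
        self.path = Path(path)
        self.names = set()
        if self.path.exists():
            # Reload the dropped platforms after a server restart.
            self.names = set(json.loads(self.path.read_text()))

    def _save(self):
        # Write to a temporary file first, then rename it into place,
        # so a crash mid-write does not leave a truncated state file.
        fd, tmp = tempfile.mkstemp(dir=self.path.parent)
        with os.fdopen(fd, "w") as f:
            json.dump(sorted(self.names), f)
        os.replace(tmp, self.path)

    def drop(self, platform):
        self.names.add(platform)
        self._save()


def find_job(queues, dropped, platform):
    """Return the next job for platform, or None if it is dropped or idle."""
    if platform in dropped.names:
        # The worker stays connected but never receives work.
        return None
    queue = queues.get(platform, [])
    return queue.pop(0) if queue else None
```

With this shape, a worker for a dropped platform looks the same to the server as a worker whose queue is empty, so the worker side would not need to change.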
I added a commit to persist dropped platforms. Then I realized we might want to be able to "undrop" a platform. I have not yet added the command in the client, and I need to bump a version number somewhere. I think the logic is not so easy to follow now.
Which logic is not so easy to follow? The list of dropped platforms should also be inspectable (i.e. printed in the info output or such).
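As a rough illustration of what "undrop" and an inspectable list could look like, here is a small Python sketch. The function names, the in-memory `dropped` set, and the output format are assumptions for this example and do not describe the actual client commands or info output.

```python
def undrop(dropped, platform):
    """Remove platform from the dropped set; report whether it was present."""
    if platform in dropped:
        dropped.remove(platform)
        return True
    return False


def info_lines(queues, dropped):
    """Render one line per known platform, marking the dropped ones."""
    lines = []
    # Include dropped platforms even if they no longer have a queue.
    for platform in sorted(set(queues) | dropped):
        state = "dropped" if platform in dropped else "active"
        queued = len(queues.get(platform, []))
        lines.append(f"{platform}: {state}, {queued} queued")
    return lines
```

Returning a boolean from `undrop` lets a client tell the user when the platform was not dropped in the first place, rather than silently succeeding.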
fine to merge when CI is happy
When dropping a platform it is best to first disable the relevant workers. Otherwise the workers will eventually recreate the platform.
Fixes #38