oxidecomputer / buildomat

a software build labour-saving device
Mozilla Public License 2.0

after the job program has executed, report lingering open stdio file descriptors and fail job #12

Closed: jclulow closed this issue 1 year ago

jclulow commented 1 year ago

Under some conditions the job program might exit, for example if it decides it has reached a fatal condition or hits a self-imposed timeout. Unfortunately, the child process exiting is not presently enough to end the buildomat job: I believe we also wait to read everything we can from the stdout and stderr descriptors, which may be shared with, and then held open by, background processes.
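To illustrate the failure mode, here is a minimal sketch (not buildomat code) assuming a Unix-like system with `sh` and `sleep` available. The shell exits right away, but a background `sleep` inherits its stdout, so reading the pipe to EOF blocks until `sleep` finishes too. The function name `measure` is hypothetical.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `script` under `sh -c` with stdout piped and
/// returns (time until the shell exited, time until stdout hit EOF, output).
fn measure(script: &str) -> std::io::Result<(Duration, Duration, String)> {
    let start = Instant::now();
    let mut child = Command::new("sh")
        .arg("-c")
        .arg(script)
        .stdout(Stdio::piped())
        .spawn()?;
    let mut stdout = child.stdout.take().expect("stdout was piped");

    // The direct child (the shell) exits as soon as the script finishes.
    child.wait()?;
    let exited = start.elapsed();

    // Reading to EOF blocks until every process holding the write end of the
    // pipe has closed it, including background processes that inherited it.
    let mut out = String::new();
    stdout.read_to_string(&mut out)?;
    let eof = start.elapsed();

    Ok((exited, eof, out))
}

fn main() -> std::io::Result<()> {
    // `sleep 2 &` inherits the shell's stdout and keeps the pipe open.
    let (exited, eof, out) = measure("sleep 2 & echo done")?;
    println!("shell exited after {:?}, stdout EOF after {:?}", exited, eof);
    print!("{}", out);
    Ok(())
}
```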

When the child process exits, we should start a timer (say, 30-60 seconds) and abandon the file descriptors when it expires, even if we have not hit EOF. This condition should at least result in a warning message in the job event stream, and probably also fail the job.
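One way to sketch the proposed timer, assuming (hypothetically) that per-stream reader threads forward lines and an EOF marker over a standard `mpsc` channel: after the child exits, keep receiving with `recv_timeout` until both streams report EOF or a single deadline passes, then report whichever streams were abandoned. The names `Stream`, `Event`, and `drain_after_exit` are illustrative, not buildomat's.

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Which stdio stream an event refers to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stream {
    Stdout,
    Stderr,
}

/// Events produced by (hypothetical) per-stream reader threads.
#[derive(Debug)]
enum Event {
    Line(Stream, String),
    Eof(Stream),
}

/// Called once the child has exited. Collects output until both streams hit
/// EOF or `timeout` elapses. Returns the lines collected and any streams that
/// were still open when we gave up.
fn drain_after_exit(rx: &Receiver<Event>, timeout: Duration) -> (Vec<(Stream, String)>, Vec<Stream>) {
    let deadline = Instant::now() + timeout;
    let mut open = vec![Stream::Stdout, Stream::Stderr];
    let mut lines = Vec::new();

    while !open.is_empty() {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(Event::Line(s, l)) => lines.push((s, l)),
            Ok(Event::Eof(s)) => open.retain(|o| *o != s),
            // Timer expired: abandon descriptors that never reached EOF.
            Err(RecvTimeoutError::Timeout) => break,
            // No senders remain, so no further events can arrive; any stream
            // that never sent EOF is reported as still open.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    (lines, open)
}

fn main() {
    let (tx, rx) = channel();
    // Simulate stdout finishing normally...
    tx.send(Event::Line(Stream::Stdout, "build ok".to_string())).unwrap();
    tx.send(Event::Eof(Stream::Stdout)).unwrap();
    // ...while stderr is held open elsewhere: no EOF arrives, and `tx` stays
    // alive so the channel does not disconnect.
    let (lines, still_open) = drain_after_exit(&rx, Duration::from_millis(200));
    println!("{:?} {:?}", lines, still_open);
    drop(tx);
}
```

A channel with a deadline keeps the timer logic in one place, independent of how many reader threads feed it.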

jclulow commented 1 year ago

I decided not to make it an error for now, just a warning. Once the child process exits, we'll wait ~5 seconds for the descriptors to drain, in an attempt to print any remaining output before rendering the exit status. If the descriptors are still not closed after that delay, we'll report a warning to let the user know we're waiting another 60 seconds. If they are still not closed after that, we report which (stdout, stderr, or both) remain open and end the task. File uploads will then happen as normal, etc.

This was implemented in 94e05c403f04e6c2e99f947c0e37c0c8a84cc304, which has now been deployed as well.
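The two-stage policy described in the comment above can be sketched as a pure decision function. This is an illustration only, not the code from that commit: the names `Open`, `Action`, `next_action`, and the message text are hypothetical, and the 5 s and 60 s constants come from the comment.

```rust
use std::time::Duration;

/// Which stdio descriptors are still open.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Open {
    stdout: bool,
    stderr: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum Action {
    Done,            // both descriptors closed: render exit status normally
    KeepWaiting,     // still within the current waiting window
    WarnOnce,        // grace period over: warn that we'll wait longer
    Abandon(String), // give up, report what is still open, end the task
}

const GRACE: Duration = Duration::from_secs(5); // initial drain window
const EXTRA: Duration = Duration::from_secs(60); // extra wait after warning

/// `elapsed` is time since the child exited; `warned` records whether the
/// warning has already been emitted.
fn next_action(elapsed: Duration, open: Open, warned: bool) -> Action {
    if !open.stdout && !open.stderr {
        return Action::Done;
    }
    if elapsed < GRACE {
        return Action::KeepWaiting;
    }
    if !warned {
        return Action::WarnOnce;
    }
    if elapsed < GRACE + EXTRA {
        return Action::KeepWaiting;
    }
    let which = match (open.stdout, open.stderr) {
        (true, true) => "stdout and stderr",
        (true, false) => "stdout",
        _ => "stderr",
    };
    // After this, the task ends and file uploads proceed as usual.
    Action::Abandon(format!("{} still open; ending task", which))
}

fn main() {
    // Simulated: stderr is held open by a background process indefinitely.
    let open = Open { stdout: false, stderr: true };
    let mut warned = false;
    for secs in [0, 3, 5, 30, 65] {
        match next_action(Duration::from_secs(secs), open, warned) {
            Action::WarnOnce => {
                warned = true;
                println!("{}s: warning: waiting up to 60 more seconds", secs);
            }
            a => println!("{}s: {:?}", secs, a),
        }
    }
}
```

Keeping the policy separate from the I/O makes the timing rules easy to test without real processes or real delays.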