Handle disk space errors

malomarrec commented 2 years ago

When starting a new job, executors do not check whether they have enough disk space available to process it. This has affected customers.

It just occurred to me that the disk space issue could be handled differently by the executors, since it could be a temporary problem. Assuming the executor could detect that it is out of disk space, it could skip processing new jobs until enough space is available. The other consideration I was thinking about (around Batch Changes error handling) is whether the error is meant for the end user running Batch Changes or for a Sourcegraph admin. I imagine that, where we are now, we cannot differentiate between the two, but as the feature evolves it's probably a good idea to think about how to handle errors that can only be acted upon by admins.

From: https://github.com/sourcegraph/accounts/issues/565

Done

Prior to processing a job (in PreDequeue), check whether enough disk space is available
- If there is not enough disk space, do not process the job (this is not an error)

Technical Consideration

We do not want to fail the job if there is no disk space; just skip it until the executor can process a job again.
It would be good to document when an executor frees up disk space (this can be in the code).

malomarrec commented 2 years ago

I recommend backlogging this for now.

eseliger commented 2 years ago

Should be possible to say in PreDequeue that we want to make sure enough free disk space is available.

sourcegraph / sourcegraph-public-snapshot

Handle disk space errors #37456
