oxidecomputer / buildomat

a software build labour-saving device
Mozilla Public License 2.0
53 stars 2 forks source link

return 404 errors when a file is not found in S3 #48

Closed hawkw closed 6 months ago

hawkw commented 7 months ago

When handling a GET /public/file/{username}/{series}/{version}/{name} request, Buildomat queries the database to determine the file path for the provided repo, series, version, and filename. If these queries don't find a file, the endpoint returns an HTTP 404 error. However, once the file path has been determined from the database, Buildomat will then attempt to find the actual file on the filesystem, or (if it isn't found locally) from S3. If the attempt to actually load the file from the local fs/S3 doesn't find a file, the Central::file_response method returns an error, which is always converted into an HTTP 500 response rather than a 404. It turns out that this case occurs when a build is in progress for a particular artifact, resulting in clients seeing 500 errors for nonexistent files.

I've attempted to fix this issue by changing Central::file_response to return a Result<Option<FileResponse>> rather than a Result<FileResponse>. We now return Ok(None) in cases where the file for the requested path doesn't exist on the local filesystem or in S3, and reserve Err for cases where the file is present but we actually couldn't open it for whatever reason.

Hopefully, this will result in 404s being returned for nonexistent files, rather than 500s. I wasn't sure how to verify that this completely resolves the issue, though --- any guidance would be very helpful. Thanks!

jclulow commented 6 months ago

Hey, I've been looking at this one a bit and I think the crux of the issue is actually elsewhere. I have attempted to draw a picture of the rough flow here:

image

In particular, we were already doing basically the right thing in the buildomat core API (server/ in the repository): returning a 404 in the case that such a file has not been uploaded and published, but returning a 500 error in the case that we thought it was uploaded (according to the SQLite database) but somehow it's missing from both S3 and the local server file system. This latter case is a pretty serious error because it means we have almost certainly flubbed uploading a file, or lost one after the fact, and I think needs to remain as a 500.

I've made some changes to the GitHub integration server (github/server/, etc, in the repository) to actually pass through the 404 error we get in this case. I don't think progenitor was able to expose this level of detail when I originally put a lot of this together, but Adam has since enhanced it to provide an error object that includes the status code and/or other information, so we can pass the 404 on to the client.

I've also produced a HTML-formatted 404 and 500 page for endpoints people might click on in the browser (which includes this published file endpoint, and a few others). Endpoints for software (like webhook delivery) are still JSON-formatted dropshot errors.

404

500

I have put this back as a96ed9cc949bb5e41e67a5a496fc267249b903ff. Let me know if it appears to work as you'd expect! Note that this doesn't yet help with the issue described in #46, as we still don't yet know if a file may exist in the future -- only that one does not yet exist.