restic / rest-server

Rest Server is a high performance HTTP server that implements restic's REST backend API.
BSD 2-Clause "Simplified" License
922 stars 138 forks source link

Return "internal server error" if files cannot be read #195

Closed MichaelEischer closed 1 year ago

MichaelEischer commented 1 year ago

What is the purpose of this change? What does it change?

When files in a repository cannot be read by rest-server, for example after running restic prune directly on the server hosting the repositories, then rest-server returned "Not Found" as status code. This is extremely confusing for users.

The PR fixes the error handling when files cannot be accessed. If the returned error is fs.ErrNotExist then it is turned into a "Not Found" status code, all other errors now result in an "Internal Error" together with a debug log output.

Was the change discussed in an issue or in the forum before?

Fixes https://github.com/restic/restic/issues/1871

Checklist

rawtaz commented 1 year ago

To be honest, I find that yielding an "Internal server error" code is quite a mismatch when in fact there is nothing wrong with the actual server itself. A "Permission denied" error seems more accurate, isn't that a better return value when a file cannot be read?

I do understand that one could argue that the servers is misconfigured if the files can't be read, but it's still kind of far-fetched to call that an internal server error. It's more of a storage error, denying you access.

wojas commented 1 year ago

A 500 Internal Server Error is a temporary error. The client can expect that the problem will be resolved automatically, or manually by the operator and retry at a later time. In the case of incorrect file permissions, it is appropriate. It better reflects the situation than a 404: the file is still there, but due to server misconfiguration (running operations with the wrong user) it is currently not accessible, while it should be.

It would be even better to pass the actual error to the client, instead of just a 500. There is nothing that disallows making 500 page contents useful.

MichaelEischer commented 1 year ago

To be honest, I find that yielding an "Internal server error" code is quite a mismatch when in fact there is nothing wrong with the actual server itself. A "Permission denied" error seems more accurate, isn't that a better return value when a file cannot be read?

Forbidden (Status code 403) belongs to the category of client-side errors, which are normally fixed by having the client do something else. It says that the client doesn't have to enough permissions to access the file. But that isn't the case here at all. We have a server-side problem (5xx status codes), which can only be fixed on the server-side. I have looked at the other 5xx status codes, but these aren't a good match either.

It would be even better to pass the actual error to the client, instead of just a 500. There is nothing that disallows making 500 page contents useful.

Simply taking the returned error would start leaking the storage location of the repository at the server and maybe other information to the user. When running the server for myself that's not a problem, but in other setups it can be. Restic would also have to be changed to print the error message to the user. But in the end you'll have to log into the server and fix the problem there. So, I think we should stick to the current approach and just log the error at the server.

MichaelEischer commented 1 year ago

There's one more race condition related to request retries left (besides the delete one): if an upload for an append-only repository succeeds without the client learning about that fact, this will cause later retries to fail. It would be possible for the rest-server to accept a reupload iff that upload has the expected file hash.

rawtaz commented 1 year ago

Forbidden (Status code 403) belongs to the category of client-side errors, which are normally fixed by having the client do something else. It says that the client doesn't have to enough permissions to access the file. But that isn't the case here at all. We have a server-side problem (5xx status codes), which can only be fixed on the server-side. I have looked at the other 5xx status codes, but these aren't a good match either.

I agree with you, realized this the other day. Not much else to do than to use the 500 :/

wojas commented 1 year ago

There's one more race condition related to request retries left (besides the delete one): if an upload for an append-only repository succeeds without the client learning about that fact, this will cause later retries to fail. It would be possible for the rest-server to accept a reupload iff that upload has the expected file hash.

I agree.

Another option would be to return 409 Conflict and handle this in a newer restic version as a success. This would avoid an unnecessary upload, but complicate error handling.

MichaelEischer commented 1 year ago

Another option would be to return 409 Conflict and handle this in a newer restic version as a success. This would avoid an unnecessary upload, but complicate error handling.

Avoiding a reupload for this rare error case is probably not worth introducing any complexity. Handling retries purely at the server-side would also be more consistent with the way retried delete calls are handled. But let's discuss that in a separate issue.

@wojas Any objections to merging this PR as is?

MichaelEischer commented 1 year ago

@wojas Ping?

wojas commented 1 year ago

No objections, LGTM!