Better error handling - Githubissues

q-m / scrapyd-k8s

Scrapyd on container infrastructure

MIT License

12 stars 7 forks source link

Better error handling #2

Open wvengen opened 1 year ago

wvengen commented 1 year ago

A small number of error-cases is handled (i.e. an error API response is returned). Add a default error handler that returns what scrapyd would return on an error. And handle more cases with helpful error messages.

wvengen commented 9 months ago

Things to look at (not an exhaustive list):

missing query / form parameters in API request (some are handled, some not) (done)
invalid query / form parameters in API request (e.g. unknown project, unknown spider, unknown version) (done)
missing config file entries (e.g. project repository)
project repository that does not exist (done)
project repository that cannot be contacted (for a remote repository)
Kubernetes connection error (and other errors regarding connecting the cluster)
Docker connection error (and other errors regarding connecting to the Docker daemon)

wvengen commented 9 months ago

Here is the scrapyd code for the API, having several error responses. https://github.com/scrapy/scrapyd/blob/master/scrapyd/webservice.py

wvengen commented 8 months ago

Note that if you're working on this with a team, it can make sense to split this issue up in multiple smaller ones.

wvengen commented 8 months ago

Did some tests on the original scrapyd, but error handling is not very well worked out:

when requesting a non-existant endpoint, 404 status with generic HTML error page is returned
when an error message is returned, status is still 200

listspiders on a nonexistant project returns status 200 with content

{"node_name": "x", "status": "error", "message": "Scrapy 2.9.0 - no active project\n\nUnknown command: list\n\nUse \"scrapy\" to see available commands\n"}

I think we can follow that. We don't need to have the exact error messages as scrapyd (they can be quite cryptic).

wvengen commented 8 months ago

Errors related to invalid query and form parameters have been (mostly?) fixed. You can still schedule a project with an non-existant version, which means it will be stuck in the pending state. You can still schedule a project with an non-existant spider, which means it will error immediately and be finished.

wvengen commented 8 months ago

A next step could be to handle exceptions and return a JSON-based error message, instead of http 500 server error (this may break clients). Ideally, this only happens for the API endpoints.

The main config-related error (where the daemon doesn't crash before starting) is when a project repository is missing in the configuration. That would be something to improve, either on load or when it is being used.