Closed by benoit74 1 month ago.
Upstream confirmed that this was indeed a bug, and it has been fixed. So we should definitely not treat status code 0 as normal. I will close this PR without merging it and open another one that only improves the logging of unexpected status codes. Currently we do not distinguish between unprocessable status codes (which are "normal", e.g. 404) and unexpected status codes (which are "abnormal", e.g. 0 or otherwise invalid codes). Both are logged only at DEBUG level. This is probably fine for unprocessable status codes, but unexpected status codes should be logged at WARNING level at least, since by definition they are not expected.
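The proposed distinction between the two kinds of status codes can be sketched as follows. This is a minimal illustration, not warc2zim's actual code: the names `StatusKind`, `classify_status` and `log_skipped_record`, the logger name, and the exact set of processable codes (2xx plus common redirects) are assumptions. Any other code in the valid 100-599 range is treated as unprocessable, and everything outside it (such as 0) as unexpected.

```python
import logging
from enum import Enum

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("warc2zim-sketch")


class StatusKind(Enum):
    PROCESSABLE = "processable"
    UNPROCESSABLE = "unprocessable"  # valid HTTP code we skip, e.g. 404
    UNEXPECTED = "unexpected"  # not a valid HTTP code at all, e.g. 0


# Assumption: success and common redirect codes are processable.
PROCESSABLE_CODES = frozenset(range(200, 300)) | {301, 302, 303, 307, 308}


def classify_status(code: int) -> StatusKind:
    """Sort a WARC record's HTTP status code into one of three buckets."""
    if code in PROCESSABLE_CODES:
        return StatusKind.PROCESSABLE
    if 100 <= code <= 599:
        return StatusKind.UNPROCESSABLE
    return StatusKind.UNEXPECTED


def log_skipped_record(url: str, code: int) -> StatusKind:
    """Log skipped records: DEBUG for normal skips, WARNING for anomalies."""
    kind = classify_status(code)
    if kind is StatusKind.UNPROCESSABLE:
        logger.debug("Skipping %s: unprocessable status code %d", url, code)
    elif kind is StatusKind.UNEXPECTED:
        logger.warning("Skipping %s: unexpected status code %d", url, code)
    return kind
```

Keeping the classification separate from the logging makes it easy to adjust the log level of each category without touching the skip logic.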
When implementing https://github.com/openzim/warc2zim/issues/220, we decided that HTTP status code 0 is not processable. We even manually edited a WARC file from the test set to change its HTTP status code, which was 0 (we assumed it came from an old bug).
This actually introduced a regression in warc2zim: WARC records with status code 0 are not that unusual and are still produced by the crawler.
See https://github.com/webrecorder/browsertrix-crawler/issues/570 for a discussion on this topic.
Until things are clarified on the crawler side, we should obviously treat HTTP status code 0 as equivalent to HTTP status code 200.
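The workaround proposed here amounts to normalizing the status code before the processability check. A minimal sketch, assuming hypothetical helpers `normalize_status` and `is_processable` and an illustrative set of processable codes; this is not warc2zim's actual code, and per the closing comment the change was not merged.

```python
# Assumption: illustrative set of processable codes (2xx plus common redirects).
PROCESSABLE_CODES = frozenset(range(200, 300)) | {301, 302, 303, 307, 308}


def normalize_status(code: int) -> int:
    """Map status code 0, as written by the crawler, to 200; keep others as-is."""
    return 200 if code == 0 else code


def is_processable(code: int) -> bool:
    """Check processability on the normalized code, so 0 behaves like 200."""
    return normalize_status(code) in PROCESSABLE_CODES
```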