wschoot closed this 7 months ago
Have you checked whether the website sends invalid Content-Length headers? Or whether it's just a temporary situation under load? We could have a separate ignore_incomplete_reads: true kind of configuration. Want to make a PR? :)
I was unable to test this manually because it only happens sometimes. I haven't yet put in the effort to set up a cron job that runs curl and saves the output so I can retrace the calls. I'm not too comfortable with Python, so making PRs is not my forte, I'm afraid :)
A minimal reproducer that serves incomplete HTTP chunks: https://gist.github.com/wfrisch/bc00bfa049f2aab76dbb73215b1f5bb5
I have regularly observed the same problem in the wild here: https://www.mozilla.org/en-US/security/advisories/
("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
Adding this option was straightforward. Feature branch: https://github.com/wfrisch/urlwatch/tree/feat/ignore_incomplete_reads
Steps to reproduce:
Run ./http-serve-incomplete-chunks.py
(https://gist.github.com/wfrisch/bc00bfa049f2aab76dbb73215b1f5bb5)
Before: urls.yaml:
name: "incomplete-chunk-server"
url: "http://localhost:8080"
urlwatch --urls urls.yaml
[...]
("Connection broken: InvalidChunkLength(got length b'\\r\\n', 0 bytes read)", InvalidChunkLength(got length b'\r\n', 0 bytes read))
After: urls.yaml:
name: "incomplete-chunk-server"
url: "http://localhost:8080"
ignore_incomplete_reads: true
./urlwatch --urls urls.yaml
→ exit code 0
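For anyone who wants to see the failure without fetching the gist, here is a minimal sketch of the same idea using only the Python standard library. It is not the script from the gist: `FlakyHandler` and `start_flaky_server` are hypothetical names, and the server simply sends one valid chunk and then closes the connection without the terminating zero-length chunk.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class FlakyHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a truncated chunked body."""

    protocol_version = "HTTP/1.1"

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Transfer-Encoding", "chunked")
        # "Connection: close" makes http.server drop the socket after this reply.
        self.send_header("Connection", "close")
        self.end_headers()
        # One well-formed chunk ("5" is the hex length of b"hello") ...
        self.wfile.write(b"5\r\nhello\r\n")
        # ... but the final "0\r\n\r\n" chunk is never sent.

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_flaky_server(port=0):
    """Start the server on a background thread; port=0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), FlakyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_flaky_server(8080)` in an interactive session and pointing urls.yaml at http://localhost:8080 should give a broken-connection error of the same kind as above, though the exact message may differ from the gist's server.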
An improved reproducer now also emulates regular incomplete reads (wrong Content-Length), as requested in the first comment: https://gist.github.com/wfrisch/63d1163645fa01e3ab1296e752769359
cat urls.yaml
url: "http://localhost:8080/invalid-content-length"
# ignore_incomplete_reads: true
---
url: "http://localhost:8080/invalid-chunk-length"
# ignore_incomplete_reads: true
urlwatch --urls urls.yaml
===========================================================================
01. ERROR: http://localhost:8080/invalid-content-length
02. ERROR: http://localhost:8080/invalid-chunk-length
===========================================================================
---------------------------------------------------------------------------
ERROR: http://localhost:8080/invalid-content-length
---------------------------------------------------------------------------
('Connection broken: IncompleteRead(13 bytes read, 10 more expected)', IncompleteRead(13 bytes read, 10 more expected))
---------------------------------------------------------------------------
---------------------------------------------------------------------------
ERROR: http://localhost:8080/invalid-chunk-length
---------------------------------------------------------------------------
("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
---------------------------------------------------------------------------
The new option silences both errors.
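To illustrate what "ignoring" an incomplete read means on the client side, here is a rough sketch using only the standard library. It is not the code from the feature branch: `fetch_body` is a hypothetical helper, it handles plain HTTP only, and it assumes that keeping the bytes received so far is the desired behaviour. With the requests library, as far as I can tell, both errors surface as `requests.exceptions.ChunkedEncodingError` rather than `http.client.IncompleteRead`.

```python
import http.client
from urllib.parse import urlsplit


def fetch_body(url, ignore_incomplete_reads=False, timeout=10):
    """Fetch a plain-HTTP URL and return the body as bytes.

    If the server closes the connection early (wrong Content-Length or a
    truncated chunked body), http.client raises IncompleteRead. With
    ignore_incomplete_reads=True the partial body is returned instead.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80,
                                      timeout=timeout)
    try:
        conn.request("GET", parts.path or "/")
        response = conn.getresponse()
        try:
            return response.read()
        except http.client.IncompleteRead as exc:
            if not ignore_incomplete_reads:
                raise
            # exc.partial holds whatever arrived before the stream broke.
            return exc.partial
    finally:
        conn.close()
```

Against the reproducer, `fetch_body("http://localhost:8080/invalid-content-length", ignore_incomplete_reads=True)` would return the truncated body instead of raising. Whether a truncated page is good enough to diff is a judgment call, which is presumably why the option is opt-in per job.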
I'm tracking a website that sometimes gives me an error message like:
The configuration I'm using includes the following statements that seem to have no effect on this particular error:
I've also tried "treating" it as a timeout, by setting a stricter timeout and ignoring timeout errors like so:
But it doesn't really help. What else can I try? This is urlwatch v2.25 on Linux.