Closed Ralle closed 6 years ago
cURL is probably more lenient. Can you provide the actual URL that is problematic?
FWIW you should be able to catch that error by adding an 'error'
handler on res
IIRC.
This is the URL, but you have to open it twice because the first view is an anti-hotlinking measure. http://www.wc3c.net/attachment.php?attachmentid=33111&d=1241580637
The measure is the bblastvisit cookie being a unix timestamp less than 120 seconds old.
The error seems to be _HPE_INVALIDCONSTANT. There were similar reports already at nodejs/node-v0.x-archive#4863
I couldnt fully debug it yet (especially the HTTP parsing part) but it seems it fails after it downloaded the response body. Maybe it got confused with the headers how it should determine the end of the stream and/or there is an off-by-one issue. All speculation so far though.
Following is the response header, particularly the empty content encoding stands out
HTTP/1.1 200 OK
Date: Sun, 24 Sep 2017 15:17:58 GMT
Server: Apache/2.2.22 (Fedora)
X-Powered-By: PHP/5.3.20
Content-Encoding:
Cache-control: max-age=31536000
Expires: Mon, 24 Sep 2018 15:17:58 GMT
Last-Modified: Wed, 06 May 2009 03:30:37 GMT
ETag: "33111"
Content-disposition: inline; filename="Longinus Spear.w3x"
Content-transfer-encoding: binary
Content-Length: 33358
Connection: close
Content-Type: unknown/unknown
Ok I think what is happening here is that the Content-Length
is incorrect. It seems to be 31 bytes or so fewer than the actual content, so node is probably trying to parse that last 31 bytes as another HTTP response. REDbot agrees with this.
@mscdex Is the file you downloaded smaller? When I download it the content length would match.
@neroux No, it's larger. Content-Length
reports 33358, but the body is ~33389 IIRC.
Also, the Content-Encoding
header shouldn't even be sent in the response because we aren't sending an Accept-Encoding
header in the request. The server (probably more specifically the PHP script) is just generating a bad response in many ways.
@mscdex Alright, the other way round then :). Well, in my case it does match. The file is 33,358 bytes when downloaded via a browser.
But that is what the browser did. Should the body actually be larger (and it is ignored by the browser) your earlier assumption - that the remaining data is considered a second response - might very well apply.
@neroux It could very well be that the browser is just truncating anything after 33358 bytes. Browsers are typically very lenient as well.
Closing, seems this thread ran its course.
FYI same issue can be reproduced if the content-type
header in an http response provides an unknown/invalid/unsupported charset.
This may be fine behaviour. node's http client speaking the charset of the http response is probably a requirement for node's http client to properly handle the response. However a more specific error message (e.g. "server provided content in unknown charset: ____" would be more useful for troubleshooting when this comes up.
e.g. http response headers:
Content-Type: text/plain; charset=ISO-8859-2
will result in
Error: Parse Error
at Socket.socketOnData (_http_client.js:442:20)
at Socket.emit (events.js:198:13)
at addChunk (_stream_readable.js:288:12)
at readableAddChunk (_stream_readable.js:269:11)
at Socket.Readable.push (_stream_readable.js:224:10)
at TCP.onStreamRead (internal/stream_base_commons.js:94:17) bytesParsed: 0, code: 'HPE_CB_headers_complete'
while the following is fine
Content-Type: text/plain; charset=ISO-8859-1
(node v10.16.3)
@mattpr Node doesn't care about Content-Type
specifically, it only cares that valid header characters are used throughout the header. There must be something else that is different between the two responses that is causing the parse error.
@mattpr can you provide the complete text of the response that does not parse?
I am working on a web scraper in Node.js. It has worked fine for over 30000 URLs, but this one seemingly causes an issue in the HTTP stack inside Node.js.
The following code:
Causes the following error:
The weird thing is, this request works fine in the browser and also in cURL, so I am expecting something to be wrong in the Node.js HTTP stack.
It also works if you change the attachment ID to something else. All IDs preceding it have worked.
8.5.0 and 6.9.1
macOS Sierra 10.12.6 (16G29)