Hey @fulopkovacs, thanks for the detailed report. You're correct that headers should be handled in a case-insensitive manner. Will try to replicate this and push a fix :+1:
Woah, that was a super fast response! 🙌
hey @fulopkovacs I've released a fix in v0.6.5, which is available on NPM and JSR. Let me know if this bug still pops up somehow.
Tested locally, works like a charm! Thanks for the quick fix! ☺️
## The issue
Parsing a successful `ScrapeResult` fails if the response's content type is defined in the `Content-Type` header field instead of `content-type`. The field names in HTTP/1.1 response headers are supposed to be case-insensitive (per RFC 9110, §5.1).
This is the source of the issue:
https://github.com/scrapfly/typescript-scrapfly/blob/a09d6b90266a4e75046f25f6f5e0360b285f3dd1/src/result.ts#L290
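For reference, here's a minimal sketch of the kind of case-insensitive lookup that would avoid the problem (the `getHeader` helper below is hypothetical, not the SDK's actual code):

```typescript
// Hypothetical helper: resolve a header by name regardless of casing,
// since HTTP field names are case-insensitive.
function getHeader(
    headers: Record<string, string>,
    name: string,
): string | undefined {
    const target = name.toLowerCase();
    for (const [key, value] of Object.entries(headers)) {
        if (key.toLowerCase() === target) return value;
    }
    return undefined;
}

// Both casings from the API response would then resolve identically:
// getHeader(response_headers, 'content-type') === getHeader(response_headers, 'Content-Type')
```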
## Error message
This is the error message I get (with Node, not Deno):

## Steps to reproduce
The contents of the `api-response-bad.json` file (a shortened version of the response I obtained by scraping https://www.headleymedia.com/resources/your-guide-to-email-lead-nurturing with Scrapfly):
```json { "context": { "asp": null, "bandwidth_consumed": 0, "bandwidth_images_consumed": 0, "cache": { "entry": null, "state": "MISS" }, "cookies": [], "cost": {}, "created_at": "2024-08-21 14:39:26.491852", "debug": null, "env": "LIVE", "fingerprint": "4499f87b6b0cbcc70e364752b39fefb8", "headers": {}, "is_xml_http_request": false, "job": null, "lang": ["en"], "os": {}, "project": "default", "proxy": {}, "redirects": [], "retry": 0, "schedule": null, "session": null, "spider": null, "throttler": null, "uri": { "base_url": "https://www.headleymedia.com", "fragment": null, "host": "www.headleymedia.com", "params": null, "port": 443, "query": null, "root_domain": "headleymedia.com", "scheme": "https" }, "url": "https://www.headleymedia.com/resources/your-guide-to-email-lead-nurturing", "webhook": null }, "result": { "browser_data": { "javascript_evaluation_result": null, "js_scenario": null, "local_storage_data": {}, "session_storage_data": {}, "websockets": [], "xhr_call": [] }, "content": "hello world
", "content_encoding": "utf-8", "content_format": "raw", "content_type": "text/html; charset=utf-8", "cookies": [], "data": null, "dns": null, "duration": 9.62, "error": null, "extracted_data": null, "format": "text", "iframes": [], "log_url": "https://scrapfly.io/dashboard/monitoring/log/01J5TP1NQ0071K6WY7HS6VM7TC", "reason": "OK", "request_headers": { "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7", "accept-encoding": "gzip, deflate, br, zstd", "accept-language": "en-US,en;q=0.9", "priority": "u=0, i", "sec-ch-ua": "\"Not)A;Brand\";v=\"99\", \"Google Chrome\";v=\"127\", \"Chromium\";v=\"127\"", "sec-ch-ua-mobile": "?0", "sec-ch-ua-platform": "\"Linux\"", "sec-fetch-dest": "document", "sec-fetch-mode": "navigate", "sec-fetch-site": "none", "sec-fetch-user": "?1", "upgrade-insecure-requests": "1", "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36" }, "response_headers": { "Cache-Control": "private", "Connection": "keep-alive", "Content-Encoding": "gzip", "Content-Type": "text/html; charset=utf-8", "Date": "Wed, 21 Aug 2024 14:39:29 GMT", "Server": "nginx/1.18.0 (Ubuntu)", "Transfer-Encoding": "chunked" }, "screenshots": { "debug": { "css_selector": null, "extension": "jpg", "format": "fullpage", "size": 556350, "url": "https://api.scrapfly.io/scrape/screenshot/01J5TP1NQ0071K6WY7HS6VM7TC/debug" } }, "size": 0, "ssl": null, "status": "DONE", "status_code": 200, "success": true, "url": "https://www.headleymedia.com/resources/your-guide-to-email-lead-nurturing" } } ```This issue is currently breaks some of our features in production
## This issue currently breaks some of our features in production

I discovered this issue while investigating a mysterious error that kept popping up and made one of our services that relies on Scrapfly fail randomly from time to time. For now I'll try to patch it in our code, but we'd be very grateful if this could be fixed soon. (Happy to submit a PR too, but not sure if you accept them.)
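The stopgap I have in mind is simply lowercasing the header names before the SDK touches them. A sketch, with hypothetical names (lossless, since HTTP field names are case-insensitive anyway):

```typescript
// Hypothetical stopgap: normalize response header names to lowercase before
// the SDK's parsing code reads them.
function lowercaseHeaderKeys(
    headers: Record<string, string>,
): Record<string, string> {
    return Object.fromEntries(
        Object.entries(headers).map(([key, value]) => [key.toLowerCase(), value]),
    );
}

const normalized = lowercaseHeaderKeys({
    'Content-Type': 'text/html; charset=utf-8',
    'Cache-Control': 'private',
});
// normalized['content-type'] === 'text/html; charset=utf-8'
```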