havardox closed this 1 month ago
I don't consider this to be a bug, as it is Scrapy's default behavior.
The download handler should ideally not modify the default USER_AGENT setting, because some projects might intentionally use a specific user-agent. Overriding this setting in the download handler could unintentionally affect those projects.
To fix this, you can override the default USER_AGENT in your project settings like this:
# settings.py
USER_AGENT = None
Setting USER_AGENT to None allows curl_cffi to automatically set the appropriate user-agent based on the browser it is impersonating.
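To illustrate why None helps, here is a minimal sketch of the idea, not the actual Scrapy or download handler code. It assumes the default user-agent is only added when the setting is truthy and the request has no User-Agent header of its own. The function name apply_default_user_agent and the version number in the example string are made up for illustration.

```python
from typing import Dict, Optional


def apply_default_user_agent(
    headers: Dict[str, str], user_agent: Optional[str]
) -> Dict[str, str]:
    """Hypothetical stand-in for how a default user-agent gets applied.

    Returns a copy of the headers. The default is added only when the
    setting is non-empty and the request did not set its own User-Agent.
    """
    result = dict(headers)
    if user_agent:  # None (or "") means: add nothing
        result.setdefault("User-Agent", user_agent)
    return result


# Illustrative value; the real default embeds the installed Scrapy version.
scrapy_default = "Scrapy/2.11.0 (+https://scrapy.org)"

# Default setting: every request without its own header gets the Scrapy UA.
with_default = apply_default_user_agent({}, scrapy_default)

# USER_AGENT = None: no header is added, so the HTTP client is free to
# send the user-agent that matches the browser it impersonates.
with_none = apply_default_user_agent({}, None)

# A header set explicitly on the request is never overwritten.
explicit = apply_default_user_agent({"User-Agent": "my-bot/1.0"}, scrapy_default)
```

In this sketch, with_none comes back without a User-Agent header at all, which is the situation where curl_cffi can fill in its own value.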
The user-agent is the default "Scrapy/{Scrapy version} (+https://scrapy.org)", which makes it very easy to detect that it's a crawler.
The upstream curl_cffi sets the appropriate user-agent depending on which browser it's impersonating.