jxlil / scrapy-impersonate

Scrapy download handler that can impersonate browsers' TLS signatures or JA3 fingerprints.
MIT License

User-agent not set #12

Closed by havardox 1 month ago

havardox commented 1 month ago

The user-agent is the default "Scrapy/{Scrapy version} (+https://scrapy.org)", which makes it very easy to detect that it's a crawler.

The upstream curl_cffi sets the appropriate user-agent depending on which browser it's impersonating.

jxlil commented 1 month ago

I don't consider this to be a bug, as it is Scrapy's default behavior.

The download handler should ideally not modify the default USER_AGENT setting because some projects might intentionally use a specific user-agent. Overriding this setting in the download handler could unintentionally affect those projects.

To fix this, you can override the default USER_AGENT in your project settings like this:

# settings.py
USER_AGENT = None

Setting USER_AGENT to None stops Scrapy from adding its default User-Agent header, which allows curl_cffi to automatically set the appropriate user-agent based on the browser it is impersonating.
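
For anyone landing here later, this is roughly how the workaround might sit alongside the rest of a project's configuration. It is a minimal sketch rather than an official configuration: the handler path and reactor setting are taken from the project README as I understand it, so check them against the version you have installed.

```python
# settings.py -- a minimal sketch, assuming scrapy-impersonate is installed.

# Route HTTP and HTTPS downloads through the impersonating handler.
# (Handler path assumed from the project README; verify for your version.)
DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}

# The handler is asyncio-based, so Scrapy needs the asyncio reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Disable Scrapy's default "Scrapy/{version} (+https://scrapy.org)" header.
# With no User-Agent supplied by Scrapy, curl_cffi can send the one that
# matches the browser being impersonated.
USER_AGENT = None
```

The browser itself is still chosen per request through the `impersonate` key in `Request.meta` (for example `meta={"impersonate": "chrome110"}`). If a spider sets its own User-Agent header on a request, expect that explicit value to be sent instead of the impersonated one.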