Open srstsavage opened 1 month ago
I'm sure I missed a few, but it looks like the list isn't too long:
aiohttp
Apache-HttpClient
^curl
Go-http-client
http_get
httpx
libwww-perl
node-fetch
okhttp
python-requests
Python-urllib
[wW]get
Completely see your point. I like the idea of having optional tags:
"tags": ["generic-client"]
Would you open a pull request? Thanks!
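A rough sketch of how a consumer might use such an optional tags field, in case it helps the discussion. The Entry type, the withoutTag helper, and the sample data are all hypothetical; the only things taken from this thread are the "tags" key and the "generic-client" value, and the "pattern" key is assumed to match the existing JSON entries.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entry is a hypothetical, trimmed-down view of one JSON entry.
// The "tags" field is the proposal from this thread, not an existing field.
type Entry struct {
	Pattern string   `json:"pattern"`
	Tags    []string `json:"tags,omitempty"`
}

// hasTag reports whether the entry carries the given tag.
func hasTag(e Entry, tag string) bool {
	for _, t := range e.Tags {
		if t == tag {
			return true
		}
	}
	return false
}

// withoutTag returns the entries that do NOT carry the given tag.
func withoutTag(entries []Entry, tag string) []Entry {
	kept := make([]Entry, 0, len(entries))
	for _, e := range entries {
		if !hasTag(e, tag) {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	// Made-up sample data for illustration only.
	data := []byte(`[
		{"pattern": "Googlebot"},
		{"pattern": "python-requests", "tags": ["generic-client"]}
	]`)

	var entries []Entry
	if err := json.Unmarshal(data, &entries); err != nil {
		panic(err)
	}

	// Entries without a "tags" key decode with a nil Tags slice and are kept.
	for _, e := range withoutTag(entries, "generic-client") {
		fmt.Println(e.Pattern)
	}
}
```

Because the field is optional, existing entries and existing consumers would keep working unchanged; only users who care about the distinction need to look at the tags.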
I was surprised to find http clients like python-requests, Go-http-client, wget, curl, etc. included in the crawler list. While I understand that these tools can be abused, in our case a large portion of our legitimate web traffic is from API requests using http clients like these.
For now I think I'll need to create an overriding allow list of patterns and remove matches from agents.Crawlers before processing, but it would be great to be able to disambiguate client tools/libraries based on a field in crawler-user-agents.json. Maybe just an is_client boolean, or a more generic tags string array which could contain client or similar? Any thoughts?
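For reference, the allow-list workaround described above could look roughly like the following. This is only a sketch: the Crawler type here is a local stand-in with a single Pattern field (not the package's actual type), and removeAllowed and isCrawler are hypothetical helper names.

```go
package main

import (
	"fmt"
	"regexp"
)

// Crawler is a hypothetical stand-in for one crawler entry;
// only the regular-expression pattern is modeled here.
type Crawler struct {
	Pattern string
}

// removeAllowed returns the crawlers whose pattern is not in the allow list.
func removeAllowed(crawlers []Crawler, allow map[string]bool) []Crawler {
	kept := make([]Crawler, 0, len(crawlers))
	for _, c := range crawlers {
		if !allow[c.Pattern] {
			kept = append(kept, c)
		}
	}
	return kept
}

// isCrawler reports whether userAgent matches any remaining pattern.
// Patterns are compiled on every call to keep the sketch short;
// real code would compile them once up front.
func isCrawler(crawlers []Crawler, userAgent string) bool {
	for _, c := range crawlers {
		if regexp.MustCompile(c.Pattern).MatchString(userAgent) {
			return true
		}
	}
	return false
}

func main() {
	// Made-up subset of the list, for illustration only.
	crawlers := []Crawler{
		{Pattern: "Googlebot"},
		{Pattern: "python-requests"},
		{Pattern: "^curl"},
	}
	// Patterns we consider legitimate API clients in our traffic.
	allow := map[string]bool{"python-requests": true, "^curl": true}

	filtered := removeAllowed(crawlers, allow)
	fmt.Println(isCrawler(filtered, "curl/8.4.0"))    // allowed client
	fmt.Println(isCrawler(filtered, "Googlebot/2.1")) // still a crawler
}
```

The downside is that the allow list has to be kept in sync by hand whenever patterns in the upstream list are added or reworded, which is why a field in the JSON would be preferable.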