rfc-st / humble

A humble, and 𝗳𝗮𝘀𝘁, security-oriented HTTP headers analyzer.
https://github.com/rfc-st/humble
MIT License

Avoiding HTTP 403 errors #2

Closed: rfc-st closed this issue 1 year ago

rfc-st commented 2 years ago

Currently a single header (a well-formed 'User-Agent', associated with a real browser version) is sent in 'humble' requests. In most cases this is not a problem, and the HTTP response headers are retrieved correctly.
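For illustration, here is a minimal sketch of the kind of request described above. It uses Python's standard `urllib.request`, not humble's actual code. The names `build_request`, `fetch_response_headers` and the `BROWSER_UA` string are hypothetical. The sketch sets one explicit header, a browser-like User-Agent, and returns the status code together with the response headers. urllib itself still adds a few transport headers such as Host and Connection.

```python
import urllib.error
import urllib.request
from typing import Dict, Tuple

# Hypothetical browser-like User-Agent string; not the one humble sends.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Build a GET request whose only explicit header is User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent}, method="GET")


def fetch_response_headers(url: str, timeout: float = 10.0) -> Tuple[int, Dict[str, str]]:
    """Send the request and return (status code, response headers).

    urllib raises HTTPError for responses such as 403, but the error object
    still carries the status code and the headers the server sent.
    """
    request = build_request(url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, dict(response.headers)
    except urllib.error.HTTPError as err:
        return err.code, dict(err.headers)
```

Because the 403 branch returns instead of raising, a caller can see that the server refused the request without the tool failing outright.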

On other occasions, the domain responds with a "403 Forbidden". This may be due to a WAF or to GDPR-related IP blocking, but I get the impression that the error is actually caused by the request being interpreted as coming from a bot (because it lacks certain headers that might be necessary).

I have tried many combinations, including the default request headers sent by curl, Chrome, etc., without success.
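The experiment described above can be sketched as a small loop. It tries several request-header sets in order and stops at the first one that is not answered with a 403. This is a hypothetical helper, not code from humble, and the profiles are illustrative only:

- The "curl" entry mirrors curl's usual defaults: a `curl/<version>` User-Agent plus `Accept: */*`. The version number is assumed.
- The "browser" entry is a reduced approximation of a desktop Chrome request, not an exact copy.

The sender is passed in as a parameter, so the loop can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional

# Hypothetical, illustrative header profiles, tried in insertion order.
# "curl" mirrors curl's usual defaults (the version number is assumed);
# "browser" is a reduced approximation of a desktop Chrome request.
HEADER_PROFILES: Dict[str, Dict[str, str]] = {
    "curl": {
        "User-Agent": "curl/8.5.0",
        "Accept": "*/*",
    },
    "browser": {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Upgrade-Insecure-Requests": "1",
    },
}

# A sender takes a URL plus a header dict and returns the HTTP status code.
Sender = Callable[[str, Dict[str, str]], int]


def urllib_send(url: str, headers: Dict[str, str], timeout: float = 10.0) -> int:
    """Real sender: perform a GET and return the status, including error codes."""
    request = urllib.request.Request(url, headers=headers, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still available.
        return err.code


def first_accepted_profile(url: str, send: Sender = urllib_send) -> Optional[str]:
    """Return the name of the first profile not answered with 403, or None."""
    for name, headers in HEADER_PROFILES.items():
        if send(url, headers) != 403:
            return name
    return None
```

As this thread reports, varying the request headers alone did not make the 403s go away. A helper like this is therefore a way to record the experiment, not a workaround.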

I need your help to identify a pattern that allows 'humble' to retrieve the HTTP response headers in these specific cases. Any ideas, advice or suggestions are welcome and, of course, I will mention them in the project's README!

Thank you.

rfc-st commented 2 years ago

I'm almost certain that these 403 errors are due to Cloudflare, specifically to one of its features that classifies these requests as automated (bot) traffic.

I'm investigating how to make the requests in a 'polite' way so that I can retrieve the headers from domains that use Cloudflare...

rfc-st commented 1 year ago

After spending some time reviewing how to avoid this problem, I have finally decided to remove the code specifically associated with HTTP 403 errors (27293ff). After some testing, the domains that previously returned this error now seem to return the response headers correctly. If I detect additional problems, I will revisit this issue.