Open nikolaydubina opened 1 week ago
While filtering based on the User-Agent in request headers is possible, it can be easily bypassed. In your case, how about using a service like Google Analytics alongside Hits?
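(For context, a User-Agent check is usually just a substring match on the header, which is why it is so easy to spoof. Below is a minimal sketch in Go; the names `looksLikeBot`, `incrementCounter`, and the badge handler are all hypothetical and not from the actual Hits code.)

```go
package main

import (
	"net/http"
	"strings"
)

// looksLikeBot is a naive User-Agent heuristic. The markers below cover the
// most common well-behaved crawlers, but any client can send an arbitrary
// User-Agent header, so this is trivially bypassed.
func looksLikeBot(r *http.Request) bool {
	ua := strings.ToLower(r.Header.Get("User-Agent"))
	for _, marker := range []string{"bot", "crawler", "spider", "slurp"} {
		if strings.Contains(ua, marker) {
			return true
		}
	}
	return ua == "" // many scripted clients send no User-Agent at all
}

// incrementCounter is a stand-in for whatever persistence the real service uses.
func incrementCounter(path string) {}

// badgeHandler skips counting for requests that look like bots and always
// returns the SVG badge itself.
func badgeHandler(w http.ResponseWriter, r *http.Request) {
	if !looksLikeBot(r) {
		incrementCounter(r.URL.Path)
	}
	w.Header().Set("Content-Type", "image/svg+xml")
	w.Write([]byte(`<svg xmlns="http://www.w3.org/2000/svg"/>`))
}

func main() {
	http.HandleFunc("/", badgeHandler)
	http.ListenAndServe(":8080", nil) // local demo only
}
```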
I am using hits to avoid Google Analytics :P
reasons
What if we assume that crawlers are well-behaved and do not try to bypass protections? For example, robots.txt is respected by the major crawlers.

Is anything like this possible? Are there any well-known protocols/standards for talking to or detecting crawlers? (One way I can imagine is to have a reverse proxy at the hits.sh end with a robots.txt that disallows going any further, and then somehow make another HTTP request to your backend, now without the robots that stopped at the robots.txt block.)
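A minimal sketch of that robots.txt idea, assuming a Go front-end (the port, handler, and Disallow rule are all placeholders, not the real hits.sh layout): well-behaved crawlers read robots.txt first, never fetch the badge URLs, and therefore never increment a counter.

```go
package main

import (
	"fmt"
	"net/http"
)

// robotsTxt asks well-behaved crawlers not to fetch anything on this host.
// A narrower Disallow (e.g. only the badge paths) is also possible, at the
// cost of blocking less of the crawler traffic.
const robotsTxt = `User-agent: *
Disallow: /
`

func main() {
	http.HandleFunc("/robots.txt", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain")
		fmt.Fprint(w, robotsTxt)
	})

	// The badge/counter handlers would be registered here. Crawlers that
	// honor robots.txt never reach them; misbehaving clients still can,
	// which is the limitation of this approach.
	http.ListenAndServe(":8080", nil)
}
```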
That sounds interesting. But if I do that, the Hits site might not be indexed by search engines anymore.. 😥 In that case, I could consider adding a condition for '*.svg' requests only.

But some users might want to use Hits to track download counts (for instance, increasing the hit count when a file is downloaded from a page), and I don't think I can tell whether such a request comes from a bot or not 😂
Hmm..
Adding a parameter to the .svg request, like https://hits.sh/github.com/silentsoft.svg?blockBots, might be a possible solution!
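A rough sketch of how such an opt-in parameter could be handled on the server, again in Go with hypothetical names (only `blockBots` and the `.svg` extension come from the thread itself): because filtering only applies when the badge owner asks for it on an .svg request, download-count use cases stay unaffected.

```go
package main

import (
	"net/http"
	"strings"
)

// isBot is the same naive User-Agent heuristic as above.
func isBot(r *http.Request) bool {
	ua := strings.ToLower(r.Header.Get("User-Agent"))
	return strings.Contains(ua, "bot") ||
		strings.Contains(ua, "crawler") ||
		strings.Contains(ua, "spider")
}

// incrementCounter is a stand-in for the real persistence layer.
func incrementCounter(path string) {}

// hitHandler counts the hit unless the badge owner opted in to bot filtering
// via ?blockBots on an .svg request and the client looks like a crawler.
func hitHandler(w http.ResponseWriter, r *http.Request) {
	optedIn := r.URL.Query().Has("blockBots") && strings.HasSuffix(r.URL.Path, ".svg")
	if !(optedIn && isBot(r)) {
		incrementCounter(r.URL.Path)
	}
	w.Header().Set("Content-Type", "image/svg+xml")
	w.Write([]byte(`<svg xmlns="http://www.w3.org/2000/svg"/>`))
}

func main() {
	http.HandleFunc("/", hitHandler)
	http.ListenAndServe(":8080", nil) // local demo only
}
```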
Something like this will probably work.
Instead of a 304, a meta HTML tag can also be used: https://www.w3.org/TR/WCAG20-TECHS/H76.html
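For reference, the H76 technique linked above is just an instant client-side redirect via a meta refresh tag. A tiny sketch of serving it from Go follows; the handler, route, and target URL are placeholders, and this only applies where the endpoint can return HTML rather than the raw SVG.

```go
package main

import (
	"fmt"
	"net/http"
)

// refreshHandler serves a minimal HTML page using the W3C H76 technique:
// an instant <meta http-equiv="refresh"> redirect handled by the client.
// The target URL below is a placeholder for illustration only.
func refreshHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	fmt.Fprint(w, `<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="refresh" content="0;URL='https://hits.sh/github.com/silentsoft.svg'">
    <title>Redirecting</title>
  </head>
  <body></body>
</html>`)
}

func main() {
	http.HandleFunc("/redirect", refreshHandler)
	http.ListenAndServe(":8080", nil) // local demo only
}
```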
What other methods do people suggest? Is there a way (or any ideas) to detect real humans vs. bots that crawl webpages?