That is going to be difficult to track down. They have two domains: http://whois.domaintools.com/leaksearch.net and http://whois.domaintools.com/leakid.com
The second link you sent, with the article, suggests they disguise their IPs as fake Google crawlers, so detecting them is going to be tricky.
You would have to take some of your logs and look through them very carefully to find anything calling itself googlebot or bingbot, then check whether its IP address matches a genuine Google or Bing bot.
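Google and Bing both document a reverse-then-forward DNS check for exactly this; a quick sketch from a shell (the IP and hostname below are only illustrative):
# Reverse-resolve the suspect IP; real Googlebot IPs resolve to *.googlebot.com
host 66.249.66.1
# Forward-resolve the hostname it returns; it must point back to the same IP
host crawl-66-249-66-1.googlebot.com
# Genuine bingbot IPs resolve to *.search.msn.com under the same two-step check;
# if either lookup fails to match, the "bot" is an impostor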
If you are running WordPress you should add the Wordfence plugin and set it to block all fake Google crawlers, which will help you to some extent.
Unfortunately, companies like this hide their tactics very well. They probably use a user agent like Google's or Mozilla's that can get past almost anything, and they probably have several blocks of different IP addresses, changing their bot's IP every hour or so, which makes them hard to track.
I've spent hours going through logs looking for certain baddies; it's very time-consuming, but if you use something like TextWrangler you can quickly sort, delete the lines you know are not what you are looking for, and slowly but surely narrow it down to a shorter list that you can analyze.
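If you'd rather do the narrowing from a shell, something like this gets you a ranked list of suspect IPs (assuming the default combined log at /var/log/apache2/access.log):
# Keep only requests claiming to be Googlebot or bingbot,
# then count how many came from each client IP
grep -Ei 'googlebot|bingbot' /var/log/apache2/access.log \
  | awk '{print $1}' | sort | uniq -c | sort -rn | head -20
# Feed each surviving IP through the reverse/forward DNS check above;
# whatever does not resolve to Google or Microsoft is your fake crawler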
Thank you for the info.
As I said, I do not want to be referenced. Maybe I can blacklist all user agents like Google, Bing, and the other search engines? Then only users who know the URL of my site can access it.
I do not use WordPress, so I do not have access to its many defensive plugins.
The most effective solution would be to block IP ranges (a sketch of what that looks like is below), but if they change them often, as you say, that will be difficult.
Have you analyzed the Apache logs and sorted the IPs? Impressive.
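For reference, I imagine the block itself would look something like this in Apache 2.4, once a range is identified (203.0.113.0/24 is just a placeholder range, and the paths assume Debian/Ubuntu):
# Drop a deny rule for the whole CIDR range into its own conf file
sudo tee /etc/apache2/conf-available/block-ranges.conf <<'EOF'
<Location "/">
    <RequireAll>
        Require all granted
        Require not ip 203.0.113.0/24
    </RequireAll>
</Location>
EOF
sudo a2enconf block-ranges && sudo apachectl graceful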
I don't have time to test this today, but what if you block Google, Bing, and all the other search engines using the https://github.com/mitchellkrogza/apache-ultimate-bad-bot-blocker/blob/master/custom.d/blacklist-user-agents.conf file?
You can try it and see if it overrides the earlier whitelisting, and let me know; otherwise I will test it tomorrow/Monday.
So in that custom include file, add them like this:
BrowserMatchNoCase "^Google" bad_bot
BrowserMatchNoCase "^Bing" bad_bot
BrowserMatchNoCase "^Yandex" bad_bot
BrowserMatchNoCase "^Yahoo" bad_bot
and so on, then reload Apache and test with curl to see what happens.
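Something along these lines, with your own hostname substituted for the placeholder:
# Pretend to be Googlebot; a 403 Forbidden means the override is in effect
curl -I -A "Googlebot/2.1 (+http://www.google.com/bot.html)" http://yoursite.example/
# A normal browser user agent should still get a 200
curl -I -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" http://yoursite.example/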
I tested quickly and it DOES work: adding any of the previously whitelisted bots to the custom include file blacklist-user-agents.conf lets you override what the main globalblacklist.conf does. I made the following mod to my blacklist-user-agents.conf file and it blocks all of them. So, yeah, this custom include system gives users total control over what they want the blocker to do.
Here's how my blacklist-user-agents.conf include file looks. (Remember to reload Apache after changing the file; the exact commands follow the list below.)
# Add One Entry Per Line - List all the extra bad User-Agents you want to permanently block
# This is for User-Agents that are not included in the main list of the bot blocker
# This allows you finer control of keeping certain bots blocked, and automatic updates will
# never be able to remove this custom list of yours
BrowserMatchNoCase "^MyVeryBadUserAgent" bad_bot
BrowserMatchNoCase "^adidxbot" bad_bot
BrowserMatchNoCase "^AdsBot-Google" bad_bot
BrowserMatchNoCase "^aolbuild" bad_bot
BrowserMatchNoCase "^bingbot" bad_bot
BrowserMatchNoCase "^bingpreview" bad_bot
BrowserMatchNoCase "^DoCoMo" bad_bot
BrowserMatchNoCase "^duckduckgo" bad_bot
BrowserMatchNoCase "^facebookexternalhit" bad_bot
BrowserMatchNoCase "^Feedfetcher-Google" bad_bot
BrowserMatchNoCase "^Googlebot" bad_bot
BrowserMatchNoCase "^Googlebot-Image" bad_bot
BrowserMatchNoCase "^Googlebot-Mobile" bad_bot
BrowserMatchNoCase "^Googlebot-News" bad_bot
BrowserMatchNoCase "^Googlebot/Test" bad_bot
BrowserMatchNoCase "^Googlebot-Video" bad_bot
BrowserMatchNoCase "^Google-HTTP-Java-Client" bad_bot
BrowserMatchNoCase "^gsa-crawler" bad_bot
BrowserMatchNoCase "^Jakarta\ Commons" bad_bot
BrowserMatchNoCase "^Kraken/0.1" bad_bot
BrowserMatchNoCase "^LinkedInBot" bad_bot
BrowserMatchNoCase "^Mediapartners-Google" bad_bot
BrowserMatchNoCase "^msnbot" bad_bot
BrowserMatchNoCase "^msnbot-media" bad_bot
BrowserMatchNoCase "^SAMSUNG" bad_bot
BrowserMatchNoCase "^slurp" bad_bot
BrowserMatchNoCase "^teoma" bad_bot
BrowserMatchNoCase "^TwitterBot" bad_bot
BrowserMatchNoCase "^Wordpress" bad_bot
BrowserMatchNoCase "^yahoo" bad_bot
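And to apply it, a syntax check plus reload is enough:
# Verify the config, then reload so the new entries take effect
sudo apachectl configtest && sudo apachectl graceful
# Spot-check one of the newly blocked agents (expect 403)
curl -I -A "bingbot/2.0" http://yoursite.example/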
Funny thing: the overriding works on the Apache blocker but not on the Nginx blocker. I will have to look into that next week, as all these updates I have been doing are intended to give people total control. But in your case you are using this blocker, and the overrides work 100%.
Hello,
My site received a copyright complaint. The site is not indexed and is not very popular, yet it was still detected. The mail indicated they use LeakID.
http://www.leakid.com https://korben.info/leakid-la-solution-anti-direct-download.html (in French)
If I want to detect the IP ranges used to scan the site, how do I go about it? Just to start somewhere.
Thank you for your help.