Closed sr093906 closed 1 year ago
Skimming the list, I found some trackers, crappy hosts, ad domains, popup porn ads, subdomains of main domains blocked for good reason and co. Analyzing this list would take days to weeks. At the moment I don't have the time ...
Interesting dataset - might be handy for TLD discovery but other than that it's almost impossible to analyze/test
Given scant interest and negative feedback, I will look through the list myself to find 'a few' FPs based on my mood. Stay tuned.
Thanks for the help.
Please re-open it. I haven't finished it.
I haven't even finished those beginning with 'a'.
Ok, sorry.
You can also post the domains in this issue, no need to open a new one for each. But, as you like ...
Some numbers for those interested:
| List | Domains in top 1M |
|---|---|
| oisd small | 597 |
| oisd big | 2481 |
| HaGeZi multi light | 2200 |
| HaGeZi multi normal | 3034 |
| HaGeZi multi pro | 3542 |
| HaGeZi multi pro++ | 4871 |
| HaGeZi multi ultimate | 6511 |
Of course being part of the top 1 million most visited websites doesn't mean that it's a legit domain, so be careful with jumping to conclusions.
I applaud your efforts, by the way. Maybe a small subset of this list can be used to check whether a list is fit for inclusion, same as what oisd does. For example, some legit looking webshops are loaded from the NoTracking list, which in turn got them from here, which with all due respect looks like a pretty obscure and not frequently updated list. That raises the question whether NoTracking has a strict enough inclusion policy (and in return HaGeZi as well).
Thanks for the advice, I'll see how I can get a handle on this.
@Notracking: How do you see that? I would think about removing the source mentioned.
@sr093906 First of all, thanks for all your effort. Before you start the next wave, please wait, I'm just straightening a few things. When the build of the new lists is online I'll let you know. Then please continue testing against the new Ultimate. It will be ready in a few hours. Thanks ...
I hope the adjustments will save me much time.
@sr093906 "cleaned" Ultimate is online. Should find less now ...
@hagezi Thanks for the notification. I will continue the check later.
https://github.com/MISP/misp-warninglists/
Whitelist resources. Maybe some lists will be helpful.
@sr093906 I've done more cleanup, the build is running now and will be through in a few hours. I'll let you know ...
@sr093906 Update is live, cleaned pro to ultimate.
FYI:
Toplists: https://github.com/hagezi/dns-data-collection/tree/main/top
toplist.txt - Umbrella
toplist.tranco.txt - Tranco
toplist.chrome.txt - Chrome
Thanks for letting me know. I will check.
@sr093906 STOP posting potential phishing domains for whitelisting. Check the phishing sources and report them there. If they are removed from the phishing lists, they disappear from my lists too!
Thanks, Gerd
@sr093906 Please spare me these crap sites from the lower ranks of the Chrome toplist. For my TIF I use the Umbrella/Tranco toplists as a whitelist, so if a host you reported is blocked by my TIF, it is on neither toplist. Report them upstream if you think they are false positives.
Thanks, Gerd
I have now spent hours on these issues. I cleaned up the lists using the Chrome Toplist. Everything that was safe to remove was removed.
Done.
I applaud your efforts, by the way. Maybe a small subset of this list can be used to check whether a list is fit for inclusion, same as what oisd does. For example, some legit looking webshops are loaded from the NoTracking list, which in turn got them from here, which with all due respect looks like a pretty obscure and not frequently updated list. That raises the question whether NoTracking has a strict enough inclusion policy (and in return HaGeZi as well).
Thanks for the advice, I'll see how I can get a handle on this.
@notracking: How do you see that? I would think about removing the source mentioned.
Well, Stonecrushers list is basically a scraped version of: https://www.watchlist-internet.at/unserioese-webseiten/ https://www.watchlist-internet.at/about-us/ https://www.oiat.at/
Though I will remove/disable it because it should have (at least) excluded their "Problematische Online-Shops" list, which mostly has shops with bad service (based on user reports).
Thanks!
https://github.com/zakird/crux-top-lists
So, please treat them as domains visited by real humans. Based on that assumption, quite a few can/should be treated as FPs.
The list is generated by downloading the repo's latest csv file and stripping http:// and https://
After that, entries seen in Fake, Threat Intelligence Feeds, DoH/VPN/TOR/Proxy Bypass (complete edition), Safesearch not supported, Dynamic DNS, Badware Hoster and Personal are removed.
Finally, common entries between the processed file and the raw domain version of ultimate blacklist are listed.
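For anyone who wants to reproduce the comparison, here is a minimal sketch of the three steps above in Python. It is not the script that was actually used: the function names are made up for illustration, the CSV is assumed to have an `origin` column (as the crux-top-lists files appear to), and all lists are assumed to be already loaded as plain domain strings.

```python
import csv
import io
import re


def strip_scheme(origin: str) -> str:
    """Remove a leading http:// or https:// from a CrUX-style origin."""
    return re.sub(r"^https?://", "", origin.strip())


def load_top_domains(csv_text: str) -> list[str]:
    """Read the 'origin' column (assumed name) and return bare domains in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    domains = []
    for row in reader:
        domain = strip_scheme(row["origin"])
        if domain and domain not in seen:
            seen.add(domain)
            domains.append(domain)
    return domains


def candidate_false_positives(top_domains, exclusion_lists, blacklist):
    """Drop domains found in any exclusion list, then keep those on the blacklist."""
    excluded = set().union(*exclusion_lists)
    blocked = set(blacklist)
    return [d for d in top_domains if d not in excluded and d in blocked]


if __name__ == "__main__":
    # Tiny made-up sample; real input would be the downloaded CSV and the list files.
    sample_csv = "origin,rank\nhttps://blog.example.com,1000\nhttp://ddns.example.org,5000\n"
    top = load_top_domains(sample_csv)
    threat_and_other_lists = [{"ddns.example.org"}]  # e.g. TIF, Dynamic DNS, ...
    ultimate = {"blog.example.com", "ddns.example.org"}
    for domain in candidate_false_positives(top, threat_and_other_lists, ultimate):
        print(domain)
```

The result is only a list of candidates. As noted elsewhere in this thread, being on a toplist does not make a domain legit, so each hit still needs a manual look.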
There are some betting and porn sites there, of course. As for the others, some are clearly FPs, like those starting with blog., login., and others.
5.txt