notracking / hosts-blocklists

Automatically updated, moderated and optimized lists for blocking ads, trackers, malware and other garbage
2.31k stars · 148 forks

URLhaus #95

Closed: HellboyPI closed this issue 5 years ago

HellboyPI commented 6 years ago

New list suggestion -> URLhaus - a relatively new list from the well-known abuse.ch. The list is also available as a DNS Response Policy Zone (RPZ) file.

The false positive rate should be low because: "To reduce the amount of false positives, URLhaus RPZ does only include domain names associated with malware URLs that are either active (malware sites that currently serve a payload) or that have been added to URLhaus in the past 48 hours. In addition to that, Alexa Top 1M are excluded from the RPZ dataset."

notracking commented 6 years ago

Hi HellboyPI,

That is an interesting new list from abuse.ch!

I will have to update the parser for this one though (because of the url formatting) and run some statistics before adding it.

notracking commented 6 years ago

One problem that's already popping up (and it's not limited to this) is the false call rate this list will give on public copy/paste sites, like: https://paste.ee/.

Overall sample tests show that simply parsing / extracting the domain names from this list will result in a high false positive rate. I would like to avoid that, of course.

Will be doing some more sampling to see if I can cherry-pick some useful entries while avoiding the false positives from this list.
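To make the false positive concern concrete, here is a minimal sketch of the naive extraction step being discussed: pulling bare hostnames out of a plain-text URL list. The sample entries and list format are assumptions for illustration, not the actual URLhaus feed.

```python
from urllib.parse import urlparse

def extract_domains(url_lines):
    """Extract unique hostnames from raw URL strings, skipping comments/blanks."""
    domains = set()
    for line in url_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        host = urlparse(line).hostname
        if host:
            domains.add(host)
    return domains

# Naive extraction blocks the whole shared paste site, not just the bad paste:
sample = [
    "# URLhaus-style sample (hypothetical entries)",
    "https://paste.ee/r/AbCdE",           # shared site -> false positive
    "http://malware.example/payload.exe",
]
print(sorted(extract_domains(sample)))   # ['malware.example', 'paste.ee']
```

The `paste.ee` entry shows the problem: a DNS block on the hostname takes the entire service down for everyone, even though only one paste was malicious.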

notracking commented 6 years ago

I will write a parser for this list that will only select active URLs with binary (application) downloads. This requires a bit more work, since every single URL needs to be tested for validity / liveness.

I want to exclude hosts that have already cleaned up the malware they served (since these would otherwise cause a huge amount of false calls).

notracking commented 6 years ago

Made a script for automatic updates. It will only use hostnames that are actually serving a live file with content-type "application/*", which reduces the amount of false calls to almost 0. It's now validating ~10,000 entries / hour.
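The liveness + content-type test described above could look roughly like this; this is a sketch, not the actual script, and the function names are mine:

```python
import urllib.error
import urllib.request

def is_binary_content_type(content_type):
    """The filter described above: keep only application/* payloads."""
    return (content_type or "").split(";")[0].strip().lower().startswith("application/")

def serves_live_binary(url, timeout=10):
    """HEAD the URL; True only if the host responds and still serves
    an application/* file (i.e. the malware payload is still live)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return is_binary_content_type(resp.headers.get("Content-Type"))
    except (urllib.error.URLError, OSError):
        return False  # dead host or cleaned payload -> no blocklist entry
```

A HEAD request per URL keeps the bandwidth cost low, which matters at ~10,000 validations per hour.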

I need to implement a sort of database to keep track of the results, so I can gracefully add / remove new / old hosts to the final blocklist. If a host stops serving a malicious file, it should be removed after 7 days of not serving it anymore.
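The "remove after 7 days" bookkeeping could be sketched with a last-seen timestamp per host, for example in SQLite (schema and names are my own assumptions, not the actual implementation):

```python
import sqlite3
import time

SEVEN_DAYS = 7 * 24 * 3600

def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS hosts (name TEXT PRIMARY KEY, last_seen INTEGER)")
    return db

def mark_seen(db, host, now=None):
    """Record that `host` is still serving a malicious payload right now."""
    db.execute(
        "INSERT INTO hosts VALUES (?, ?) "
        "ON CONFLICT(name) DO UPDATE SET last_seen = excluded.last_seen",
        (host, int(now if now is not None else time.time())),
    )

def active_hosts(db, now=None):
    """Hosts seen serving malware within the last 7 days -> the blocklist."""
    cutoff = int(now if now is not None else time.time()) - SEVEN_DAYS
    return {row[0] for row in db.execute("SELECT name FROM hosts WHERE last_seen >= ?", (cutoff,))}
```

A host that fails validation simply stops getting `mark_seen` calls and ages out of `active_hosts` after 7 days, which gives the graceful add/remove behavior described above.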

The script will be released publicly when ready. It is set up in a way that makes it easy to add new, similar URL-based blocklists (I'm open to tips on possible alternative URL blocklists).

To be continued...

Mausy5043 commented 6 years ago

A big problem with the URLhaus list is that it contains lots of raw IPs. Those don't get blocked by dnsmasq, because dnsmasq never needs to be called to resolve an IP; clients communicate with such a host directly, without intervention. Only a firewall will catch those.

Also, a local hosts file will not block IPs or URLs.
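One way to quantify this limitation is to split a URL list into literal IPs versus hostnames, since only the latter can be blocked at the DNS level. A hypothetical sketch:

```python
import ipaddress
from urllib.parse import urlparse

def split_hosts(urls):
    """Separate blocklist URLs into DNS-blockable hostnames and raw IPs
    (which only a firewall can catch)."""
    names, ips = set(), set()
    for url in urls:
        host = urlparse(url.strip()).hostname
        if not host:
            continue
        try:
            ipaddress.ip_address(host)
            ips.add(host)
        except ValueError:
            names.add(host)
    return names, ips

# Hypothetical entries illustrating the two cases:
names, ips = split_hosts([
    "http://203.0.113.9/evil.exe",   # raw IP: dnsmasq never sees a query
    "http://bad.example/evil.exe",   # hostname: a DNS block works
])
```

The ratio between the two sets tells you how much of the list a DNS filter can act on at all.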

cbuijs commented 6 years ago

@Mausy5043

I half agree :-).

If you want to block an IP address in an answer dnsmasq fetched using DNS, you can add it using bogus-nxdomain or ignore-address. That way, all randomly generated hostnames that may be attached to it (the IP) get blocked in the DNS process.

That is, as you pointed out, this only works when DNS is used to resolve those addresses rather than connecting directly (indeed, firewall territory there). I would do both, DNS and firewall.
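For reference, the two dnsmasq options mentioned look like this in dnsmasq.conf (the IP is a placeholder):

```
# Transform any reply containing this IP into NXDOMAIN
bogus-nxdomain=203.0.113.5

# Ignore A-record replies that include this IP (dnsmasq returns no-such-domain)
ignore-address=203.0.113.5
```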

On my network I see more DNS requests for bad stuff than direct connections, to be honest (as a result of utilizing DNS as well, and "clients" not even trying because they get no IP, or a blackholed one, after DNS resolution).

Ymmv.

notracking commented 6 years ago

That's partly the reason why I ever started working on nBlock.

A DNS based filter does not fit all blocking needs, but it's a decent first line of defense: easy to set up and cheap on resources. And all of that without having to deal with client platforms, configurations or updates.

For the hosts-blocklists I try to collect as many hosts as possible that are worth blocking at the DNS level. These can come from any type of (reliable) source: Adblock filters, network logs, or even malware URLs as in this case. As long as I can figure out a system that selects those hosts you simply do not want to communicate with in any way. Even if this means that 90% of the source is not usable at all for a DNS block (Adblock lists..), the few hosts that are worth blocking can be very valuable, unique additions to a DNS based blocklist.

HellboyPI commented 6 years ago

... It is set up in a way that it's easy to add new similar url based blocklists (I'm open for tips on possible alternative URL blocklists).

Other possible blocklist sources (needs researching). Some are URL based, some are not.

1) Bambenek Consulting OSINT Feeds
2) Netlab - DGA feed
3) dns-bh.sagadc.org
4) Malc0de - domain blacklist
5) tracker.h3x.eu
6) URLVir
7) Malware Patrol
8) mitchellkrogza - The-Big-List-of-Hacked-Malware-Web-Sites
9) mitchellkrogza - Phishing Database
10) hoshsadiq - adblock-nocoin-list
11) Dyn malware feeds
12) NormShield - Phishing Domain Feed

notracking commented 6 years ago

Thanks for the finds @HellboyPI !

I'm tracking my status down here, will update as I go along.

0) URLhaus: I'm not satisfied with the current workings of my script; it should be way faster. Currently rethinking the approach.
1) ADDED high confidence list (almost perfect!)
2) ADDED 3 active botnet lists that are not rotated at daily intervals
3) dupe
4) ADDED!
5) Not suited, too many false calls, readme
6) ADDED!
7) Commercial
8) ADDED!
9) Too many false calls.
10) ADDED!
11) ADDED ponmocup list
12) Requires registration

HellboyPI commented 6 years ago
3. Not actively maintained

It turns out dns-bh.sagadc.org is a mirror of malwaredomains.com, as stated here. You have these 2 lists (1. , 2.), which are also mirrors of malwaredomains.com. So you already have all the domains. The files you are using have current dates, so the lists should be actively maintained.

11. Not actively maintained

Aren't the "ponmocup-infected-domains-CIF-latest.txt" and "ponmocup-infected-domains-shadowserver.csv" lists up-to-date? The dates are current.

notracking commented 6 years ago

You are correct, I've added the ponmocup list!

ve6rah commented 5 years ago

If you want to block an IP address in an answer dnsmasq fetched using DNS, you can add it using bogus-nxdomain or ignore-address. That way, all randomly generated hostnames that may be attached to it (the IP) get blocked in the DNS process.

@notracking Any thoughts on adding such a list to the 2 existing ones? I fully understand that it won't catch something listing an IP directly, as DNS isn't involved, but it might be nice to counter randomly generated hostnames?

Probably only a minor gain on top of the existing lists, but maybe worth adding?

cbuijs commented 5 years ago

@ve6rah Just my 2 cents:

There are some limitations to this. For instance, IPv6 addresses are not supported, and you cannot use it to block whole networks/subnets (using the firehol lists as a nice source, for example).

The alias function could be helpful, by "converting" every bad IP into 0.0.0.0. But it needs to be done for every single IP. I tried to work this out, but having a config with millions of lines bogs things down :-).
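The alias approach mentioned above would look roughly like this in dnsmasq.conf; note that it is one line per bad IP (or contiguous range), which is what bogs down with millions of entries:

```
# Rewrite this bad IP in DNS answers to 0.0.0.0
alias=203.0.113.5,0.0.0.0

# A contiguous range can share one line
alias=203.0.113.10-203.0.113.20,0.0.0.0
```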

We need feature requests to Simon Kelley to make this nice and usable.

So its use is kind of limited, but better than nothing; I fully agree on that.

ve6rah commented 5 years ago

I don't expect it to be perfect, but the reality is that in some contexts DNS is the only thing you really have to work with. For example, I use DNS to protect several non-rooted Android devices. These devices could be on any network anywhere, and Android doesn't support that type of firewalling without root access (you can only allow or disallow an entire application), so I can't easily firewall them. I've also found that VPN to a controlled network isn't always practical on these devices either, for various reasons.

It may be that the occurrence is low enough, or the effort high enough versus the existing DNS lists, that it's not worth doing. But it's just an idea for extra functionality, if it's not too onerous, and it might improve the filtering a bit.

cbuijs commented 5 years ago

@ve6rah I fully agree. It depends on the use-case. Mine is way more restrictive, as I am a DNS freak :-)