serverless-dns / blocklists

An opinionated collection of blocklists for RethinkDNS.
https://rethinkdns.com/configure
Mozilla Public License 2.0

Add ShadowWhisperer Malware & Typo Squatting Lists #68

Closed SpencerIsGiddy closed 1 year ago

SpencerIsGiddy commented 1 year ago

I feel the typo squatting list is a must, and why not have slightly better malware protection on the side? The malware list has 31,358 domains and the typo squatting list has 73,232. To my knowledge, these lists are not included in any other list on RethinkDNS, and they do not incorporate any other lists themselves. ShadowWhisperer has plenty of other lists that may be of interest at https://github.com/ShadowWhisperer/BlockLists/tree/master/Lists . These were just the ones I thought would be best suited.

ignoramous commented 1 year ago

I removed a fake-domains and a domain-squat list curated by RPi as it was too extensive (1.3M entries).

Just like NRD (newly registered domains) lists, which routinely have 2M+ distinct entries, this one seems to have the capacity to keep growing forever.

For lists as large as NRDs (the current lists hold ~8M non-duplicate domains out of ~13M domains overall across 194 blocklists), I think we have to find a different way to package them. Adding these may cause the compiled blocklist (a Radix Trie) to grow beyond 100MB (it is at 85MB today), which is okay in itself, but then the code won't run on Cloudflare Workers, which only have 128MB of RAM available.
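To make the packaging idea concrete, here is a minimal sketch of deduplicating several blocklists and loading the result into a trie keyed on reversed domain labels. It is not the project's actual code: it uses a plain, uncompressed trie rather than a Radix Trie, the names `dedupe` and `LabelTrie` are hypothetical, and it assumes that a listed domain also covers its subdomains.

```python
from typing import Iterable

_END = object()  # sentinel key marking "a listed domain ends here"


def dedupe(blocklists: Iterable[Iterable[str]]) -> set[str]:
    """Merge several blocklists into one set of normalised, unique domains."""
    unique: set[str] = set()
    for blocklist in blocklists:
        for domain in blocklist:
            # Normalise case and drop a trailing root dot before comparing.
            unique.add(domain.strip().lower().rstrip("."))
    return unique


class LabelTrie:
    """Plain trie keyed on domain labels in reverse order (TLD first)."""

    def __init__(self) -> None:
        self.root: dict = {}

    def add(self, domain: str) -> None:
        node = self.root
        for label in reversed(domain.split(".")):
            node = node.setdefault(label, {})
        node[_END] = True

    def blocks(self, name: str) -> bool:
        """True if name, or any parent domain of it, was added."""
        node = self.root
        for label in reversed(name.lower().rstrip(".").split(".")):
            node = node.get(label)
            if node is None:
                return False
            if _END in node:
                # Assumption of this sketch: a listed domain covers subdomains.
                return True
        return False


if __name__ == "__main__":
    lists = [["Example.com", "ads.test"], ["example.com.", "tracker.test"]]
    unique = dedupe(lists)
    trie = LabelTrie()
    for domain in unique:
        trie.add(domain)
    print(len(unique), trie.blocks("cdn.example.com"), trie.blocks("example.org"))
```

Domains that share a suffix share trie nodes, which is why heavily overlapping lists add little to the compiled size, while lists of mostly unique names (such as NRDs) add a lot.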

So, if Cloudflare expands RAM size for the lowest pricing tier for Workers to 256MB (like Fly) or 512MB (like Deno), then these huge non-overlapping lists can go in as-is (in the Radix Trie).
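As a rough back-of-the-envelope check of the numbers in this thread, the sketch below projects the compiled size after adding more domains and compares it against the RAM tiers mentioned. It assumes size grows linearly with the number of unique domains (85MB for ~8M today), which is a crude simplification because tries share common suffixes. The function names are hypothetical, and the comparison ignores the memory the runtime itself needs.

```python
CURRENT_TRIE_MB = 85               # compiled size quoted in this thread
CURRENT_UNIQUE_DOMAINS = 8_000_000  # ~8M non-duplicate domains today


def projected_size_mb(extra_domains: int) -> float:
    """Estimate compiled size, assuming linear growth per unique domain."""
    mb_per_domain = CURRENT_TRIE_MB / CURRENT_UNIQUE_DOMAINS
    return CURRENT_TRIE_MB + extra_domains * mb_per_domain


def fits(extra_domains: int, ram_mb: int) -> bool:
    """True if the projected trie alone is smaller than the RAM limit."""
    return projected_size_mb(extra_domains) < ram_mb


if __name__ == "__main__":
    # RAM tiers as mentioned in the thread.
    tiers = {"Cloudflare Workers": 128, "Fly": 256, "Deno": 512}
    for label, extra in [
        ("one NRD-sized list", 2_000_000),
        ("ShadowWhisperer malware + typo squatting", 31_358 + 73_232),
    ]:
        size = projected_size_mb(extra)
        verdict = {name: fits(extra, ram) for name, ram in tiers.items()}
        print(f"{label}: ~{size:.1f}MB {verdict}")
```

Under these assumptions, a single 2M-entry list pushes the estimate past 100MB, while the two lists requested here add only about 1MB.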

Another avenue is to load these NRD-like lists into Cloudflare D1 (a database) just for Workers, but D1 is likely 50x slower than querying the 85MB compiled lists in memory, as we do today.

Typically, I'd want to serialise these NRD-like lists separately in an FST (finite state transducer) instead of the current Radix Trie, but there isn't enough time to implement an FST just for this.

SpencerIsGiddy commented 1 year ago

OK, I get what you’re saying for the most part. Hopefully Cloudflare can give access to more RAM for the lower price tiers in the near future.

ignoramous commented 1 year ago

I'll want to explore the possibility of merging such large unbounded lists some day.

Tracking it here: https://github.com/serverless-dns/blocklists/issues/92

ignoramous commented 1 year ago

Added: https://github.com/serverless-dns/blocklists/commit/5b4064f5d3e77ea9e4310869f81a732ab2823695