paulgb / BarbBlock

Chrome extension which blocks requests to sites which have used legal threats to remove themselves from other blacklists.
https://ssl.bblck.me
MIT License

Add other domains owned by admiral? #4

Open KeenRivals opened 7 years ago

KeenRivals commented 7 years ago

Many other domains were found that are owned by Admiral and point to the same IP as #1. There's a list at https://pgl.yoyo.org/adservers/admiral-domains.txt

paulgb commented 7 years ago

Thanks for this! I'll have a look when I'm at a computer


tofof commented 7 years ago

That list is woefully incomplete. See my comment here for some analysis. Click the reverse-domain links I provide (e.g. ipv4info for functionalclam.com) and you can see how deep this rabbit hole goes, and that's just for a single IP.

Sample admiral domains not in that list: btez8.xyz, innocentwax.com, completecabbage.com, 4jnzhl0d0.com, h78xb.pw

It's trivial to observe hundreds of Admiral domains, they probably number in the thousands.

mvasilkov commented 7 years ago

Let's kill the whole Admiral thing with fire!

paulgb commented 7 years ago

I've merged #8 which adds more Admiral domains, if there are still some missing I'm happy to add them as they are discovered

mirague commented 7 years ago

I support adding all domains pointing to the same content - it's likely all these domains would eventually have found their way into EasyList in the first place. Might be less likely now.

JamyDev commented 7 years ago

Maybe block that IP too in anticipation of more domains being added?

tofof commented 7 years ago

#8 is still not even close.

It has...

It's missing...

... I think that demonstrates my point. (Yes, there are many duplicates once you start putting all of these together).

If someone actually wants to make a serious attempt, which hasn't happened yet, just walk the related domains on a tool (like threatcrowd) that lets you do so easily.

It's trivial to verify these even while you're still learning the naming patterns, since they all serve up the same image. But you really have to do the verifications. Bannersnack.com, for example, is NOT an Admiral domain, even though it was hosted alongside them once.

Some starting points that I haven't already exhausted above include: tzwaw.pw, 0D7DK.XYZ, pz37t.xyz, 3jsbf5.xyz (beware, there's at least one domain, apstylebook.com, that'd be a false positive), 4jnzhl0d0.com, 82o9v830.com, familiarfloor.com.

The biggest problem is that they use google/amazon hosting and so you can't trivially blacklist everything that resolves into their IP space, and that tools like ipv4info, threatcrowd, alienvault, tcpiputils, all have incomplete datasets. You really need multiple people using different toolsets walking the same space to root all of these out.

lol768 commented 7 years ago

@tofof Given the ones that I have seen seem to use valid SSL certs from Let's Encrypt, do you think crawling the CT logs is a viable way of checking for these?
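Crawling the CT logs could look roughly like the sketch below. It assumes the third-party `certstream` client for following the logs and an invented candidate regex; every hit would still need the content verification discussed later in this thread:

```python
import re

# Suspicious-looking pattern: short random-ish labels on TLDs Admiral has
# used. Purely illustrative -- tofof notes below that a TLD filter misses
# at least one known .us domain.
CANDIDATE_RE = re.compile(r'^[a-z0-9]{3,12}\.(com|xyz|pw|us)$')

def candidates_from_cert(all_domains):
    """Pick plausible Admiral-style names out of a cert's SAN list."""
    return [d for d in all_domains if CANDIDATE_RE.match(d)]

def on_message(message, context):
    # certstream delivers one event per logged certificate.
    if message['message_type'] == 'certificate_update':
        domains = message['data']['leaf_cert']['all_domains']
        for d in candidates_from_cert(domains):
            print('candidate:', d)  # queue for the image/robots.txt check

# To run live (requires `pip install certstream`):
# import certstream
# certstream.listen_for_events(on_message, url='wss://certstream.calidog.io/')
```

This only narrows the firehose; a nightly batch pull from a CT log monitor would work equally well for the scheduled script paulgb mentions below.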

anon182739 commented 7 years ago

What about doing it the other way around? As you said, it's trivial to verify a domain. What about checking all suspicious domains? If one is found, it's sent to some central server to be added to the list. Another way would be to just blacklist the IPs. This would have some false positives, but if it becomes a default filter for ublock/adblock it would have the effect of forcing them to clean up their IP ranges. But really, how often do you have a legitimate reason for loading random .js from a .xyz domain?

@lol768 They issue hundreds of thousands of certificates each day, but if you manage to get the first filtering down it's feasible, then you can use DNS to filter out more.

@mvasilkov I don't think that's feasible. Maybe by sending abuse notices to the registrars, since they care more about quantity than quality.

anon182739 commented 7 years ago

Also, playing cat-and-mouse with domains is not really a problem. It costs at least $1 to register one, and much less to find and block it.

paulgb commented 7 years ago

Actually, @lol768's suggestion sounds like a great idea for keeping track of new registrations from this company. It could be automated as a nightly script.

paulgb commented 7 years ago

That's just a way of narrowing down the search space; we'd still verify that the domain behaved like the others.

I'd like to keep this to a pure domain blacklist, as opposed to running code on the client, for a few reasons:

1. it's more portable to existing blocking extensions,
2. it's more performant, and
3. I'm more comfortable with the legal defensibility (I am confident that a passive blacklist can never be illegal; I don't want to speculate about the legality of a more active client-side approach).

anon182739 commented 7 years ago

@zymase You can just check the image dimensions, or the length of the body. This works for me:

    import requests

    def isAdmiralDomain(domain):
        try:
            response = requests.get('https://' + domain)
        except requests.RequestException:
            return False
        return len(response.text) == 179

Otherwise, how else could such a list ever be built except by trial and error? We are dealing with networks as dirty as botnets.

Start off with the Let's Encrypt cert list. Filter away anything that doesn't end with .com, .xyz, or .pw. Issue DNS queries for everything. If the whois isn't protected, remove it. If it isn't registered through one of the registrars they use, remove it. Then use Tor/proxies (it's counterproductive for them to block IPs) for the final verification.
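The cheap pre-filter stages described above could be sketched as follows. The registrar and whois inputs are assumptions here (they would come from real whois lookups), and note that a hardcoded TLD set is contested later in this thread:

```python
# Cheap pre-filters before the expensive HTTP verification step.
# The TLD and registrar sets are illustrative, taken from observations
# in this thread; a .us Admiral domain is known, so treat with care.
ADMIRAL_TLDS = {'com', 'xyz', 'pw'}
ADMIRAL_REGISTRARS = {'enom', 'namecheap'}

def passes_cheap_filters(domain, registrar, whois_protected):
    """Return True if a candidate domain survives the cheap filters."""
    if domain.rsplit('.', 1)[-1].lower() not in ADMIRAL_TLDS:
        return False
    if not whois_protected:  # observed Admiral domains use whois privacy
        return False
    return registrar.lower() in ADMIRAL_REGISTRARS
```

Anything that survives would still go through the final content check over Tor or proxies.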

Why not literally treat them as bots? I don't have any, but SMTP accounts are allegedly quite cheap. What if you'd just send bulk abuse notices to the registrars and accuse them of being C2 servers for some botnet? They generally don't care if the notices are valid, they only care about the quantity of notices they're getting.

@paulgb

That's just a way of narrowing down the search space; we'd still verify that the domain behaved like the others.

And this job can be automated easily; once a domain has been identified, you can put it in the list forever.

anon182739 commented 7 years ago

GitHub stripped away the formatting, so you need to add some newlines and indentation. The images are identical right now.

CRC32: 8db019c1
MD5: 681e062bb33b9ba28f3427e7283c81a8
SHA1: 3fcf7e14e92043a00926d340d45778b618bc87a9
SHA256: 32afacb9285649aa4af43ea03e7cd9a522aa3e6d0554a2dabe308fac4531be5f
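The digests above allow a stricter verification than body length alone. A minimal sketch (hash values copied from this thread; the "SHA2" value is treated as SHA-256 given its length):

```python
import hashlib

# Known digests of the image Admiral domains serve (from this thread).
ADMIRAL_IMAGE_MD5 = '681e062bb33b9ba28f3427e7283c81a8'
ADMIRAL_IMAGE_SHA256 = '32afacb9285649aa4af43ea03e7cd9a522aa3e6d0554a2dabe308fac4531be5f'

def matches_admiral_image(body: bytes) -> bool:
    """True if the fetched body is byte-identical to the known Admiral image."""
    return (hashlib.md5(body).hexdigest() == ADMIRAL_IMAGE_MD5
            and hashlib.sha256(body).hexdigest() == ADMIRAL_IMAGE_SHA256)
```

This stays correct only as long as the image itself is unchanged, so the length check remains a useful fallback.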

anon182739 commented 7 years ago

Another identifying mark is the robots.txt, which is unlikely to change:

    User-agent: *
    Disallow: /

The 404 page seems nonstandard: the body is just "404 page not found" (plain text, not HTML), served with Content-Type: text/plain; charset=utf-8.
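The two response fingerprints above (the block-everything robots.txt and the plain-text 404) could be combined into one check. A sketch with invented helper names, matching only what this thread has observed:

```python
def fingerprint_matches(robots_txt: str, not_found_body: str,
                        not_found_ctype: str) -> bool:
    """Check a host against the robots.txt and 404 fingerprints above."""
    # Whitespace-insensitive comparison of the two-line robots.txt.
    robots_ok = robots_txt.split() == ['User-agent:', '*', 'Disallow:', '/']
    # Plain-text "404 page not found" body with the observed Content-Type.
    notfound_ok = (not_found_body.strip() == '404 page not found'
                   and not_found_ctype == 'text/plain; charset=utf-8')
    return robots_ok and notfound_ok
```

Neither fingerprint is unique on its own, so this is a narrowing filter, not a verdict; incidentally, that plain-text 404 appears to match the default NotFound response of Go's net/http server.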

The registrar is enom for the .com domains and namecheap for the .xyz and .pw domains.

They all use the same 4 nameservers. You should be able to enumerate from that, it's an uncommon combination.

    NS-1212.AWSDNS-23.ORG    205.251.196.188
    NS-1627.AWSDNS-11.CO.UK  205.251.198.91
    NS-305.AWSDNS-38.COM     205.251.193.49
    NS-697.AWSDNS-23.NET     205.251.194.185
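That nameserver combination can be used as another narrowing fingerprint. A sketch of the comparison only; in practice the NS records would be fetched with a resolver library such as dnspython:

```python
# The four nameservers observed on Admiral domains in this thread.
ADMIRAL_NS = {
    'ns-1212.awsdns-23.org',
    'ns-1627.awsdns-11.co.uk',
    'ns-305.awsdns-38.com',
    'ns-697.awsdns-23.net',
}

def uses_admiral_nameservers(ns_records) -> bool:
    """True if a domain's NS set is exactly the combination listed above."""
    # Normalize case and trailing dots before comparing.
    return {ns.rstrip('.').lower() for ns in ns_records} == ADMIRAL_NS
```

Since these are shared AWS Route 53 servers, a match is suggestive rather than conclusive, and the set could change under our feet.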

anon182739 commented 7 years ago

@zymase You want to block the domain names. You can get a new IP for almost nothing, getting a new domain name costs some money. The picture's name changes, but the site's structure is the same.

anon182739 commented 7 years ago

Also worth noting is that you can query however many admiral domains you want with Tor. If they start blocking IPs, then their ads won't work anymore.

anon182739 commented 7 years ago

@zymase I don't understand what you're trying to say. You can look at a domain and see if it has that picture. If it does, it also has the script we want to block.

The domains don't resolve to the picture. The domains resolve to Google IPs, which then serve that picture. The domains all point to different IPs.

The purpose is to block the script they use, the easiest way to do this is to block the Admiral domains so they can't serve the script.

tofof commented 7 years ago

@anon182739 wrote:

Filter away anything that doesn't end with .com, .xyz, or .pw.

I wouldn't use such a filter. First of all, Admiral has at least one .us domain that I've already prominently mentioned in this thread; it's literally the first domain I name as 'missing'. Second, there's no reason to think they won't expand to other TLDs. Nine months ago, no one had spotted any .xyz Admiral domains - I believe they were only using .com and .pw at that time.

tofof commented 7 years ago

@zymase You've stated that you don't understand and are not a coder. It's acceptable to be interested, to use the emoji-response features, etc, but please don't clutter a single-issue thread with philosophical meanderings, well-wishings, tortuous analogies, and otherwise "laying the obvious," whatever that means.

To address your final point: No, there is no reason to think that there must be a common resource, accessible to the public, that would identify all such domains. If instead you mean that there must be a reason for these domains? Yes, the reason is that Admiral owns them and happens, for now, to serve the same content from all of them. They could just as easily serve nothing but a 403 or 204 status code, or just blackhole connections.

The list's criteria are already stated: it is a list of domains whose owners have misused DMCA takedowns to attempt removal from other lists. This issue is for the suggestion that affiliated domains that are owned by the same company and used for the same purpose be included alongside the singular example named on a DMCA takedown thus far. The criteria for inclusion being proposed, then, are similarly obvious: Admiral-owned domains that appear to serve the same (lack of) content and presumably host the scripts used in serving advertising at affiliated websites.

Quite contrary to your assertion, such a list, if built, will be built exactly the same way all other advertising-blocking lists are built: on the finds of participants reporting "I found another one". The starting points are found when an Admiral-protected website (e.g. thewindowsclub.com) uses scripts hosted on an Admiral server to display its contents.

I have already outlined the best possible way to find more Admiral domains given a starting domain: by using tools meant for that, i.e. tools that identify spatially- and temporally-related domains.

paulgb commented 7 years ago

@tofof Thanks for the analysis you've been doing. I have only been skimming this conversation while working on providing all the blacklist formats people want, but now that that's done I want to take a real stab at automating some of this.

anon182739 commented 7 years ago

@zymase

We're not going to quest the whole web, domain after domain to see the ones which point to that picture, right?

You can narrow it down enough so you don't need to check the entire web.

@tofof What's wrong with scraping DNS/cert lists? They can easily make sure that 1 domain = 1 IP and avoid tainting each other, it's non-trivial to make it harder to verify the domains.

@paulgb I'm already working on it, I've managed to hack together a Python script that does the job. Should I post it here, or is 'security by obscurity' better?

It turns out they only have 159 domains apparently, all the "different starting points" were somehow interlinked.

https://pastebin.com/6mPnXBiR
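The recursive walk described here can be sketched as a breadth-first search over shared IP resolutions. The lookup callbacks are injected so the traversal itself is testable; in practice they would query a passive-DNS source such as ThreatCrowd's report API, which is an assumption about tooling rather than the script actually used:

```python
from collections import deque

def walk_related(seed, domains_for_ip, ips_for_domain):
    """Return every domain reachable from `seed` via shared IP resolutions.

    `ips_for_domain(domain)` and `domains_for_ip(ip)` are caller-supplied
    lookup functions (e.g. wrappers around a passive-DNS API).
    """
    seen_domains, seen_ips = {seed}, set()
    queue = deque([seed])
    while queue:
        domain = queue.popleft()
        for ip in ips_for_domain(domain):
            if ip in seen_ips:
                continue
            seen_ips.add(ip)
            for d in domains_for_ip(ip):
                if d not in seen_domains:
                    seen_domains.add(d)
                    queue.append(d)
    return seen_domains
```

Every domain this returns would still need the image/fingerprint verification, since shared hosting IPs (Google, AWS) pull in unrelated domains.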

paulgb commented 7 years ago

Great stuff, is this from the CT log or just from grabbing the IPs already found?

Let's keep the script apart from this, but if you're not a paying GitHub user I can create a private repo and add you to it so we can collaborate.

anon182739 commented 7 years ago

This naming scheme is interesting. If you visit any admiral domain (example: http://abandonedclover.com http://abruptroad.com) you get that image: http://abandonedclover.com/6f044848f5e9030b6fd409a7e153defd6d8c4e58fb082a44da549ed3e421f9755aedb08132895be1e0d578e7

But each time you refresh the page, you get a new URL: http://abandonedclover.com/f1e5b5d86bcceb851312e5cc5f7bce26bb10ab951c152cb63a5068954caaa20d196ab3a042d306f561b71c22

You can use it multiple times, and across different domains. I really wonder how this works? Is it a signature of some sort?

@paulgb This is from scraping one domain (hfc195b.com) and recursively querying the results from threatcrowd. It seems to cover all of the "starting points" listed though, except for 0d7dk.xyz, pz37t.xyz, and 3jsbf5.xyz, which weren't reachable. So this should be all of their domains.

Sure, or I can send it in a PM if you want. It's not of much use now though.

paulgb commented 7 years ago

Sure, a PM works for me. My email is paulgb@gmail.com

Cheers.

anon182739 commented 7 years ago

Oh, you can't send GitHub PMs anymore apparently. Gmail filters any anonymous e-mail addresses since they're used for spam. If you already have github premium, it's probably easier to do it that way.

paulgb commented 7 years ago

Ok, I created a repo.

anon182739 commented 7 years ago

I can't see anything. Where do I get the notice?

paulgb commented 7 years ago

Try this link: https://github.com/paulgb/iptool/invitations

anon182739 commented 7 years ago

Thanks, it works now.

anon182739 commented 7 years ago

There might be a point to running it without Tor; it's a bit finicky and seems to give some false negatives. This could be intentional, but I don't think so, since I could get them with Tor on a second try. Some IPs yielded a great many domains; one or two were probably responsible for 70% of the domains.

anon182739 commented 7 years ago

Interesting: shodan gives me IPs that don't have any domain names registered according to threatcrowd. https://pastebin.com/w3HZ4Ker

Query: "X-Datacenter: Content-Length: 179 Date: Mon 2017"

Replace Mon with any other weekday; it's just to split it up so each result set doesn't go past 2 pages, which is the free account limit. You can also add the month if you don't want to create an account.
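The weekday-splitting trick above could be automated roughly like this. The query phrasing is an approximation of the one quoted, and the commented usage assumes the official Shodan Python client with an API key:

```python
# Split one Shodan banner query into seven per-weekday queries so each
# result set stays under the free-tier page limit.
WEEKDAYS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

def weekday_queries(year=2017):
    """One banner query per weekday, matching the 179-byte Admiral response."""
    return [f'"X-Datacenter:" "Content-Length: 179" "Date: {day}" "{year}"'
            for day in WEEKDAYS]

# Usage sketch with the official client (assumes SHODAN_KEY is set):
# import shodan
# api = shodan.Shodan(SHODAN_KEY)
# for q in weekday_queries():
#     for match in api.search(q)['matches']:
#         print(match['ip_str'])
```

Each IP that comes back would then be fed into the domain/fingerprint verification steps discussed earlier.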

tofof commented 7 years ago

@anon182739

It turns out they only have 159 domains apparently, all the "different starting points" were somehow interlinked. https://pastebin.com/6mPnXBiR

FFS. No, that list is incomplete.

It's missing tons of the ones I already pointed out. It includes false positives, too, like its literal first lines, which are google and not Admiral domains. They currently resolve to Admiral content, but they won't always. And it has double entries.

It is barely any different from the list already attached to this repository.

It's missing, at minimum:

And that's just the ones I explicitly named as examples already of what was missing before.

I am very tired of people posting short lists and saying "work's done, this is all of them!" I especially can't wait until these singular examples get added without adding any of the others from all the links I already provided.

tofof commented 7 years ago

@anon182739

What's wrong with scraping DNS/cert lists? They can easily make sure that 1 domain = 1 ip and avoid tainting each other, it's non-trivial to make it harder to verify the domains.

1) I never said a single thing against scraping cert lists, please take your fallacious statements elsewhere. 2) It is in fact trivial to make it harder to verify these domains, and I have already provided an example of two ways they could do it in this very thread. Again, please peddle falsehoods elsewhere.

The closest I can imagine that gave you such an idea is a very poor reading of my statement against a filter that would already miss known Admiral domains; i.e. a filter that excludes .us domains.

anon182739 commented 7 years ago

@tofof Rude.
owlsr.us
Honest mistake: it IS in there, it was just marked as "not admiral" because the request timed out or something similar. Tor is somewhat finicky, but if you have a cleaner IP you can re-run the script (ask @paulgb to add you)

al102.xyz, 3jsbf5.xyz
dead

roastedvoice.com

This Domain Name Has Expired - Renewal Instructions.

17ars.xyz
dead

lewdwind.com

This Domain Name Has Expired - Renewal Instructions.

b6227.xyz
dead

granodiorite.com

This Domain Name Has Expired - Renewal Instructions.

0d7dk.xyz
dead

pz37t.xyz
dead

3jsbf5.xyz
dead

Worth noting is that a fair amount of the ones marked as dead were checked, but marked as not-admiral because they were, well, dead.

It includes false positives, too, like its literal first lines, which are google and not Admiral domains.

They are admiral domains. They serve the Admiral malware and act as C2 server for the botnet. How are they not admiral domains?

And it has double entries.

Caused by differences in case, it can be resolved by adding a line in the script.

anon182739 commented 7 years ago

Checking who.is history (make sure to disable JS otherwise it fetches it again), you can see that the dead domains have expired. Except 17ars.xyz apparently, unless someone else registered it.

Isn't it a security risk to be loading random JS from domains that they're not taking very good care of? They're not attaching a hash from what I can see.

anon182739 commented 7 years ago

Except 17ars.xyz apparently

Name: REACTIVATION PERIOD
Organization: NAMECHEAP
Address: 11400 W. OLYMPIC BLVD, SUITE 200

Well, that explains that.

tofof commented 7 years ago

@anon182739 My apologies. First, thank you for your work. Please forgive my outburst; literally the very first example I provided (one I know is live and is Admiral) was omitted. That, with the immediate appearance of so many more omissions, caused me to overstate things.

Given that you've verified that your tool has timed out on at least one positively-Admiral domain, can I ask what assurance you have that others that I haven't manually called out didn't get similarly classified?

I didn't realize many of those were dead; note that the Adguard DNS Filter contains many entries for Admiral domains, including, for example, roastedvoice.com and 3jsbf5.xyz - so when I had made cursory glances at some of the domains, my browser happily notified me that it was already blocking them. For the same reason you've discovered, I didn't want to do temporary exceptions to override that. For the unaware: Admiral tracks hits to their domains and blocks you if you have too many in too short a period, presumably to defeat exactly this sort of listbuilding.

anon182739 commented 7 years ago

I never said a single thing against scraping cert lists, please take your fallacious statements elsewhere.

I'm sorry, I missed your reply on it. But what are your thoughts on it? You could dynamically add domains to the filter, so whenever a .us domain is spotted you add it to the filter. It would still be useful to narrow things down.

It is in fact trivial to make it harder to verify these domains,

Harder? Yes. Hard? No. I used the wrong word there.

and I have already provided an example of two ways they could to do it in this very thread.

I missed them. But it's hard to obfuscate it non-deterministically; they will still have some kind of recognizable pattern. It's enough to find the pattern once, then that domain's useless forever and you've cost them $1.

tofof commented 7 years ago

@anon182739 I think that the idea of scraping cert lists is potentially clever. I see a few potential hurdles, however.

The ways to make things harder I'm referring to: at any point, Admiral can just blackhole incoming traffic* or serve up http status codes instead of the helpful image saying "Hi I'm Admiral". Particularly when they can put their scripts behind completely random filenames.

*incoming traffic other than to their randomly-named-at-creation script, of course

At some point they have to actually use their servers, and we can see references to the scripts "in the wild" on other pages, and catch them that way. But that's the super-slow super-manual way of doing it.

Fundamentally I don't think I agree with your assertion that there has to be a pattern that's externally recognizable. It's very beneficial to us right now that there is, but I don't think that has to be the case at all. How do you identify a server that blackholes all traffic except that seeking a 100-character random filename? You certainly don't do it from guessing the filename, nor can you positively identify it by the lack of response to other traffic.

I do, however, wholeheartedly agree that going after domains is worthwhile. It definitely costs less for us to hunt them than it does for them to buy them, at least in the current balance.

anon182739 commented 7 years ago

@tofof None, except that Tor isn't that finicky. If you have a more reliable internet connection you can run the script. Security through obscurity is probably useless, it was just an instinctive reaction: https://pastebin.com/tsBH5s7A

For the unaware: Admiral tracks hits to their domains and blocks you if you have too many in too short a period, presumably to defeat exactly this sort of listbuilding.

Are you sure? I don't think that happened, but it's possible the API timeout saved me from it. In any case, you can use Tor stream isolation for each requests. If they block Tor, that's just a net win since it means some people get "free" adblocking and proxies are plentiful anyway.
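The stream-isolation idea mentioned here relies on Tor isolating circuits by SOCKS credentials, so giving every request a fresh username/password pair gets each check its own exit node. A sketch (assumes a local Tor daemon on 127.0.0.1:9050 and `requests` installed with SOCKS support):

```python
import uuid

def isolated_proxies():
    """Proxies dict with unique SOCKS credentials, forcing a fresh Tor circuit.

    Tor's IsolateSOCKSAuth behavior keeps streams with different
    credentials on different circuits.
    """
    cred = uuid.uuid4().hex
    proxy = f'socks5h://{cred}:x@127.0.0.1:9050'
    return {'http': proxy, 'https': proxy}

# Usage sketch:
# requests.get('https://' + domain, proxies=isolated_proxies(), timeout=15)
```

The `socks5h` scheme matters: it makes the DNS lookup happen through Tor too, so the checks don't leak queries to the local resolver.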

Do you have any other information/documentation like this about admiral?

tofof commented 7 years ago

Regarding those googleusercontent.com lines - Yes, they serve up Admiral content right now, but that's just because Admiral is hosting on Google Cloud platforms. I maintain that they're false positives, at least until their use in the wild can be verified.

Google+, AdWords, Google Fonts, and some Google Translate content all gets served up from googleusercontent.com servers.

tofof commented 7 years ago

@anon182739

Are you sure? [that Admiral blocks you if you hit them too often]

I'm not 100%, in that I haven't verified that behavior this week. I make that claim from my experiences trying to pin down their bootstrapper 6 months ago, when that definitely was the case, and I was getting served explicit error messages stating that I was being deliberately blocked. It's possible that you have to be hitting the script file or something other than the landing image to trigger that, though.

anon182739 commented 7 years ago

Sheer volume of certs per day is large.

I checked. It was large, but not very large: 100k-200k a day, which is about 2 per second. Certainly manageable.

Filters might help, but I think they're more likely to be more trouble than they're worth - if you fail to include e.g. .us and then at some future point find Admiral domains with that TLD, now you have to .. what, go back through the cert logs to some starting point and rescrape a whole set of domains you'd filtered before? Doesn't seem like filtering is a meaningful reduction in workload at that point.

You only have to go back and re-check the ones with .us domains. You can check the registration date of the first ones spotted in the wild and go back from there.

If the domain purchase and the cert application happen concurrently, before DNS has populated, does a verification script risk hitting the domain before it's "live" and miscategorizing it as non-Admiral?

You need to have a valid DNS entry for Let's Encrypt, it's all automated.

The ways to make things harder I'm referring to: at any point, Admiral can just blackhole incoming traffic*

They still have to receive the HTTP request, so they can't blackhole it entirely; they still need to answer the TCP handshake in order to receive anything at all.

or serve up http status codes

And then those will be identifiable, like they are now. Or they'll all be random which will also look strange - why would an nginx server send an apache 404?

Particularly when they can put their scripts behind completely random filenames.

The filenames are valid for all domains right now.
As said before, they only need to have a detectable signature once to be blocked.

we can see references to the scripts "in the wild" on other pages, and catch them that way. But that's the super-slow super-manual way of doing it

That can be automated too. Compile a list of sites that use admiral, see what domains they link to. Admiral is free of charge to use, and you can use e.g. shodan if you ever find a pattern for the bootstrapper to find domains using it.
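Automating the "in the wild" search could start with scanning pages of known Admiral-using sites for script tags that load from unfamiliar domains. A minimal sketch; the helper name and whitelist parameter are invented for illustration, and a real crawler would want an HTML parser rather than a regex:

```python
import re
from urllib.parse import urlparse

# Naive pattern for externally loaded scripts; good enough for a sketch.
SCRIPT_SRC_RE = re.compile(r'<script[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)

def external_script_domains(html, page_domain, known_good=frozenset()):
    """Domains of externally loaded scripts, minus the page itself and a whitelist."""
    found = set()
    for src in SCRIPT_SRC_RE.findall(html):
        host = urlparse(src).hostname
        if host and host != page_domain and host not in known_good:
            found.add(host)
    return found
```

Anything this surfaces is only a candidate; it still goes through the image/robots.txt/404 verification before landing on the list.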

Admiral can just blackhole incoming traffic* or serve up http status codes instead of the helpful image saying "Hi I'm Admiral".

I'm not so sure about this. When a domain registrar receives an abuse notice for a randomly generated .xyz domain registered with whoisguard that only serves randomly named obfuscated JS files, what will they think?

I maintain that they're false positives, at least until their use in the wild can be verified.

They're only serving admiral content though. I maintain that they're valid admiral domains, since they're functioning exactly like admiral domains aside from being unused.

I was getting served explicit error messages stating that I was being deliberately blocked

Did this block you from getting the actual scripts too? How long did it last for? Would it be possible to block all tor exit nodes/similar?

anon182739 commented 7 years ago

Also, the robots.txt will still be identifiable, unless they want to get revealed by the Internet Archive.

tofof commented 7 years ago

The filenames are valid for all domains right now.

That's interesting, I missed that earlier.

Did this block you from getting the actual scripts too?

Yes, I distinctly remember that it blocked me from getting the scripts. I also remember it unsurprisingly caused some sort of failure to be able to load Admiral-protected content on affiliated sites. I don't honestly recall if it blocked serving up the home image or not.

How long did it last for?

The first one lasted merely 10ish minutes, so I kept working at it, but I started getting blocks that lasted at least 2 hours.

Would it be possible to block all tor exit nodes/similar?

No, I doubt it would have had any effect against traffic routed through (different) Tor exit nodes - my recollection is that it was a simple "we've timed you out" sort of error page and didn't seem very sophisticated.

tofof commented 7 years ago

@anon182739

Also, the robots.txt will still be identifiable unless they want to get revealed by internet archive.

How so? A simple block-everything robots.txt is hardly unique to Admiral.

anon182739 commented 7 years ago

The first one lasted fewer than 10 minutes, another lasted at least 2 hours.

So in other words, it increases on subsequent violations.

I also remember it unsurprisingly caused some sort of failure to be able to load Admiral-protected content on affiliated sites.

How does that work? If you block the servers, won't that have the same effect?

No, I doubt it would have had any effect against traffic routed through (different) Tor exit nodes

You're misunderstanding me. What if you run a script to make sure all Tor exit nodes and open proxies get blacklisted on purpose? It will decrease the appeal of Admiral if webmasters know that anyone using a proxy/VPN can't read their site.

I tried to get banned by force refreshing the page some hundred of times, how many requests did you do?

How so? A simple block-everything robots.txt is hardly unique to Admiral.

It's uncommon, so it helps filtering. How common is blackhole+namecheap+whoisguard+robots+lets encrypt?

anon182739 commented 7 years ago

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:--  0:00:17 --:--:--     0
    curl: (52) Empty reply from server

Are they already blacklisting, or is it just Tor?

tofof commented 7 years ago

It's uncommon, so it helps filtering. How common is blackhole+namecheap+whoisguard+robots+lets encrypt?

Ok, fair enough, as part of the rest of the pattern it's helpful.

I tried to get banned by force refreshing the page some hundred of times, how many requests did you do?

I got it within a few dozen refreshes of the script at the time, probably within a 15-minute interval or so. Certainly nowhere near 100.

How does that work? If you block the servers, won't that have the same effect?

Yes, it does. If you outright blocked the Admiral servers, some Admiral-protected sites didn't serve any content; others still did. It depended on the bootstrapper that was used.