superseriousbusiness / gotosocial

Fast, fun, small ActivityPub server.
https://docs.gotosocial.org
GNU Affero General Public License v3.0

[feature] reverse DNS checks? #2497

Closed: mirabilos closed this issue 9 months ago

mirabilos commented 9 months ago

Is your feature request related to a problem?

I’ve seen a specific kind of misbehaviour from OVH (of all things… just as with eMail) and am wondering whether we could apply a mitigation that’s common in the eMail world, or one from the HTTPS world, or some variant thereof.

The problem I’m seeing is about two dozen requests per second, sometimes to the same path, sometimes to others (like enumerating all my posts), from an OVH IP address, (usually) with a User-Agent claiming to be Mastodon but naming a domain they most certainly are not. (In some cases, even two different domains.)

For example (I’m shamelessly posting this in full here, as it serves as a defence against the attack):

51.89.179.74 - - [03/Jan/2024:20:28:48 +0000] TLSv1.3:TLS_AES_256_GCM_SHA384:IPv4"toot.mirbsd.org" "GET /users/mirabilos/main-key HTTP/1.1" 429 30 "-" "http.rb/5.1.1 (Mastodon/4.2.0; +https://universeodon.com/)"
51.89.179.74 - - [05/Jan/2024:20:33:54 +0000] TLSv1.3:TLS_AES_256_GCM_SHA384:IPv4"toot.mirbsd.org" "GET /users/mirabilos/statuses/01HJ955MCT6CGP7JTGPWDJ4GBW/replies HTTP/1.1" 429 30 "-" "http.rb/5.1.1 (Mastodon/4.1.8; +https://mastodonapp.uk/)"

These two and a couple other large Mastodon instances are common.

Describe the solution you'd like.

GtS of course 429s them, but this is still annoying.

I was thinking of requiring some kind of check of the domain they’re pretending to be.

On the DNS side, the common mitigation from the eMail world is at least a reverse DNS check:
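For illustration, a forward-confirmed reverse DNS (FCrDNS) check with standard tools might look like this sketch (the IP and hostname are placeholders):

$ dig +short -x 192.0.2.10      # reverse lookup: IP to PTR name
mx.example.org.
$ dig +short A mx.example.org   # forward lookup: PTR name back to IP
192.0.2.10                      # matches the original IP, so FCrDNS passes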

Unfortunately, this alone would not help with some of the OVH boxen, as they have matching (if automatically enumerated) revDNS.

But then there’s the HTTPS world, where servers generally have to present an SSL certificate containing the hostname they’re using.
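The equivalent manual check, sketched with openssl (the hostname is just an example; -ext needs OpenSSL 1.1.1 or newer):

$ openssl s_client -connect example.social:443 -servername example.social </dev/null 2>/dev/null |
    openssl x509 -noout -subject -ext subjectAltName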

Could something like these checks, or something based on them, be done in GtS? Is that even possible in AP-based Fedi?

Is it even possible for GtS to do a quick check for these and return a response (so Apache httpd doesn’t complain) that uses fewer resources?

Or could GtS maybe keep such attackers in… idk, maybe a database table, and call out to an admin-provided script that adds each newly occurring IP to a 24-hour (or so) firewall-level block?
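Such an admin-provided script could be tiny; a sketch, assuming an nftables set named banned4 that was created with the timeout flag:

#!/bin/sh
# hypothetical hook: the caller passes the offending IP as $1
nft add element inet filter banned4 "{ $1 timeout 24h }"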

Describe alternatives you've considered.

I could probably run fail2ban on the Apache httpd access_log, let it check for too many 429s per time unit (some remote instances legitimately generate a few occasionally), and block based on that.
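A rough sketch of what that could look like; the file names, thresholds, and failregex are made up and would need adjusting to the actual LogFormat:

# /etc/fail2ban/filter.d/gts-429.conf
[Definition]
# any request line that Apache answered with status 429
failregex = ^<HOST> - - \[[^\]]+\] .* HTTP/[0-9.]+" 429\b

# /etc/fail2ban/jail.d/gts-429.conf
[gts-429]
enabled  = true
filter   = gts-429
port     = http,https
logpath  = /var/log/apache2/access_log
maxretry = 50      # tolerate occasional legit 429s…
findtime = 60      # …within one minute
bantime  = 86400   # then ban at firewall level for 24h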

If you think this is sensible, I’ll just do that instead, but I wanted to push the thought-ball around a bit first.

Additional context.

No response

tsmethurst commented 9 months ago

Hmmm. Well we already validate HTTP signatures. If those requests aren't signed with a valid signature, with a key we can fetch from the domain the request is signed from, they will get a 401 back. That's sort of the fedi equivalent of what you're describing, right? I think the other functionality you're describing -- putting the requester in "jail" for 24hrs -- could also be handled by dedicated software like fail2ban, if you tune it just right.
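(For anyone following along: a signed fedi request carries a Signature header along these lines; the values here are illustrative, but the keyId is what identifies the signing domain:)

Signature: keyId="https://example.social/users/alice#main-key",algorithm="rsa-sha256",headers="(request-target) host date digest",signature="…"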

EDIT: Oh, sorry, my brain totally skipped over the last part of your message. Indeed, I think fail2ban is probably the way to go here, so we don't reinvent the wheel in GtS. We should probably write a bit in the docs about how to tune fail2ban for such cases, though.

mirabilos commented 9 months ago

tobi dixit:

Hmmm. Well we already validate HTTP signatures. If those requests aren't signed with a valid signature, with a key we can fetch from the domain the request is signed from, they will get a 401 back.

Hmmhmmhmm. So then perhaps they sign with a domain they do have control over, but put Universeodon etc. into the User-Agent to fool the casual sysadmin?

Oh, or perhaps the 429 overrides the 401.

Oh or hmm…

51.89.179.74 - - [05/Jan/2024:22:26:13 +0000] TLSv1.3:TLS_AES_256_GCM_SHA384:IPv4"toot.mirbsd.org" "GET /users/mirabilos/statuses/01HKDVBTZRB8YGSTPGJCKDCX6T/replies HTTP/1.1" 200 216 "-" "http.rb/5.1.1 (Mastodon/4.1.8; +https://mastodonapp.uk/)"

timestamp="2024-01-05T22:26:13.860Z" func=server.glob..func1.Logger.func13.1 level=INFO latency="430.402µs" userAgent="http.rb/5.1.1 (Mastodon/4.1.8; +https://mastodonapp.uk/)" method=GET statusCode=200 path=/users/mirabilos/statuses/01HKDVBTZRB8YGSTPGJCKDCX6T/replies requestID=wfkbqpwc04000skw60gg msg="OK: wrote 216B"

… from the IP in question. (I think I need to re-enable IP logging in GtS for a while; I have put both logs on tmpfs now anyway.)

Is there any way to see which domain they use?

I think the other functionality you're describing -- putting the requester in "jail" for 24hrs -- could also be handled by dedicated software like fail2ban, if you tune it just right.

OK, thanks.

bye, //mirabilos

tsmethurst commented 9 months ago

Is there any way to see which domain they use?

Hmm, no, not currently. I mean, GtS knows, but it's not exposed. It would be good to add that to the logging!

So they're running an implementation somewhere that can provide valid signatures in response to requests, but seemingly disguising their user-agent to try to scrape posts under the radar? That's annoying. Probably a good idea to check what 51.89.179.74 actually is, and just chuck it in the bin if it's not legit.
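(A quick whois is usually enough for that; field names vary by registry:)

$ whois 51.89.179.74 | grep -iE 'netname|org-name|descr'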

Unfortunately beyond that -- and perhaps adding some heuristics to give admins a warning when this kind of thing seems to be occurring -- there's not too much we can do about it. Trying to parse user agents and check IP addresses would introduce an impossible amount of headaches and likely a lot of false positives.

Alternatively, you could run in allowlist mode and only add instances you trust; then requests from unknown instances (no matter what they set their user-agent to) will be denied.
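(For reference, and assuming a recent GtS version, that's a single setting in config.yaml; check the federation modes documentation for your version:)

# config.yaml
instance-federation-mode: "allowlist"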

tsmethurst commented 9 months ago

Oh, just to add a note here in case anyone reading this gets the wrong idea: blocklist mode (the default federation mode) will always permit requests to Public and Unlisted posts provided they're correctly signed by an unblocked domain; that's the nature of blocklist / open federation. It's similar to how you can always see someone's Public posts on the web view.

It's annoying behavior for someone to disguise their user-agent and make signatures from a different domain, but as long as they're valid signatures, GtS will serve a response. The same is true of every other fedi software running in blocklist mode (the default for the vast majority (all?) of them). The only way to prevent this behavior is to use allowlist mode, or block the domain or IP address they're actually signing requests from. In any case, your followers-only and direct visibility posts will not be exposed by this trick.

mirabilos commented 9 months ago

tobi dixit:

Is there any way to see which domain they use?

Hmm, no, not currently.

Hmmh. Might be useful to have that in logs.

So they're running an implementation somewhere that can provide valid signatures in response to requests, but seemingly disguising their user-agent to try to scrape posts under the radar? That's annoying.

That, and hammering instances.

Probably a good idea to check what 51.89.179.74 actually is, and just chuck it in the bin if it's not legit.

It presents as some kind of VPS control panel.

If it’s hosted at OVH, chances are it’s not legit ☻ @natureshadow once calculated that OVH is a worse spammer than all of China, and I have long blocked all of OVH’s netblocks on my eMail server with a few manual exceptions (of people who told me their IP on IRC). Seems like this extends to the Fediverse ☹

I’ll block them and do something with fail2ban, but I wanted to see whether there’s something that could be done to make their lives harder, as I suspect they’re doing this to many instances.

Unfortunately beyond that -- and perhaps adding some heuristics to give admins a warning when this kind of thing seems to be occurring -- there's not too much we can do about it. Trying to parse user agents and check IP addresses would introduce an impossible amount of headaches and likely a lot of false positives.

Yeah, definitely. Ad-hōc clients also tend to not set user agents.

I think the high 429 rate is a good enough heuristic. If they’re going to scrape, they’ll at least have to do it at a normal rate ;-)

It’s been only a handful of IPs so far, so I’ll manage.

(I wonder whether there are people running legit instances out of OVH, but probably, given how cheap they are… so a /16 block isn’t possible here, but fail2ban will probably do.)

Alternatively, you could run in allowlist mode (…)

That’s like shooting with cannons at birds… overkill in this scenario.

Thanks, //mirabilos

tsmethurst commented 9 months ago

Mmm. For now I'll make a separate issue for showing which domain signed a request. We can put that information right next to the user-agent. That will at least help people spot this kind of silliness more easily in their logs.

mirabilos commented 9 months ago

Thanks, subscribed.

I think we can close this one?

If I work out fail2ban rules, I could do a documentation submission with them (though someone would then need to do an nginx adaptation, as I use Apache httpd), if you’re interested?

tsmethurst commented 9 months ago

Yes, interested! Thanks!

tsmethurst commented 9 months ago

I looked some more, and I think this is a false positive. I've had the mastodonapp.uk domain blocked on my instance for a while now, and if I grep for the IP address 51.89.179.74 in my logs, I see the requests from mastodonapp.uk being blocked correctly, while the ones from universeodon go through correctly.

If you look at the about page for each of those sites, they're hosted in the same data center. So apparently it's "normal" (albeit slightly wonky) that they share the same IP address.

The rate limiting they run into may be explained by the fact that they're both quite big instances, and hence push out a lot of traffic, and that they get put into the same rate-limit bucket due to the shared IP address.
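(If shared IPs keep tripping the limiter, the bucket size can also be tuned; key name assuming a recent GtS version:)

# config.yaml: requests allowed per time window, per client IP; 0 disables rate limiting
advanced-rate-limit-requests: 300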

mirabilos commented 9 months ago

tobi dixit:

If you look at the about page for each of those sites, they're hosted in the same data center. So apparently it's "normal" (albeit slightly wonky) that they share the same IP address.

Maybe, but:

$ host mastodonapp.uk
mastodonapp.uk has address 104.21.73.178
mastodonapp.uk has address 172.67.147.92
mastodonapp.uk has IPv6 address 2606:4700:3030::ac43:935c
mastodonapp.uk has IPv6 address 2606:4700:3036::6815:49b2
mastodonapp.uk mail is handled by 0 mastodonapp-uk.mail.protection.outlook.com.
$ host universeodon.com
universeodon.com has address 188.114.97.3
universeodon.com has address 188.114.96.3
universeodon.com has IPv6 address 2a06:98c1:3121::3
universeodon.com has IPv6 address 2a06:98c1:3120::3
universeodon.com mail is handled by 5 universeodon-com.mail.protection.outlook.com.

Neither of these matches that, and the IP address used for these requests smells fishy in multiple ways.

Oh well, if you manage to squeeze in key/domain logging somewhen, we’ll know; until then, I can hold out. All we can do now is speculate. Plus, I’ve got the bigger VM now, so it isn’t the operational problem it was some days ago.

The rate limiting they run into may be explained by the fact that they're both quite big instances, hence push out a lot of traffic, and they get put in the same rate limit bucket due to the shared IP address.

Perhaps, if it is legit traffic.

Some days ago, I saw them request /users/mirabilos about two dozen times a second, and when I wrote this issue, I saw them request consecutive statūs I had posted, also at a rate of about two dozen per second. I would hesitate to call that legit traffic.

They also don’t back off when served 429.

bye, //mirabilos