Refactoring inbox processing to smaller tasks

osmaa commented 9 months ago

(I sent a PR with the first commit separetely, but this builds on top and uses logging for providing diagnostics)

Check HTTP signature ahead of LD signature, and only strip the LD sig away if it doesn't verify.

This fixes random rejects of incoming messages, because LD Signatures just don't always check correctly due to mysterious differences in the JSON->RDF translation

Do not fetch_actor in the Inbox POST path, because sometimes that's just really, really slow and can DoS the web server

This also means that we can't verify the signature of the first ActivityPub message we see from a new actor - because the key isn't yet known. Stator will fetch it in the second stage of processing and it'll be available for further messages.

Without this change, my server was sporadically bogged down by storms of messages from blocked domains, because fetch_actor was called before checking for domain blocks.

Fetch pinned posts in a decoupled internal message instead of fetch_actor

also because it slowed fetch_actor down a lot, and frequently caused database integrity errors as multiple threads were trying to insert the same data

andrewgodwin commented 9 months ago

Seems like a decent approach - could you rebase onto main now I took in the other PR so the diff is easier to read?

osmaa commented 9 months ago

Rebased. Probably fixes #642

osmaa commented 9 months ago

Sorry for the test spam there. I don't have a working test environment locally..

The test fail in signatures confuses me. The code should still raise VerificationError on an invalid signature - what has changed there is that:

if the signature is valid, it's not stripped out of the document (original code popped the sig out of document passed by reference)
the VerificationError is caught in activitypub.Inbox.post and dealt with not by returning HttpResponseUnauthorized, but simply by stripping out the invalid signature, which is encountered frequently enough in the wild for reasons I can't fully track down (apparently are related to canonicalization and/or the normalized hash, though)

Reverting the little changes I made to core/signatures.py anyway..

osmaa commented 9 months ago

Can't see why the signature test continues to fail.

andrewgodwin commented 9 months ago

The signature tests are quite picky but it's also for a good reason (the signing code took a lot of work to get accurate to how other servers wanted to do things). I don't have the capacity to debug this locally for a while, sadly, so if you get a chance to that might help.

osmaa commented 9 months ago

The reason I started looking into the signatures at all was that I was seeing daily failures in LD signature verification from messages I would see Mastodon servers having accepted. I spent a lot of time trying to decipher why -- the only thing I could find out was that sometimes sample JSON would show up with different RDF encoding to what pyld would produce, but the whole process is very opaque and I could get no explanation to that at all.

But in the process I also found out that Mastodon doesn't recommend implementing LD Signatures at all, because their currently implemented version is obsolete, and that Mastodon only refuses to forward LD sigs which don't check out, but doesn't refuse the message itself.

Which is what I did here as well:

check HTTP sig first
don't reject messages for LD Signature fail
DO keep the LD sig on documents where the verification succeeds (this is still the majority of LD signed messages)

I only noticed now that while the signature check still works for correct LD Sigs, the test case for a manipulated message doesn't raise an exception as the test expects. Except... the implementation still raises an exception in the same cases as before, and running the code in the wild, I see messages in my logs about LD signatures being stripped of messages when they don't check out.

At first blush, it would seem to me the test is broken. Except.. I didn't change the test either, and it must have worked before.

I know, I should have a working local test environment and having run the tests before making any changes - or reproduce them without this patch. I haven't gotten around to setting up an env with postgres available for the tests, though :(

osmaa commented 9 months ago

It is the test that is busted. The test document:

{
        "id": "https://example.com/test-create",
        "type": "Create",
        "actor": "https://example.com/test-actor",
        "object": {"id": "https://example.com/test-object", "type": "Note"},
        "signature": {
            "@context": "https://w3id.org/identity/v1",
            "creator": "https://example.com/test-actor#test-key",
            "created": "2022-11-12T21:41:47Z",
            "signatureValue": "nTHfkHqG4hegfnjpHucXtXDLDaIKi2Duk+NeCzqTtkjf4NneXsofbZY2tGew4uAooEe1UeM23PIyjWYnR16KwcD4YY8nMj8L3xY2czwQPScMM9n+KhSHzkWfX+iI4FWKbjpPI8M53EtTRJU+1qEjjmGUx03Ip0vfvT5821etIgvY4wLNhg3y7R8fevnNux+BeytcEV6gM4awJJ6RK0xrWGLyTgDNon5V5aNUjwcV/UVPy9UAQi1KYWtA74/F0Y4oPzL5CTudPpyiViyVHZQaal4r+ExzgSvGztqKxQeT1ya6gLXxbm1YQ+8UiGVSS8zoGhMFDEZWVsRPv7e0jm5wfA==",
            "type": "RsaSignature2017",
        },
    }

has no @context for the top level, so jsonld.normalize will turn it into an empty string instead of the expected n-quads. The normalized_hash for it will be the same before and after the "evil" modification.. Why did it work before and what else has broken it? There were some changes to the builtin_document_loader contexts before, but I can't see how that could have changed anything here: the test doc just has no context to begin with.

andrewgodwin commented 8 months ago

Ahh, nice catch. I had no idea Mastodon had deprecated LD sigs - I wonder what the situation is like for other servers.

osmaa commented 8 months ago

My understanding is that Mastodon servers with AUTHORIZED_FETCH enabled won't produce LD Sigs at all - because the idea of the feature is to require subscribers to authenticate with the source server to see the content, and removing LD Sigs creates plausible deniability over the content. https://docs.joinmastodon.org/admin/config/#authorized_fetch

I had a quick peek at Pixelfed and Firefish source. Pixelfed doesn't appear to contain LD Signature support at all, and Firefish apparently only attaches LD Sigs to messages in the relay service. Makes sense, actually.

andrewgodwin commented 8 months ago

Yeah, that does make sense. The ones I'd be mostly thinking that did this would be gotosocial and pleroma; they tend to be the most prevalent in terms of content I've seen.

osmaa commented 8 months ago

I found only HTTP signature implementation in both gotosocial and pleroma, no sign of LD sig in either. And for whatever it's worth, now that (valid) LD sigs are retained on the inbox messages, scanning that table (sorry for the pretty nasty SQL here), I only see LD sigs on Mastodon and Hometown servers (the latter being a fork of Mastodon). The invalid LD sigs are logged, but that being a text file I didn't bother checking all of them for software type. On a spot check, I see a several of those log messages caused by the same Mastodon servers I see valid sigs from -- that being an indication that LD Sigs aren't a stable thing due to something in the normalization/RDF translation process.

select t1.domain,software,signatures from 
(select split_part(actor, '/',3) domain,count(signature) signatures from 
  (select id, message->>'actor' actor , message->'signature'->>'type' signature from users_inboxmessage ) a 
 where signature='RsaSignature2017' group by 1) t1 
join 
(select domain,nodeinfo->'software'->>'name' software from users_domain) t2 
on (t1.domain=t2.domain) 
order by signatures desc;

andrewgodwin commented 8 months ago

Great, thanks - sounds like there's really not much point continuing to verify or implement them, really...

jointakahe / takahe

Refactoring inbox processing to smaller tasks #647