Closed Sentynel closed 1 year ago
Thanks! Yep this is a silly bug :D
This appears to be fixed in 0.7.0 - I'm no longer seeing those 400 errors, I can see successful user delete messages in the logs, and the code in question got refactored by kim. Can anyone else verify?
edit: nope, as was probably inevitable, this happened again five minutes after I posted this
Okay, I'm closing this one as actually fixed: the remaining 400 errors I'm seeing are coming from status unpin requests, and I've verified that account deletion requests are processing successfully in both the case that the account in question has an entry in our database and the one where it doesn't.
Turns out this is only mostly fixed. There is an intermittent failure in the case that we receive an ActivityPub delete message for an account we don't currently have in the database. This returns an error that looks like this:
Bad Request: couldn't get requesting account REMOVED:
enrichAccount: error dereferencing account REMOVED:
DereferenceAccountable: error deferencing REMOVED:
remote resource returned HTTP code 410 GONE
AuthenticatePostInbox() in federatingprotocol.go makes an AuthenticateFederatedRequest() call on line 165. In the case that we don't have the remote account already on file, which is what's happening here, it will call out to the remote server. We handle 410 Gone by just returning HTTP 202 on line 172. Assuming we don't get a 410 Gone, then AuthenticateFederatedRequest() has successfully retrieved the remote account, got the public key, and checked the signature on the request, but then doesn't put it in the database. Then on line 208 of AuthenticatePostInbox() we make a GetAccountByURI() call, which finds that the account it wants isn't in the database, calls enrichAccount(), which calls dereferenceAccountable(), which ultimately makes the Dereference() call. But surprise! This time the remote server returns 410 Gone and we fall over.
Conclusion: we're racing against the remote server delivering all its account deleted messages and marking the deleted profile as gone; we have a double fetch issue where we can request the remote profile twice under certain circumstances, and if we do that and we're unlucky with the timing such that the first (which expects 410 Gone as a possibility) succeeds and the second fails, we throw an error.
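To make the timing issue concrete, here is a minimal sketch of the double fetch. It is not GoToSocial's code: fakeRemote, handleDelete and errGone are hypothetical names, and the remote is simulated as a counter that serves the profile a fixed number of times before answering 410 Gone.

```go
package main

import (
	"errors"
	"net/http"
)

// errGone is a hypothetical sentinel standing in for a remote 410 Gone.
var errGone = errors.New("remote resource returned HTTP code 410 GONE")

// fakeRemote simulates the remote server: it serves the profile for the
// first okFetches requests and reports it as gone afterwards.
type fakeRemote struct {
	okFetches int
	fetches   int
}

func (r *fakeRemote) dereference(uri string) error {
	r.fetches++
	if r.fetches > r.okFetches {
		return errGone
	}
	return nil
}

// handleDelete mimics the two-step flow described above and returns the
// HTTP status we would send back to the remote server.
func handleDelete(r *fakeRemote, uri string) int {
	// First fetch: authentication, which expects 410 Gone as a possibility.
	if err := r.dereference(uri); err != nil {
		if errors.Is(err, errGone) {
			return http.StatusAccepted
		}
		return http.StatusUnauthorized
	}
	// Second fetch: the account lookup, which does not expect 410 Gone.
	if err := r.dereference(uri); err != nil {
		return http.StatusBadRequest
	}
	return http.StatusOK
}
```

With okFetches set to 0 the delete is accepted with a 202, with 2 both fetches succeed, and only the unlucky middle case of 1 produces the 400.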
IMO the best solution here is to have the AuthenticateFederatedRequest() call fill in the database entry for the account in question. This avoids the double fetch for any case where we get an ActivityPub message about a user we don't already have in the database, improving performance and being more polite to remote servers. Alternative solutions would be:
- Return the account from AuthenticateFederatedRequest() and skip the GetAccountByURI() call in that case. This feels messy to me (having AuthenticateFederatedRequest() actually return the user seems like odd behaviour, and it's not needed in other places that call happens).
- Handle the 410 Gone explicitly for the call to GetAccountByURI() as well. This function already handles the case where it has an account in the DB already and a refresh call fails by using the values from the cached account, so I think this can only be reached down the same code path that's happening here. This doesn't remove the double fetch, which feels suboptimal.

Thoughts?
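As a rough illustration of the first option (authentication filling in the database entry), here is a sketch under simplifying assumptions: an in-memory map stands in for the database, and store, remote, authenticate and getAccountByURI are hypothetical names, not the real GoToSocial functions or signatures.

```go
package main

import "errors"

// errGone is a hypothetical sentinel standing in for a remote 410 Gone.
var errGone = errors.New("remote resource returned HTTP code 410 GONE")

// account holds the minimum needed for this sketch.
type account struct {
	URI       string
	PublicKey string
}

// store is an in-memory stand-in for the accounts table.
type store struct {
	accounts map[string]*account
}

// remote simulates a server that serves the profile okFetches times
// and then answers 410 Gone.
type remote struct {
	okFetches int
	fetches   int
}

func (r *remote) fetch(uri string) (*account, error) {
	r.fetches++
	if r.fetches > r.okFetches {
		return nil, errGone
	}
	return &account{URI: uri, PublicKey: "hypothetical-key"}, nil
}

// authenticate fetches the account on a miss and stores it, so that
// later lookups in the same request do not hit the remote again.
func authenticate(s *store, r *remote, uri string) error {
	if _, ok := s.accounts[uri]; ok {
		return nil
	}
	acct, err := r.fetch(uri)
	if err != nil {
		return err
	}
	s.accounts[uri] = acct
	return nil
}

// getAccountByURI only goes to the remote when the store has no entry.
func getAccountByURI(s *store, r *remote, uri string) (*account, error) {
	if acct, ok := s.accounts[uri]; ok {
		return acct, nil
	}
	return r.fetch(uri)
}
```

In this sketch a remote that serves the profile exactly once no longer causes a failure, because the lookup after authentication is answered from the store and the second fetch never happens.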
This sounds very similar to #974 if I'm not mistaken?
Yeah, it's the same underlying problem, though it's been through a few refactors and I'm not sure if it got entirely fixed at one point and then came back or it's only been present in edge cases since then. The remaining instance of it here is a fiddly race condition which happens.. more often than I'd have expected, perhaps, but not on every delete by any means.
Handle the 410 Gone explicitly for the call to GetAccountByURI() as well.
I think this would be my preferred solution in this case :) If the enrich call fails with 410 gone we should probably just stop trying to enrich it. I think we even already have some error processing in the transport that can handle 410 gone, so it wouldn't take too much code changery I think....
(also, just wanna say thank you again for doing such an exhaustive and painstaking investigation!)
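For reference, handling 410 Gone explicitly on the enrich path could look something like the sketch below. It assumes the gone condition is exposed as a sentinel error that survives wrapping; errGone, enrichAccount (as written here) and statusForLookup are hypothetical and do not reflect the actual transport error handling in GoToSocial.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errGone is a hypothetical sentinel standing in for a remote 410 Gone.
var errGone = errors.New("remote resource returned HTTP code 410 GONE")

// enrichAccount wraps any dereference error with %w so callers can
// still detect errGone through the added context.
func enrichAccount(uri string, dereference func(string) error) error {
	if err := dereference(uri); err != nil {
		return fmt.Errorf("enrichAccount: error dereferencing account %s: %w", uri, err)
	}
	return nil
}

// statusForLookup maps the lookup result to the status we return.
// A gone account is treated as expected and accepted, not as a bad request.
func statusForLookup(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errGone):
		return http.StatusAccepted
	default:
		return http.StatusBadRequest
	}
}
```

The point of the sketch is only that the caller checks for the gone case with errors.Is before treating the failure as a 400; as noted earlier in the thread, this does not remove the double fetch.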
Describe the bug with a clear and concise description of what the bug is.
ActivityPub user delete messages cause 400 errors due to trying to webfinger the deleted user. Example request:
Error response (note also that this is contained in the HTML error template, but should probably be the JSON one for the API?).
We probably still need to do this request to verify the user is actually gone (since we can't check the message signature unless we already have their key), but 410 is the expected result.
What's your GoToSocial Version?
v0.6.0