w3c / activitypub

http://w3c.github.io/activitypub/

Documentation/best practices on rejecting inbound activities #424

Open ThisIsMissEm opened 5 months ago

ThisIsMissEm commented 5 months ago

In the recent spam wave, we implemented a patch in Mastodon that silently dropped certain activities instead of processing them.

This obviously isn't a good approach for user experience; it'd be better to send back a rejection reply, much like the bounce you get when an email can't be accepted.

Currently Reject is only used for Follow activities, but I don't think there's a reason it couldn't be used for others, given appropriate handling semantics.

trwnh commented 5 months ago

earlier related discussion: https://socialhub.activitypub.rocks/t/signaling-side-effects-asynchronously-by-generalizing-accept-reject/125

evanp commented 5 months ago

So, if we know at the time the activity is received that it is unacceptable for some reason (spam, etc.) then we can send a 4xx HTTP code, most likely a 400:

https://www.w3.org/wiki/ActivityPub/Primer/HTTP_status_codes_for_delivery

Bad client requests usually (?) will not be retried by senders.

However, some systems may not know at the time of the HTTP request handling that the activity is not acceptable. For example, it could go into a queue for spam testing using e.g. a naive Bayesian filter. In this case, the HTTP result might be 202 Accepted, but the activity is never delivered to the recipients.
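As a rough sketch of the two cases above (known-bad at request time versus deferred to a queue), the receiving side might pick its status code like this. The names `InboxDecision`, `decide_inbox_response`, and `missing_type` are hypothetical and not part of ActivityPub or any particular server; the synchronous check is a stand-in for whatever validation a server can do immediately.

```python
from dataclasses import dataclass
from http import HTTPStatus
from typing import Callable, Optional


@dataclass
class InboxDecision:
    status: HTTPStatus
    queued: bool  # True when the activity was deferred for later checks


def decide_inbox_response(
    activity: dict,
    reject_reason: Callable[[dict], Optional[str]],
) -> InboxDecision:
    """Pick an HTTP status for a POST to an inbox.

    reject_reason is a hypothetical synchronous check: it returns a reason
    string when the activity is already known to be unacceptable, else None.
    """
    if reject_reason(activity) is not None:
        # Known-bad while handling the request: answer with a 4xx (here 400).
        return InboxDecision(HTTPStatus.BAD_REQUEST, queued=False)
    # Not known to be bad yet: accept for asynchronous processing, such as a
    # spam-filter queue. A 202 does not promise delivery to the recipients.
    return InboxDecision(HTTPStatus.ACCEPTED, queued=True)


def missing_type(activity: dict) -> Optional[str]:
    # Stand-in check for illustration only.
    return None if "type" in activity else "activity has no type"
```

With this shape, the sender only learns about failures that were detectable synchronously; anything caught later in the queue is invisible to it unless some separate notice is sent.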

Whether to send a rejection notice to the sender is an open question. For some types of rejections, e.g. a Block, the ActivityPub specification explicitly calls out the problems of user safety in revealing Blocks.

I think we should follow the typical mechanism for email that the recipient has a chance to review junk messages, but that the sender does not get a notification.

evanp commented 5 months ago

In discussion, we think a 403 Forbidden code may be useful when the sending actor or server is blocked and not authorized to ever send activities to this inbox. A server that receives a 403 may choose to circuit break further delivery.

A 400 code may be more appropriate when the content is not acceptable for some reason.
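On the sending side, the status codes discussed here could be mapped to delivery actions roughly as follows. This is a sketch under assumptions: `DeliveryAction` and `classify_delivery_response` are hypothetical names, and retrying everything outside 2xx and 4xx (for example 5xx) is an assumption of the example rather than something stated in this thread.

```python
from enum import Enum


class DeliveryAction(Enum):
    DONE = "done"                    # delivered, or accepted for processing
    RETRY = "retry"                  # try again later
    DROP = "drop"                    # give up on this activity, no retry
    CIRCUIT_BREAK = "circuit_break"  # stop delivering to this inbox


def classify_delivery_response(status_code: int) -> DeliveryAction:
    """Map the HTTP status of an inbox POST to a sender-side action."""
    if 200 <= status_code < 300:
        return DeliveryAction.DONE
    if status_code == 403:
        # The sending actor or server is blocked: further delivery is pointless.
        return DeliveryAction.CIRCUIT_BREAK
    if 400 <= status_code < 500:
        # Bad client request: the same payload would fail again.
        return DeliveryAction.DROP
    # Assumption: anything else (for example 5xx) is treated as transient.
    return DeliveryAction.RETRY
```

A real sender would probably special-case a few more codes; the point is only that 403 and 400 can drive different behavior.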

nightpool commented 5 months ago

Email has been adopting aggregate domain rejection reporting recently, and some systems do send spam reports for every message manually marked as spam. Personally, I think that having an asynchronous Reject activity with a human-readable error message is the best option for software transparency & end-user experience. Even though it may help some spammers (they can keep trying until they get through), the benefits for users who inadvertently get caught in the spam filter outweigh the small losses to spam filter obfuscation. Such obfuscation isn't really very obfuscated in the first place: after all, with open-source ActivityPub servers, spammers can just set up their own captive "test lab" to try attacks against without needing to get explicit confirmation.

evanp commented 5 months ago

I think it makes sense in the case where a sender expects some sort of side-effect from the activity, such as:

These would be good times to send a Reject or even an Accept for the relevant activity -- especially using the target property.

For general delivery of Create activities, sending Reject activities for every bad object may be too noisy -- especially if there's no way for the sending server to know what was wrong with the activity.

I documented this here: https://www.w3.org/wiki/ActivityPub/Primer/Reject_activity#Additional_uses_of_Reject
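For illustration, a Reject for an activity where the sender expects a side effect (a Follow, as mentioned at the top of this thread) might be assembled like this. The helper `build_reject` is hypothetical, carrying the human-readable reason in `summary` is an assumption of this sketch, and it does not attempt the `target` usage mentioned above. It also assumes the rejected activity's `actor` is a plain id string.

```python
from typing import Optional

AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"


def build_reject(
    reject_id: str,
    actor_id: str,
    rejected: dict,
    reason: Optional[str] = None,
) -> dict:
    """Build a Reject that refers to a previously received activity."""
    reject = {
        "@context": AS2_CONTEXT,
        "id": reject_id,
        "type": "Reject",
        "actor": actor_id,
        # Refer to the rejected activity by id rather than embedding it.
        "object": rejected["id"],
        # Address the notice to whoever sent the original activity.
        "to": [rejected["actor"]],
    }
    if reason is not None:
        # Assumption: `summary` carries the human-readable explanation.
        reject["summary"] = reason
    return reject


follow = {
    "id": "https://sender.example/activities/1",
    "type": "Follow",
    "actor": "https://sender.example/users/alice",
    "object": "https://receiver.example/users/bob",
}
notice = build_reject(
    "https://receiver.example/activities/99",
    "https://receiver.example/users/bob",
    follow,
    reason="This account is not accepting follow requests.",
)
```

Leaving `reason` unset keeps the Reject free of any explanation, which matters for cases where revealing the cause is a safety concern.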

evanp commented 5 months ago

"After all, with open-source ActivityPub servers spammers can just set up their own captive "test lab" to try attacks against without needing to get explicit confirmation."

This doesn't make sense for rejections that are based on training data or user configurations. No spammer can replicate that environment locally.

bobwyman commented 5 months ago

nightpool wrote:

Email has been adopting aggregate domain rejection reporting recently, and some systems do send spam reports for every message manually marked as spam.

Is this a reference to DMARC (Domain-based Message Authentication, Reporting and Conformance) reports? If so, it is important to clarify that DMARC is primarily intended to address the problem of spoofing, not of spamming -- although spammers often spoof.

"DMARC, which stands for “Domain-based Message Authentication, Reporting & Conformance”, is an email authentication, policy, and reporting protocol. It builds on the widely deployed SPF and DKIM protocols, adding linkage to the author (“From:”) domain name, published policies for recipient handling of authentication failures, and reporting from receivers to senders, to improve and monitor protection of the domain from fraudulent email."

nightpool commented 5 months ago

A lot of email providers have proprietary systems for it, for example Google Postmaster Tools.

ThisIsMissEm commented 4 months ago

I'm wondering if it'd be possible to do a Reject on multiple activities, or just store the rejects and send them in bulk once an hour or something?
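One way the batching idea could look, purely as a sketch: collect rejected activity ids per sender and periodically emit one Reject per sender with several values in `object`. Activity Streams 2.0 lets `object` carry more than one value, but whether receiving servers would process such a Reject is not established in this thread. `RejectBatcher` and its methods are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List


class RejectBatcher:
    """Collect rejected activity ids and emit one Reject per sender."""

    def __init__(self, actor_id: str) -> None:
        self.actor_id = actor_id
        self._pending: Dict[str, List[str]] = defaultdict(list)

    def add(self, sender_actor: str, activity_id: str) -> None:
        # Remember the rejected activity until the next flush.
        self._pending[sender_actor].append(activity_id)

    def flush(self) -> List[dict]:
        # Build one Reject per sender, listing every pending activity id.
        # The caller would still assign an `id` to each Reject and deliver it.
        rejects = [
            {
                "@context": "https://www.w3.org/ns/activitystreams",
                "type": "Reject",
                "actor": self.actor_id,
                "to": [sender],
                "object": list(ids),
            }
            for sender, ids in self._pending.items()
        ]
        self._pending.clear()
        return rejects
```

Calling `flush()` from an hourly job would match the "once an hour or something" cadence suggested above; the interval itself is outside this sketch.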

silverpill commented 4 months ago

FEP-6f55 (rendered, pre-draft) proposes Ack and Nack messages for reporting processing results.

I think sending Accept/Reject or Ack/Nack for every incoming message is not desirable. Instead, a server may publish reports in a specified location where senders can retrieve them later. I described this mechanism in more detail here: https://socialhub.activitypub.rocks/t/report-errors-in-server-processing/3006/14

nightpool commented 4 months ago

I agree with evan that you should only use Reject for targeted inbound activities that a user would otherwise expect a reply to. I agree that it's not useful in bulk for every incoming message, and I think we should use Reject for the case where you explicitly want to try to display an error message to the replier. That's why I disagree with the Ack/Nack/"send in bulk" approach: these are really just automated replies from the targeted server and ideally should be shown to users in real time.