w3c / activitypub

http://w3c.github.io/activitypub/
Other
1.2k stars 77 forks source link

Undo activity without an ID on the object activity #384

Closed evanp closed 3 weeks ago

evanp commented 1 year ago

Is it possible to process an Undo activity without an ID on the object activity? For example, to undo a Like activity:

{
   "@context": "https://w3c.org/ns/activitystreams",
   "actor": "https:/example.com/actor/7",
   "type": "Undo",
   "id": "https://example.com/activity/undo/4",
   "object": {
       "type": "Like",
       "object": {
           "id": "https://example.org/note/17",
           "type": "Note"
       }
   }
}

If the likes collection of the object is unique by actor, it is possible to execute the side effect of this activity without needing the id of the Like.

Similarly, for undoing a Follow activity:

{
   "@context": "https://w3c.org/ns/activitystreams",
   "actor": "https:/example.com/actor/19",
   "type": "Undo",
   "id": "https://example.com/activity/undo/5",
   "object": {
       "type": "Follow",
       "object": {
           "id": "https://example.org/actor/256",
           "type": "Person"
       }
   }
}

If the followers collection of the actor that is being unfollowed is unique by id, then it should possible to execute the side effect of removing the Undo activity's actor from the followers collection without needed the exact id of the Follow activity.

dmitrizagidulin commented 1 year ago

I think the spec (or recommendations) should just require IDs, and not allow unfollows by actor/shape.

evanp commented 1 year ago

Accept and Reject may also have the same issue, since you can accept or reject an activity, and that activity may be unique by object id, or actor id, not necessarily by activity id.

trwnh commented 1 year ago

The problem occurs when the object is a Like or a Follow, which has been decided to be idempotent. it's not unique to Undo, as it also happens with Accept/Reject Follow and Undo Accept Follow.

EDIT: it happens for the following "stateful" activities:

evanp commented 1 year ago

@trwnh I think you're trying to suggest that there was a recent decision to make the elements of following, followers, liked, likes, shares, etc. unique? That is not the case; it's implied in the original documents, and most implementations treat them this way. It's good for us to highlight these hidden assumptions and provide guidance on them, but I don't think it's fair to say there was a new decision on this topic.

trwnh commented 1 year ago

I meant with the triage of #381 it was decided to make the implication explicit, at least within the primer. Not necessarily that a new decision was made, but a decision was made at some point, and that decision was clarified during the issue triage yesterday.

Although I have to raise another point of contention: when was it decided that Announce should be idempotent with respect to shares? This is starting to feel like implementation decisions are being raised into the spec itself. For example, if Tumblr decides to implement reblogs as Announce then they will want to support multiple shares most likely. And the original intent with #381 was to clarify cases where a "like" is a kind of post, not a binary signal. This is also why I brought up in #381 that there should be more clarity on the split between a "resource" and a "procedure".

Taking Follow as an example, if we say that following is idempotent and id doesn't matter (because implementers don't want to keep track of the ids of every single Follow and Accept Follow in order to Undo it), then we are basically saying the Follow activity as a resource doesn't matter. All that matters is the procedure. And this sort of makes sense considering how unwieldy it is to handle an activity such as Undo Accept Follow Person, which contains three nested objects and two of the ids don't matter. This led to Mastodon deciding to send a Reject Follow Person instead, where the id of the Follow still doesn't matter. But then you have cases like https://github.com/misskey-dev/misskey/issues/9250 / https://github.com/misskey-dev/misskey/issues/11015 where Misskey assumes that ids are required, so they send a local id because they don't have the original id... and Pleroma thinks that the id is then invalid, despite not really caring for the id of the resource. So either the spec should account for this, by explicitly saying the id of the Follow doesn't need to be stored, and/or that all implementations MUST treat all idempotent actions as null id... or some other activity needs to be defined for doing the same thing. This is again what I meant by describing the "shape" of the activity -- by inlining or embedding the Like or Follow, you don't need an id. Explicitly declaring this would prevent cases like the "remove follower" bug which remains an issue to this day -- Misskey would have no need to send a local id, and Pleroma would have no need to validate the id at all.

silverpill commented 1 year ago

Although I have to raise another point of contention: when was it decided that Announce should be idempotent with respect to shares? This is starting to feel like implementation decisions are being raised into the spec itself. For example, if Tumblr decides to implement reblogs as Announce then they will want to support multiple shares most likely.

@trwnh Idempotency is generally a desirable property in a distributed system. The connection between servers could be flaky and the sender may see a request timeout and retry activity delivery not knowing that recipient already processed it.

I think for multiple shares it would be better to add a property specifying the total number of shares to Announce activity and maybe send Update(Announce) activity after each share.

trwnh commented 1 year ago

@silverpill a primary characteristic of Tumblr style reblogs is that they continue to exist after reblogging again, each with their own timestamp and permalink. In IndieWeb this would be possible due to how u-repost-of and h-cite work. Multiple reposts can be tracked naturally because each repost is a resource with a URI. Update Announce would not work because shares continues to have only one entry.

Idempotency generally happens as a property of having some idempotency key, in this case the id of the activity. The question is whether we should enforce additional spec-wide uniqueness constraints that may or may not have been implied initially -- and in doing so, we need to determine how to deal with the fact that id no longer matters entirely. In the case of Follow, this seems mostly uncontroversial, although it creates a bit of a tricky state machine; in the case of Like, this seems a bit controversial but popular opinion tends to agree, especially due to the existence of the liked collection and how it contains objects instead of activities, implying an intent of uniqueness; in the case of Announce this is a bit more controversial. Already there are UX annoyances around boosting a status multiple times in Mastodon, for example, with having to unboost before boosting again, and the common API error "422 reblog of status already taken" that happens if you do this too quickly.

evanp commented 1 year ago

I think the best resolution for this problem is guidance within the Primer for referring to activities. We can apply the Postel principle and say that for best interoperability, the publisher should include the ID, and the consumer should accept activities without IDs if they can identify them uniquely another way.

I'll add this section to the Primer, and we can consider from there.

https://www.w3.org/wiki/ActivityPub/Primer/Referring_to_activities

We agreed during triage that it does make sense for consumers to make an effort to identify activities with no id if other attributes uniquely identify them. We had some disagreement about whether consumers should identify activities with an id property that hasn't been seen before, possibly because the publisher did not store the id it used originally. So, the guideline there is somewhat weaker.

silverpill commented 1 year ago

@evanp From Primer:

It is helpful to have an id property for these referred activities, but not all systems will store the activity ids over time. This can make it difficult to refer to those activities later.

Any examples of such systems? In my experience, the majority of AP implementations have no problem with including referred activity ID in Undo, Accept and Reject. Yes, it is possible to identify referred activities by shape, but I think this should not be a recommendation.

In particular, this guideline should not be applied to Undo, Accept and Reject:

Consumers should make an effort to process activities without an id if those activities can be identified uniquely by other properties. See below for some ways to identify different types of activities.

Because that would shift the burden from a few buggy implementations to everyone else.

Postel principle is not universal. It should be used carefully as its application leads to protocol decay: https://datatracker.ietf.org/doc/html/draft-thomson-postel-was-wrong-00

silverpill commented 1 year ago

I suggest the following changes to Primer:

evanp commented 1 year ago

@silverpill I think Misskey and Mastodon have come up as examples that don't keep persistent activity IDs for some activities.

In terms of the guidelines, they're helpful tips for what developers should do to maximize interoperability. They're not normative requirements; there's no MUST possible.

If you want to maximize interoperability, the Postel principle is a good one to follow.

I agree that publishers should include IDs for maximum interop, but also that consumers should try to handle ID-less activities. If both sides do some extra work, we can avoid interop failures.

Activity Streams 2.0 does not require an ID for pretty much anything; ActivityPub doesn't require it for referenced activities. In the absence of an ID, the guidelines hold.

evanp commented 1 year ago

@trwnh I'm moving the conversation about multiple Announce activities to another issue.

evanp commented 1 year ago

@silverpill another good example might be for C2S! A client that wants to let a user undo a follow needs to first find the object in following, then scroll through the outbox until they find a matching Follow activity. That might require many, many pages! Given that the server has all the information locally, it seems easier for all parties to leave it up to the server to determine which activity to Undo.

silverpill commented 1 year ago

I think Misskey and Mastodon have come up as examples that don't keep persistent activity IDs for some activities.

@evanp I have data on Mastodon and it uses the same activity ID in Follow and Undo(Follow). It also uses correct ID of remote Follow activity in Accept(Follow). I don't have data on Misskey, but it probably does the right thing too (otherwise someone would have noticed).

Perhaps these implementations didn't store activity IDs in the past? I don't know.

In terms of the guidelines, they're helpful tips for what developers should do to maximize interoperability. They're not normative requirements; there's no MUST possible.

What about "should store IDs of Like, Follow and Announce activities"? The Primer is authoritative and developers will follow its recommendations. Maximizing interoperability at all costs would mean accounting for quirks of 50+ different implementations, this is not sustainable. Reduction of technical debt and interop improvements should go hand in hand.

another good example might be for C2S!

Quite possible. There might be cases where identification by shape makes the most sense. But for Undo/Accept/Reject in S2S context this is not necessary because using IDs is a de-facto standard (even if the spec doesn't require them).

ap-socialhub commented 7 months ago

This issue has been mentioned on SocialHub. There might be relevant details there:

https://socialhub.activitypub.rocks/t/follow-accept-object-identifiers/3851/9

ap-socialhub commented 3 months ago

This issue has been mentioned on SocialHub. There might be relevant details there:

https://socialhub.activitypub.rocks/t/an-fep-for-follow-accept-mechanics/4266/5

evanp commented 1 month ago

I suggest the following changes to Primer:

* State that implementations MUST store IDs of Like, Follow and Announce activities.

* Change guideline to _Consumers **MAY** make an effort to process activities without an id if those activities can be identified uniquely by other properties. See below for some ways to identify different types of activities._

I think that's a fair resolutions. I changed it from "should" to "may". It probably also makes sense in general for us to avoid SHOULD, MUST, MAY, and other RFC 2119 terms in the Primer, since it is helpful guidance, not normative specification.

I'm going to leave this as pending closure until we hear back from @silverpill .

silverpill commented 1 month ago

@evanp Thank you for making this change.

Also +1 for avoiding RFC-2119 key words.

TallTed commented 3 weeks ago

I might suggest a couple of small revisions to more clearly and fully avoid RFC-2119 keywords...

  • State that it is mandatory that implementations store IDs of Like, Follow, and Announce activities.

  • Change guideline to _Consumers CAN make an effort to process activities without an ID if those activities can be identified uniquely by other properties. See below for some ways to identify different types of activities._

(I know, lowercase may wouldn't be strictly RFC-2119, but lowercase keywords are often read by humans as if they were uppercase, so I find it's best to be case-insensitive about avoiding those keywords.)

evanp commented 3 weeks ago

i'll assume that @silverpill is ok with even less normative language of "can", so since objections are resolved, ill close this issue.