w3c / activitypub

http://w3c.github.io/activitypub/

Add & Remove activities cannot be used to manage caches of collections #465

Open ThisIsMissEm opened 3 months ago

ThisIsMissEm commented 3 months ago

Currently

the server SHOULD add the object to the collection specified in the target property, unless:

  • the target is not owned by the receiving server, and thus they can't update it.
  • the object is not allowed to be added to the target collection for some other reason, at the receiver's discretion.

The first bullet point essentially prohibits my server from sending your server an Add activity to notify your server that I've added an actor on your server to my Collection: since the target is a collection on my server, your server cannot modify it.

I think there needs to be an additional bullet point supporting the caching of collections use case:

  • the target is owned by the actor of the activity, in which case the receiving server can treat it as a change to that collection.

But then Collections also currently don't contain a backlink to the Actor who manages or owns them, see: https://github.com/w3c/activitypub/issues/466.
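To make the proposal concrete, here is a minimal sketch of how a receiving server might classify an incoming Add under the existing bullets plus the proposed one. It is an illustration, not spec text: `LOCAL_ORIGIN`, `classify_add`, the example IRIs, and the idea that the receiver already knows the collection's owner (for instance from an attributedTo value) are all assumptions.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical origin of the receiving server.
LOCAL_ORIGIN = "https://yours.example"


def origin(iri: str) -> str:
    """Return the scheme://host[:port] part of an IRI."""
    parts = urlparse(iri)
    return f"{parts.scheme}://{parts.netloc}"


def classify_add(activity: dict, collection_owner: Optional[str]) -> str:
    """Classify an incoming Add as 'apply', 'cache-hint', or 'ignore'.

    Assumes actor and target are plain IRI strings, and that
    collection_owner is whatever the receiver believes owns the target
    collection (None if unknown).
    """
    target = activity["target"]
    if origin(target) == LOCAL_ORIGIN:
        # The target is ours, so we may actually add the object
        # (still subject to the receiver's discretion).
        return "apply"
    if collection_owner is not None and collection_owner == activity["actor"]:
        # Proposed bullet: a remote collection changed by its own owner.
        # Treat the Add as a change notification for any cached copy.
        return "cache-hint"
    return "ignore"


# Example: alice on mine.example adds bob (on yours.example) to her own list.
example_add = {
    "type": "Add",
    "actor": "https://mine.example/users/alice",
    "object": "https://yours.example/users/bob",
    "target": "https://mine.example/users/alice/lists/friends",
}
print(classify_add(example_add, "https://mine.example/users/alice"))
```

With the owner known and equal to the actor, the example is classified as a cache hint rather than dropped.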

ThisIsMissEm commented 3 months ago

This is potentially also related to #461

trwnh commented 3 months ago

The first bullet point essentially prohibits my server from sending your server an Add activity to notify your server that I've added an actor on your server to my Collection: since the target is a collection on my server, your server cannot modify it.

I think this is a misunderstanding of the first bullet point. It does not prohibit anything. It is true that "your server cannot modify [a collection on my server]", generally speaking, but I can still send a notification that I Added something to my collection.

Collections also currently don't contain a backlink to the Actor who manages or owns them.

In the currently deployed ecosystem, this would probably be done by adding attributedTo to the Collection.

evanp commented 3 months ago

In the currently deployed ecosystem, this would probably be done by adding attributedTo to the Collection.

I think this is the right way to do it.

trwnh commented 3 months ago

tangent 1: server-managed collections (https://www.w3.org/wiki/ActivityPub/Primer/Server-Managed_Collections) could be signalled by not having an attributedTo property.

tangent 2: https://w3id.org/fep/c7d3 tries to formalize the use of attributedTo, actor, and same-origin policy

tangent 3: there are other potential values of attributedTo other than an actor; for example, one might declare that an object's replies collection is attributedTo the object, and/or that the object is attributedTo the Create activity that created it, and/or that some activity is attributedTo an activity that caused it or resulted in it. https://www.w3.org/TR/activitystreams-vocabulary/#dfn-attributedto says the following:

Identifies one or more entities to which this object is attributed. The attributed entities might not be Actors. For instance, an object might be attributed to the completion of another activity.

which is a bit vague and also circular, in that "attributedTo" is defined using the term "attributed". Oxford defines "attribute" like so:

regard something as being caused by (someone or something).

erincandescent also writes about issues with attributedTo here: https://socialhub.activitypub.rocks/t/fep-0391-special-collection-proofs/4165/15

long story short, if you want something that works right now, put attributedTo on the Collection pointing to the actor, since this matches current models as described by FEP-c7d3 (and consider disentangling various senses of the term attributedTo into other separate properties). again, for now, this signals the actor that has permission to manage the collection (and if missing or not pointing to an actor, then it is likely a "server-managed collection" as mentioned above)
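a rough sketch of that heuristic, for illustration only: `collection_manager` and the `known_actor_ids` set are hypothetical, and how a server decides that an IRI refers to an actor is left out entirely.

```python
from typing import Optional, Set


def collection_manager(collection: dict, known_actor_ids: Set[str]) -> Optional[str]:
    """Return the actor presumed to manage a collection, or None.

    None is read here as "probably a server-managed collection": either
    attributedTo is missing, or none of its values is a known actor.
    """
    attributed = collection.get("attributedTo")
    if attributed is None:
        return None
    # attributedTo may hold a single value or a list of values.
    candidates = attributed if isinstance(attributed, list) else [attributed]
    for candidate in candidates:
        # Each value may be a bare IRI or an embedded object with an id.
        candidate_id = candidate.get("id") if isinstance(candidate, dict) else candidate
        if candidate_id in known_actor_ids:
            return candidate_id
    return None


actors = {"https://mine.example/users/alice"}
friends = {
    "id": "https://mine.example/users/alice/lists/friends",
    "type": "Collection",
    "attributedTo": "https://mine.example/users/alice",
}
print(collection_manager(friends, actors))
```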

i'll also make a separate issue for attributedTo and its problems. EDIT: https://github.com/w3c/activitypub/issues/467

ThisIsMissEm commented 3 months ago

We do clarify the usage of Add Activities with regards to Caching in the Primer: https://www.w3.org/wiki/ActivityPub/Primer/Add_activity

If the receiving server is keeping a cached version of the remote collection, it has two possible actions to take when an Add is received:

  • It can update the cached version of the collection. However, it may not have the correct ordering, since the ordering of new items in collections is not well-defined.
  • It can invalidate the cached version, and re-fetch it from originating server if it is requested in the future.
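
The two options quoted above can be sketched as follows. This is only an illustration: `CollectionCache` and its method names are hypothetical, and inserting at the front of the cached list is a guess, since the position of new items is not well-defined.

```python
from typing import Dict, List, Optional


class CollectionCache:
    """In-memory cache of remote collections, keyed by collection IRI."""

    def __init__(self) -> None:
        self._items: Dict[str, List[str]] = {}

    def store(self, collection_id: str, items: List[str]) -> None:
        """Cache a freshly fetched copy of the collection."""
        self._items[collection_id] = list(items)

    def get(self, collection_id: str) -> Optional[List[str]]:
        """Return the cached items, or None if a re-fetch is needed."""
        return self._items.get(collection_id)

    def apply_add(self, collection_id: str, object_id: str) -> None:
        """Option 1: update the cached copy in place.

        The new item is put at the front as a guess; the real ordering
        on the originating server may differ.
        """
        cached = self._items.get(collection_id)
        if cached is not None:
            cached.insert(0, object_id)

    def invalidate(self, collection_id: str) -> None:
        """Option 2: drop the cached copy so it is re-fetched on demand."""
        self._items.pop(collection_id, None)
```

Invalidation avoids the ordering question at the cost of an extra fetch the next time the collection is requested.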

My reading of the ActivityPub spec as it stands today is that the "unless" implies "and", not "or", though I still think explicitly calling out that an Add activity may mean "Invalidate your cache of this Collection" would make sense in the context of the spec.

ThisIsMissEm commented 3 months ago

To keep this issue focused, I should've probably noted that tangent as a separate issue: https://github.com/w3c/activitypub/issues/466

Perhaps it would be an idea to move the comments relating to that tangent over to that issue and hide them here as "off topic".

trwnh commented 3 months ago

My reading of the ActivityPub spec as it stands today is that the "unless" implies "and", not "or"

i read it as an or. but also, i didn't read it as causing any issues for caching use cases.

explicitly calling out that an Add activity may mean "Invalidate your cache of this Collection" would make sense in the context of the spec.

this is in the primer page you linked, but i don't know if any of that should propagate back to the official specification document. prior cases of proposed clarification have been left somewhat open-ended on whether it makes sense to incorporate non-normative notes into an updated editor's draft.

evanp commented 3 months ago

The first bullet point essentially prohibits my server from sending your server an Add activity to notify your server that I've added an actor on your server to my Collection: since the target is a collection on my server, your server cannot modify it.

I think it's probably fair to say that updating your cached copy of a collection does not count as updating the collection itself.

I note in my book that it's hard to maintain a synchronized copy of the collection unless you know the sorting order and uniqueness requirements. Add doesn't specify where in the collection the object is added (start, end, somewhere in the middle), but with collections that are ordered reverse-chronologically you can make a reasonable guess. For Remove, with collections that only allow an object to be included one time, you can make a reasonable guess, but with collections that allow duplicates, you don't know which one was removed.

I recommend not trying to keep a copy of the collection, but using Add and Remove to a) invalidate caches and b) keep track of membership. When you get an Add activity, you don't necessarily know where in the collection the object is, but you know that it's there.
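
As a rough sketch of that recommendation (not a definitive implementation): `MembershipTracker` and its fields are hypothetical names, actor, object, and target are assumed to be plain IRI strings, and the collection is assumed to hold each object at most once, so a Remove is taken to mean the object is gone.

```python
from typing import Dict, Set


class MembershipTracker:
    """Tracks membership in remote collections without copying them."""

    def __init__(self) -> None:
        self.members: Dict[str, Set[str]] = {}  # collection IRI -> object IRIs
        self.stale: Set[str] = set()  # collections whose cache needs a re-fetch

    def handle(self, activity: dict) -> None:
        """Process an incoming Add or Remove; other types are ignored."""
        kind = activity.get("type")
        if kind not in ("Add", "Remove"):
            return
        target = activity["target"]
        obj = activity["object"]
        ids = self.members.setdefault(target, set())
        if kind == "Add":
            # We know the object is in the collection, but not where.
            ids.add(obj)
        else:
            # Assumes uniqueness; with duplicates the object might remain.
            ids.discard(obj)
        # a) invalidate any cached copy of the collection.
        self.stale.add(target)

    def is_member(self, collection_id: str, object_id: str) -> bool:
        """b) answer membership questions from what we have seen so far."""
        return object_id in self.members.get(collection_id, set())
```

A server could consult `stale` before serving a cached page and re-fetch the collection from its origin when the IRI appears there.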

ThisIsMissEm commented 3 months ago

I'm not suggesting keeping copies of collections, but caching is important, and very often we just want to communicate membership in a collection. I'd probably say Add and Remove do have different semantics depending on whether the target is an OrderedCollection or just a Collection.

But I do think the way ActivityPub is currently written doesn't account for the caching of collections, or at least, if it does it is not explicit enough and that is likely to cause implementation issues.

I would perhaps suggest that, for Collections that can contain duplicates, having an Invalid activity or just an Update activity targeting the collection would be enough to trigger cache invalidation.

steve-bate commented 3 months ago

AFAICT, ActivityPub doesn't account for caching of anything. At least I don't see the word cache anywhere in the specification. The S2S Update activity does mention object copy twice, which is probably caching, but it's not explicitly called out that way.

trwnh commented 3 months ago

AFAICT, ActivityPub doesn't account for caching of anything. At least I don't see the word cache anywhere in the specification. The S2S Update activity does mention object copy twice, which is probably caching, but it's not explicitly called out that way.

there are recommendations along the lines of "http caching headers SHOULD be used/respected", iirc

evanp commented 2 months ago

We do mention a "local representation" of objects in the discussion of the Create activity. It is probably worthwhile to have either a clarification here or a quick comment that suggests how to manage an Add activity w/r/t your cache.

trwnh commented 2 months ago

Possibly related to #407 as well, I am just realizing

ThisIsMissEm commented 2 months ago

@trwnh yeah, I'd possibly even go as far as saying it's a near duplicate.. I'm not sure why I didn't find that issue when I opened this one (I'm pretty sure I searched)