Open zotanmew opened 4 weeks ago
pretty sure this is an error in the text of the AP spec. the bit about "ID explicitly specified as the JSON null object" is clearly an error and should be reworded or removed. the correct behavior is to omit the id
entirely, which triggers the "anonymous object" or "blank node" behavior that one would expect.
the section 3.1 text should read something like:
All Objects in [ActivityStreams] should have unique global identifiers. ActivityPub extends this requirement; all objects distributed by the ActivityPub protocol MUST have unique global identifiers, unless they are intentionally transient (short lived activities that are not intended to be able to be looked up, such as some kinds of chat messages or game notifications) or otherwise anonymous objects (an embedded node that is part of its parent context). These unique global identifiers SHOULD be HTTPS URIs for publicly facing content that is intended to be publicly dereferenceable.
These identifiers must fall into one of the following groups: [...]
side note: `@id` should always be an IRI or expand to an IRI against the `@base`. the use of a string as `@id` (with no corresponding `@base`) might be fine on the JSON side, but it will cause all triples with that object as the subject to be removed from the output if converted to RDF (since at best it is interpreted as a "relative URI reference", which is not allowed for subjects)
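To make the side note concrete, here is a rough sketch of the check a publisher could run before emitting an identifier. It is not how any JSON-LD processor is implemented; `resolves_to_absolute_iri` is a hypothetical helper, and it leans on Python's `urllib.parse`, which handles URIs rather than full IRIs, so treat it as an approximation.

```python
from urllib.parse import urljoin, urlsplit


def resolves_to_absolute_iri(node_id, base=None):
    """Return True if node_id is absolute, or becomes absolute against base.

    Hypothetical helper: approximates "is an IRI or expands to an IRI
    against the @base" by checking for a scheme after resolution.
    """
    # Resolve against the base only when one is supplied.
    candidate = urljoin(base, node_id) if base else node_id
    # A relative reference has no scheme component.
    return bool(urlsplit(candidate).scheme)


# An HTTPS identifier is already absolute.
print(resolves_to_absolute_iri("https://example.com/objects/1"))
# A bare string with no base stays relative, so an RDF conversion
# would have no valid subject for its triples.
print(resolves_to_absolute_iri("some-local-name"))
# The same string is fine once a base is available.
print(resolves_to_absolute_iri("some-local-name", base="https://example.com/objects/"))
```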
other side note: this came up in #396 as well regarding "partial updates", except for properties instead of ids.
> the correct behavior is to omit the id entirely, which triggers the "anonymous object" or "blank node" behavior that one would expect.
This removes the ability to distinguish transient from anonymous objects unless they occur on the top-level (cannot be anonymous). I’m fine with this and in fact felt like transient objects only really make sense on the top-level anyway, but to make sure: is there any reason why this distinction should be preserved considering the current spec revision makes an explicit effort for it?
> distinguish transient from anonymous objects
“transient” and “anonymous” are different aspects of the same functionality. a “transient activity” is also an “anonymous object”, because activities are objects, and because the thing that makes the activity transient is that it’s anonymous.
example of a transient activity:
{
  "actor": "https://someone.example",
  "type": "InGameNotification",
  "content": "The payload is nearing the checkpoint!"
}
example of embedded anonymous objects for attributedTo and attachment, part of the parent context of the Note:
{
  "id": "https://imageboard.example/19387428939",
  "type": "Note",
  "attributedTo": {
    "name": "Anonymous"
  },
  "content": ">>19387428935 >>19387428938 take a look y'all",
  "inReplyTo": ["https://imageboard.example/19387428935", "https://imageboard.example/19387428938"],
  "attachment": {
    "type": "Image",
    "name": "IMG_4634.jpeg",
    "url": {
      "href": "https://imageboard.example/attachments/3847374.jpg",
      "mediaType": "image/jpeg",
      "width": 375,
      "height": 667
    }
  },
  "tag": [
    {"type": "Mention", "name": ">>19387428935", "href": "https://imageboard.example/19387428935"},
    {"type": "Mention", "name": ">>19387428938", "href": "https://imageboard.example/19387428938"}
  ]
}
> a “transient activity” is also an “anonymous object”, because activities are objects, and because the thing that makes the activity transient is that it’s anonymous.
While their effect for receiving servers may usually amount to the same thing, the way current AP spec describes them they are distinct. Anonymous objects are defined as being "part of its parent context" (and thus not able to be looked up on its own), while transient objects are “short lived activities that are not intended to be able to be looked up”.
The described purpose and intent are different and importantly, anonymous objects cannot exist on the top-level, since there is no parent context to be part of. Your example transient activity therefore is not an anonymous object.
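As an illustration of the reading argued in this comment (and only that reading, since it is disputed elsewhere in the thread), the sketch below labels every object without an id by position: at the top level it is called "transient", nested inside a parent it is called "anonymous". The function name `label_objects_without_id` and the JSONPath-like strings are made up for this example, and a missing `id` and a null `id` are treated the same way.

```python
def label_objects_without_id(node, top_level=True, path="$"):
    """Yield (path, label) for each JSON object that carries no id.

    Hypothetical helper: "transient" for a top-level object, "anonymous"
    for an object embedded in a parent context.
    """
    if isinstance(node, dict):
        if node.get("id") is None:
            yield path, ("transient" if top_level else "anonymous")
        for key, child in node.items():
            if key == "@context":
                # The context is metadata, not part of the object graph.
                continue
            yield from label_objects_without_id(child, False, f"{path}.{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from label_objects_without_id(child, False, f"{path}[{index}]")


notification = {
    "actor": "https://someone.example",
    "type": "InGameNotification",
    "content": "The payload is nearing the checkpoint!",
}
note = {
    "id": "https://imageboard.example/19387428939",
    "type": "Note",
    "attributedTo": {"name": "Anonymous"},
}

print(list(label_objects_without_id(notification)))  # [('$', 'transient')]
print(list(label_objects_without_id(note)))  # [('$.attributedTo', 'anonymous')]
```

Note that a receiver can only make this distinction from position; the payload itself carries nothing that separates the two cases once the id is simply omitted.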
The quoted bit above also suggests only activities can be transient, though later on it also refers to general “transient objects”.
If there’s no reason to ever distinguish between them, I’d suggest to further amend the wording to actually merge “transient” into “anonymous” (E.g. allow omitting the id for anonymous objects and then just mention embedded objects and transient activities as examples of anonymous objects)
> the way current AP spec describes them they are distinct

the way current AP spec describes them is wrong and misleading. the “id:null” mechanism is invalid and should never have been written.
the purpose of the paragraph is to require dereferenceability except in cases where you explicitly don’t want this. in such cases, you leave out the id.
> the “id:null” mechanism is invalid and should never have been written.
But it was written and provided a distinction between transient and anonymous. This distinction is also the only motivation I can come up with why it was written the way it is in the first place. That’s why I’m asking about whether it is safe to drop the ability to distinguish transient and anonymous objects.
If it is safe to drop, note that your proposed wording still keeps "anonymous" and "transient" distinct in purpose even though they’re no longer distinguishable for receivers, thus my suggestion to explicitly merge the description of those categories.
so something like this, then?
All Objects in [ActivityStreams] should have unique global identifiers. ActivityPub extends this requirement; all objects distributed by the ActivityPub protocol MUST have unique global identifiers, unless they are ~~intentionally transient (short lived activities that are not intended to be able to be looked up, such as some kinds of chat messages or game notifications)~~ not intended to be looked up or referred to. In other words, ~~These~~ identifiers must fall into one of the following groups:
- Publicly dereferencable URIs, such as HTTPS URIs, with their authority belonging to that of their originating server. (Publicly facing content SHOULD use HTTPS URIs).
- An ID ~~explicitly specified as the JSON null object~~ that is explicitly omitted~~, which implies~~; for example, an anonymous object (a part of its parent context) or a transient activity (short lived activities that are not intended to be able to be looked up, such as some kinds of chat messages or game notifications) would omit its ID.
(somewhat un/related, but i think the bit about "authority belonging to that of their originating server" also should be changed, since it's not actually logically implied by "unique global identifier" and it makes all objects owned by an HTTPS server instead of by actors. that's a separate issue, though.)
seems good; thx
I think "transient activities" should be removed from the spec. It sounds like "looking up" is the only purpose of an identifier, but identifiers can also be used for authentication, authorization, de-duplication of incoming activities and synchronization of collections. There is no good reason for a top-level object to not have an identifier. "Short lived" makes it even more confusing, implying that activities have a duration or a lifetime.
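The de-duplication point can be sketched in a few lines. This is a toy, in-memory illustration under the assumption that a receiver keys de-duplication on the `id` property; the `Inbox` class and its method names are invented for the example and do not come from any implementation discussed here.

```python
class Inbox:
    """Toy inbox (hypothetical) that de-duplicates deliveries by id."""

    def __init__(self):
        self._seen_ids = set()
        self.accepted = []

    def receive(self, activity):
        """Accept the activity unless its id was already seen."""
        activity_id = activity.get("id")
        if activity_id is not None:
            if activity_id in self._seen_ids:
                return False  # duplicate delivery, dropped
            self._seen_ids.add(activity_id)
        # Without an id there is nothing to key on, so a retried
        # delivery of the same payload is accepted again.
        self.accepted.append(activity)
        return True


inbox = Inbox()
like = {"id": "https://example.com/activities/1", "type": "Like"}
ping = {"type": "InGameNotification", "content": "checkpoint"}

print(inbox.receive(like), inbox.receive(like))  # True False
print(inbox.receive(ping), inbox.receive(ping))  # True True
```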
@zotanmew, please edit your initial post and code fence each instance of `@id` (like `` `@id` ``), so that the GitHub user isn't spammed with notifications about this discussion in which they did not choose to participate.
@TallTed I'm told that editing it won't remove the mention, though I'm happy to edit it regardless.
I'd like to test this with JSON-LD parsers to see what the actual behaviour is. I'm particularly interested in whether there's any daylight whatsoever between the `@id` property and the `id` property that would allow this different behaviour for the latter.
The JSON-LD playground does show a null `id` value as an error: https://json-ld.org/playground/#startTab=tab-expanded&json-ld=%7B%22%40context%22%3A%22https%3A%2F%2Fwww.w3.org%2Fns%2Factivitystreams%22%2C%22id%22%3Anull%7D
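For anyone who wants to see exactly which document that playground link submits, the sketch below decodes the `json-ld` parameter from the URL fragment using only the standard library. The helper name `playground_document` is made up; the URL is the one given above.

```python
import json
from urllib.parse import unquote

PLAYGROUND_URL = (
    "https://json-ld.org/playground/#startTab=tab-expanded&json-ld="
    "%7B%22%40context%22%3A%22https%3A%2F%2Fwww.w3.org%2Fns%2Factivitystreams"
    "%22%2C%22id%22%3Anull%7D"
)


def playground_document(url):
    """Extract and parse the percent-encoded JSON-LD from the fragment."""
    fragment = url.split("#", 1)[1]
    # The fragment is a list of key=value pairs separated by "&".
    params = dict(part.split("=", 1) for part in fragment.split("&"))
    return json.loads(unquote(params["json-ld"]))


print(playground_document(PLAYGROUND_URL))
# {'@context': 'https://www.w3.org/ns/activitystreams', 'id': None}
```

So the test case is the smallest possible one: the ActivityStreams context plus an explicit `"id": null`.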
I think we have two possible paths forward:
There are a few other ways that we could represent "anonymous" or "transient" or otherwise unidentified objects:
- [...] `id` value; leave it undefined.
- [...] `https://www.w3.org/ns/activitystreams#Anonymous`.

I think an Erratum is necessary here. Taking out the reference to using null, we could have something like the following:
...all objects distributed by the ActivityPub protocol MUST have unique global identifiers, unless they are intentionally transient or anonymous ([examples]) in which case the identifier MAY be omitted. The identifiers must be publicly dereferencable URIs, such as HTTPS URIs, with their authority belonging to that of their originating server. (Publicly facing content SHOULD use HTTPS URIs).
Sounds good to me (I'd vastly prefer the erratum option over a deprecation).
We could also add something like this?
Consumers MAY treat a `null` value for the `id` property as if the property was not defined. Publishers SHOULD NOT use `null` for the `id` property, as it is not valid JSON-LD.
This gives us a little Postel resilience.
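On the consumer side, that leniency could look roughly like the sketch below: strip null identifiers before handing the document to anything stricter, such as a JSON-LD processor. `drop_null_ids` is a hypothetical name, and treating `@id` the same as `id` is an assumption on my part rather than something the proposed wording says.

```python
def drop_null_ids(value):
    """Return a copy of value with null "id"/"@id" members removed.

    Hypothetical consumer-side normalisation: a null id is treated as
    if the property had not been defined at all.
    """
    if isinstance(value, dict):
        return {
            key: drop_null_ids(child)
            for key, child in value.items()
            # Assumption: "@id" gets the same treatment as "id".
            if not (key in ("id", "@id") and child is None)
        }
    if isinstance(value, list):
        return [drop_null_ids(child) for child in value]
    return value


incoming = {
    "type": "Create",
    "id": None,
    "object": {"id": None, "type": "Note", "content": "hi"},
}
print(drop_null_ids(incoming))
# {'type': 'Create', 'object': {'type': 'Note', 'content': 'hi'}}
```

Real identifiers pass through untouched, so running this on every incoming document should be harmless.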
And, honestly, I hate the "MUST unless you don't want to" phrasing. Is it too late to just do this?
...all objects distributed by the ActivityPub protocol SHOULD have unique global identifiers. The identifiers must be publicly dereferencable URIs, such as HTTPS URIs, with their authority belonging to that of their originating server. (Publicly facing content SHOULD use HTTPS URIs).
Given that most implementations do not do LD processing for most or even all activities, I’d worry people who are aware of that fact might interpret that as it being fine to send null values, so I’d go for a MUST NOT here, as it makes federation with any implementations that do process activities as JSON-LD impossible when the activity contains such a null `@id`.
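A publisher that wants to hold itself to the stricter MUST NOT could guard its outgoing path along these lines. This is only a sketch: `null_id_paths` and `serialize_for_delivery` are invented names, the paths are informal JSONPath-like strings, and a real implementation would hook this into whatever serialisation step it already has.

```python
import json


def null_id_paths(value, path="$"):
    """Return the paths of every "id"/"@id" member whose value is null."""
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if key in ("id", "@id") and child is None:
                found.append(child_path)
            else:
                found.extend(null_id_paths(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(null_id_paths(child, f"{path}[{index}]"))
    return found


def serialize_for_delivery(activity):
    """Serialise an activity, refusing to emit explicit null ids."""
    offending = null_id_paths(activity)
    if offending:
        # Omitting the property is the interoperable way to send an
        # object without an identifier.
        raise ValueError(f"null id at {', '.join(offending)}; omit the property instead")
    return json.dumps(activity)


print(null_id_paths({"type": "Create", "object": {"id": None, "type": "Note"}}))
# ['$.object.id']
```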
> I hate the "MUST unless you don't want to" phrasing. Is it too late to just
i don't think saying "AS2 says you SHOULD have unique global identifiers; AP extends this such that all objects SHOULD have unique global identifiers." makes sense. the thing is, it's already a SHOULD in AS2. why have the language about "extending the requirement" in that case?
by contrast, the "MUST but MAY" is not simply "you don't want to". that's what a SHOULD is. "SHOULD" is "do this unless you have a reason not to." "MUST but MAY" is "do this in every circumstance, but we have the following enumerated exceptions." in other words, the anti-fulfillment argument changes from "i have a good reason not to" and becomes "i specifically qualify for this exception".
Postel-wise, the behavior we're trying to go for here is "don't do this, ever; and if you're doing this right now, stop it." otherwise, it sounds like the bit about how consumers MAY strip null ids would perhaps be good guidance for the primer, but it's pretty clear that on the spec level we should just go ahead and remove this unfortunate error once we have a WG.
The AP spec states the following:
The JSON-LD spec (version 1.0) states the following:
The JSON-LD spec (version 1.1) states the same thing:
Since these are in conflict, it is not possible to comply with both the JSON-LD specification and the ActivityPub specification simultaneously.
This was noticed as AP implementer Akkoma has recently started federating anonymous objects in accordance with the AP specification (explicit nulls), which has broken federation with implementations performing JSON-LD expansion (for example, Iceshrimp.NET).
Some solutions were proposed in this Akkoma PR thread.