w3c / activitystreams

Activity Streams 2.0
https://www.w3.org/TR/activitystreams-core/

ActivityStreams @context definition is invalid JSON-LD #500

Closed cjslep closed 5 years ago

cjslep commented 5 years ago

Please Describe the Issue:

According to the JSON-LD Context Processing sections, @context definitions must not be recursive (emphasis mine):

If context is a string, it represents a reference to a remote context. We dereference the remote context and replace context with the value of the @context key of the top-level object in the retrieved JSON-LD document. If there's no such key, an invalid remote context has been detected. Otherwise, we process context by recursively using this algorithm ensuring that there is no cyclical reference.

...

Then, for every other key in local context, we update the term definition in result. Since term definitions in a local context may themselves contain terms or compact IRIs, we may need to recurse. When doing so, we must ensure that there is no cyclical dependency, which is an error. After we have processed any term definition dependencies, we update the current term definition, which may be a keyword alias.

However, fetching the @context located at https://www.w3.org/ns/activitystreams returns a @context with a cyclical alias that breaks JSON-LD processing:

The curl call:

curl -H "Accept: application/ld+json; profile=\"https://www.w3.org/ns/activitystreams\"" https://www.w3.org/ns/activitystreams

This returns the following (the // comment is my own, marking the cyclical entry):

{
  "@context": {
    "@vocab": "_:",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "as": "https://www.w3.org/ns/activitystreams#", // << This is cyclical!
    "ldp": "http://www.w3.org/ns/ldp#",
    "id": "@id",
    "type": "@type",
    "Accept": "as:Accept",
    "Activity": "as:Activity",
    "IntransitiveActivity": "as:IntransitiveActivity",
    "Add": "as:Add",
    "Announce": "as:Announce",
    "Application": "as:Application",
    "Arrive": "as:Arrive",
    "Article": "as:Article",
    "Audio": "as:Audio",
    "Block": "as:Block",
    "Collection": "as:Collection",
    "CollectionPage": "as:CollectionPage",
    "Relationship": "as:Relationship",
    "Create": "as:Create",
    "Delete": "as:Delete",
    "Dislike": "as:Dislike",
    "Document": "as:Document",
    "Event": "as:Event",
    "Follow": "as:Follow",
    "Flag": "as:Flag",
    "Group": "as:Group",
    "Ignore": "as:Ignore",
    "Image": "as:Image",
    "Invite": "as:Invite",
    "Join": "as:Join",
    "Leave": "as:Leave",
    "Like": "as:Like",
    "Link": "as:Link",
    "Mention": "as:Mention",
    "Note": "as:Note",
    "Object": "as:Object",
    "Offer": "as:Offer",
    "OrderedCollection": "as:OrderedCollection",
    "OrderedCollectionPage": "as:OrderedCollectionPage",
    "Organization": "as:Organization",
    "Page": "as:Page",
    "Person": "as:Person",
    "Place": "as:Place",
    "Profile": "as:Profile",
    "Question": "as:Question",
    "Reject": "as:Reject",
    "Remove": "as:Remove",
    "Service": "as:Service",
    "TentativeAccept": "as:TentativeAccept",
    "TentativeReject": "as:TentativeReject",
    "Tombstone": "as:Tombstone",
    "Undo": "as:Undo",
    "Update": "as:Update",
    "Video": "as:Video",
    "View": "as:View",
    "Listen": "as:Listen",
    "Read": "as:Read",
    "Move": "as:Move",
    "Travel": "as:Travel",
    "IsFollowing": "as:IsFollowing",
    "IsFollowedBy": "as:IsFollowedBy",
    "IsContact": "as:IsContact",
    "IsMember": "as:IsMember",
    "subject": {
      "@id": "as:subject",
      "@type": "@id"
    },
    "relationship": {
      "@id": "as:relationship",
      "@type": "@id"
    },
    "actor": {
      "@id": "as:actor",
      "@type": "@id"
    },
    "attributedTo": {
      "@id": "as:attributedTo",
      "@type": "@id"
    },
    "attachment": {
      "@id": "as:attachment",
      "@type": "@id"
    },
    "bcc": {
      "@id": "as:bcc",
      "@type": "@id"
    },
    "bto": {
      "@id": "as:bto",
      "@type": "@id"
    },
    "cc": {
      "@id": "as:cc",
      "@type": "@id"
    },
    "context": {
      "@id": "as:context",
      "@type": "@id"
    },
    "current": {
      "@id": "as:current",
      "@type": "@id"
    },
    "first": {
      "@id": "as:first",
      "@type": "@id"
    },
    "generator": {
      "@id": "as:generator",
      "@type": "@id"
    },
    "icon": {
      "@id": "as:icon",
      "@type": "@id"
    },
    "image": {
      "@id": "as:image",
      "@type": "@id"
    },
    "inReplyTo": {
      "@id": "as:inReplyTo",
      "@type": "@id"
    },
    "items": {
      "@id": "as:items",
      "@type": "@id"
    },
    "instrument": {
      "@id": "as:instrument",
      "@type": "@id"
    },
    "orderedItems": {
      "@id": "as:items",
      "@type": "@id",
      "@container": "@list"
    },
    "last": {
      "@id": "as:last",
      "@type": "@id"
    },
    "location": {
      "@id": "as:location",
      "@type": "@id"
    },
    "next": {
      "@id": "as:next",
      "@type": "@id"
    },
    "object": {
      "@id": "as:object",
      "@type": "@id"
    },
    "oneOf": {
      "@id": "as:oneOf",
      "@type": "@id"
    },
    "anyOf": {
      "@id": "as:anyOf",
      "@type": "@id"
    },
    "closed": {
      "@id": "as:closed",
      "@type": "xsd:dateTime"
    },
    "origin": {
      "@id": "as:origin",
      "@type": "@id"
    },
    "accuracy": {
      "@id": "as:accuracy",
      "@type": "xsd:float"
    },
    "prev": {
      "@id": "as:prev",
      "@type": "@id"
    },
    "preview": {
      "@id": "as:preview",
      "@type": "@id"
    },
    "replies": {
      "@id": "as:replies",
      "@type": "@id"
    },
    "result": {
      "@id": "as:result",
      "@type": "@id"
    },
    "audience": {
      "@id": "as:audience",
      "@type": "@id"
    },
    "partOf": {
      "@id": "as:partOf",
      "@type": "@id"
    },
    "tag": {
      "@id": "as:tag",
      "@type": "@id"
    },
    "target": {
      "@id": "as:target",
      "@type": "@id"
    },
    "to": {
      "@id": "as:to",
      "@type": "@id"
    },
    "url": {
      "@id": "as:url",
      "@type": "@id"
    },
    "altitude": {
      "@id": "as:altitude",
      "@type": "xsd:float"
    },
    "content": "as:content",
    "contentMap": {
      "@id": "as:content",
      "@container": "@language"
    },
    "name": "as:name",
    "nameMap": {
      "@id": "as:name",
      "@container": "@language"
    },
    "duration": {
      "@id": "as:duration",
      "@type": "xsd:duration"
    },
    "endTime": {
      "@id": "as:endTime",
      "@type": "xsd:dateTime"
    },
    "height": {
      "@id": "as:height",
      "@type": "xsd:nonNegativeInteger"
    },
    "href": {
      "@id": "as:href",
      "@type": "@id"
    },
    "hreflang": "as:hreflang",
    "latitude": {
      "@id": "as:latitude",
      "@type": "xsd:float"
    },
    "longitude": {
      "@id": "as:longitude",
      "@type": "xsd:float"
    },
    "mediaType": "as:mediaType",
    "published": {
      "@id": "as:published",
      "@type": "xsd:dateTime"
    },
    "radius": {
      "@id": "as:radius",
      "@type": "xsd:float"
    },
    "rel": "as:rel",
    "startIndex": {
      "@id": "as:startIndex",
      "@type": "xsd:nonNegativeInteger"
    },
    "startTime": {
      "@id": "as:startTime",
      "@type": "xsd:dateTime"
    },
    "summary": "as:summary",
    "summaryMap": {
      "@id": "as:summary",
      "@container": "@language"
    },
    "totalItems": {
      "@id": "as:totalItems",
      "@type": "xsd:nonNegativeInteger"
    },
    "units": "as:units",
    "updated": {
      "@id": "as:updated",
      "@type": "xsd:dateTime"
    },
    "width": {
      "@id": "as:width",
      "@type": "xsd:nonNegativeInteger"
    },
    "describes": {
      "@id": "as:describes",
      "@type": "@id"
    },
    "formerType": {
      "@id": "as:formerType",
      "@type": "@id"
    },
    "deleted": {
      "@id": "as:deleted",
      "@type": "xsd:dateTime"
    },
    "inbox": {
      "@id": "ldp:inbox",
      "@type": "@id"
    },
    "outbox": {
      "@id": "as:outbox",
      "@type": "@id"
    },
    "following": {
      "@id": "as:following",
      "@type": "@id"
    },
    "followers": {
      "@id": "as:followers",
      "@type": "@id"
    },
    "streams": {
      "@id": "as:streams",
      "@type": "@id"
    },
    "preferredUsername": "as:preferredUsername",
    "endpoints": {
      "@id": "as:endpoints",
      "@type": "@id"
    },
    "uploadMedia": {
      "@id": "as:uploadMedia",
      "@type": "@id"
    },
    "proxyUrl": {
      "@id": "as:proxyUrl",
      "@type": "@id"
    },
    "liked": {
      "@id": "as:liked",
      "@type": "@id"
    },
    "oauthAuthorizationEndpoint": {
      "@id": "as:oauthAuthorizationEndpoint",
      "@type": "@id"
    },
    "oauthTokenEndpoint": {
      "@id": "as:oauthTokenEndpoint",
      "@type": "@id"
    },
    "provideClientKey": {
      "@id": "as:provideClientKey",
      "@type": "@id"
    },
    "signClientKey": {
      "@id": "as:signClientKey",
      "@type": "@id"
    },
    "sharedInbox": {
      "@id": "as:sharedInbox",
      "@type": "@id"
    },
    "Public": {
      "@id": "as:Public",
      "@type": "@id"
    },
    "source": "as:source",
    "likes": {
      "@id": "as:likes",
      "@type": "@id"
    },
    "shares": {
      "@id": "as:shares",
      "@type": "@id"
    }
  }
}

cjslep commented 5 years ago

Note that this has implications for downstream users of the ActivityStreams IRI in other @context, such as this issue for Mastodon: https://github.com/tootsuite/mastodon/issues/10646

nightpool commented 5 years ago

as I responded in the referenced issue, I don't believe your interpretation of the processing algorithm is correct. can you point to an existing JSON-LD validator that throws this error when interpreting as2 documents?

dariusk commented 5 years ago

You've missed a key clause in the processing algorithm. Emphasis mine:

Then, for every other key in local context, we update the term definition in result. Since term definitions in a local context may themselves contain terms or compact IRIs, we may need to recurse. When doing so, we must ensure that there is no cyclical dependency, which is an error. After we have processed any term definition dependencies, we update the current term definition, which may be a keyword alias.

This immediately follows a discussion of the @context key, so we're talking about "every other key than @context". I interpret this as: all non-@context keys must have no cyclical dependency.
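To make the "cyclical dependency" rule concrete, here is a minimal Python sketch of the check, loosely modelled on the `defined` map used by the JSON-LD 1.0 Create Term Definition algorithm (false while a term is in progress, true once it is finished; hitting an in-progress term is a "cyclic IRI mapping" error). It is a simplification under stated assumptions: only plain string values and `{"@id": ...}` objects are handled, keywords are skipped, and `CyclicIRIMappingError`, `create_term_definition` and `process_terms` are illustrative names, not a library API.

```python
from typing import Dict, Union

TermValue = Union[str, Dict[str, str]]


class CyclicIRIMappingError(Exception):
    """Raised when a term's definition (directly or indirectly) depends on itself."""


def create_term_definition(local: Dict[str, TermValue], term: str,
                           active: Dict[str, str], defined: Dict[str, bool]) -> None:
    # defined[term] is False while the term is being processed, True once done.
    if term in defined:
        if defined[term]:
            return
        raise CyclicIRIMappingError(term)
    defined[term] = False
    value = local[term]
    iri = value["@id"] if isinstance(value, dict) else value
    if ":" in iri:
        prefix, suffix = iri.split(":", 1)
        # Blank node ids ("_:") and absolute IRIs ("https://...") need no recursion.
        if prefix != "_" and not suffix.startswith("//"):
            # The prefix is itself a term in this context: define it first.
            if prefix in local:
                create_term_definition(local, prefix, active, defined)
            if prefix in active:
                iri = active[prefix] + suffix
    active[term] = iri
    defined[term] = True


def process_terms(local: Dict[str, TermValue]) -> Dict[str, str]:
    active: Dict[str, str] = {}
    defined: Dict[str, bool] = {}
    for term in local:
        if not term.startswith("@"):  # keywords such as @vocab are skipped in this sketch
            create_term_definition(local, term, active, defined)
    return active


# An excerpt of the ActivityStreams context, deliberately listing "Person" first.
excerpt = {"Person": "as:Person", "as": "https://www.w3.org/ns/activitystreams#"}
print(process_terms(excerpt)["Person"])

# A genuine cycle: each prefix is defined in terms of the other.
try:
    process_terms({"a": "b:x", "b": "a:y"})
except CyclicIRIMappingError as err:
    print("cyclic IRI mapping:", err)
```

In this model the recursion for "as" stops immediately, because its value is already an absolute IRI; the error only fires when two or more terms use each other as prefixes.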

nightpool commented 5 years ago

I think it's important to note that the paragraph quoted is not the normative processing algorithm, which is a list of explicit steps in pseudocode. I would prefer to anchor our discussion in reference JSON-LD implementations to avoid rehashing the exact spec language, but if we really have to we should make sure to reference the actual algorithm.

cjslep commented 5 years ago

@dariusk I did interpret it correctly. The emphasis you made is in reference to the preceding paragraph:

If context is a JSON object, we first update the base IRI, the vocabulary mapping, and the default language by processing three specific keywords: @base, @vocab, and @language. These are handled before any other keys in the local context because they affect how the other keys are processed. Please note that @base is ignored when processing remote contexts.

This means that in the @context object (the very large {"@vocab" ... "shares"} object) only @base, @vocab, and @language are processed first; after that, the section I am quoting about forbidding recursion comes into play, handling every term definition other than those three keywords. That includes the term "as".
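The keyword-first ordering from the quoted paragraph can be illustrated with a small Python sketch: `@vocab` is read before any term, wherever it appears in the JSON object, so it can influence how the terms are expanded. This assumes a heavily simplified model (no `@base` or `@language` handling, no compact-IRI prefix resolution, no type coercion), and `process_local_context` and `expand_key` are hypothetical helper names.

```python
from typing import Dict, Optional, Tuple


def process_local_context(local: Dict[str, str]) -> Tuple[Optional[str], Dict[str, str]]:
    # Keywords first: @vocab is read before any term, wherever it sits in the object.
    vocab = local.get("@vocab")
    terms: Dict[str, str] = {}
    for key, value in local.items():
        if key.startswith("@"):
            continue
        # Keyword aliases and values containing a colon are kept as-is in this sketch;
        # a bare word is resolved against @vocab when one is set.
        if value.startswith("@") or ":" in value or vocab is None:
            terms[key] = value
        else:
            terms[key] = vocab + value
    return vocab, terms


def expand_key(vocab: Optional[str], terms: Dict[str, str], key: str) -> str:
    # A property name in a document: defined term first, then the @vocab fallback.
    if key in terms:
        return terms[key]
    if vocab is not None and ":" not in key and not key.startswith("@"):
        return vocab + key
    return key


# "@vocab" is listed last, yet it still applies to "title".
vocab, terms = process_local_context({"title": "headline", "@vocab": "http://example.org/"})
print(terms["title"])

# The ActivityStreams context sets "@vocab": "_:", so in this model an undefined
# property expands to a blank-node-style identifier.
as_vocab, as_terms = process_local_context({"@vocab": "_:", "id": "@id", "name": "as:name"})
print(expand_key(as_vocab, as_terms, "favouriteColour"))
```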

And I knew the "non-normative" bit would come back, but let's be honest: the entire spec is a train wreck of "everything is labelled non-normative", so that makes it a weak counterargument.

As for validators that fail, astool has always failed on it (I use my own @context OWL definition) and Google's Structured Data Testing Tool also chokes on it:

<script type="application/ld+json">
{"@context":"https://www.w3.org/ns/activitystreams","@type":"as:Person"}
</script>

Returns:

There was an invalid type in your JSON-LD.

The error highlights the @context value.

nightpool commented 5 years ago

googling astool doesn't return any relevant results, mind elaborating? By "existing" validators I mean ones that you didn't make yourself.

My tests seem to show that Google's Structured Data Testing Tool doesn't accept string @context values (swapping the activitystreams context for https://web-payments.org/contexts/security-v1.jsonld produces the same error). Are you certain they claim full compatibility with JSON-LD?

nightpool commented 5 years ago

For reference, the JSON-LD playground (using https://github.com/digitalbazaar/jsonld.js, which is a reference implementation of the JSON-LD processing algorithms), processes AS2 documents just fine.

cjslep commented 5 years ago

This parses just fine:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "url": "http://www.example.com",
  "name": "Unlimited Ball Bearings Corp.",
  "contactPoint": {
    "@type": "ContactPoint",
    "telephone": "+1-401-555-1212",
    "contactType": "Customer service"
  }
}
</script>

This example is from Google's "Understand how structured data works" page, which includes a table titled "Google Search supports structured data in the following formats, unless documented otherwise:". That table lists JSON-LD with an asterisk indicating "Google recommends using JSON-LD for structured data whenever possible."

nightpool commented 5 years ago

does it work with any context except schema.org?

annando commented 5 years ago

@nightpool In Friendica we use (nearly) this library, https://github.com/digitalbazaar/php-json-ld, to compact the input before parsing it. We don't have any issues compacting content from Mastodon.

cwebber commented 5 years ago

You're making a classic mistake, common to many json-ld context examining folks: a namespace and a context are two different things... even if they might be located at the same URL!

    "as": "https://www.w3.org/ns/activitystreams#",

DOES NOT define another context, and it isn't traversed in json-ld context parsing (really! I've written a json-ld parser before). Instead, this defines the namespace or alias in which the rest of the as:foo terms are defined.
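One way to see the distinction is a toy context resolver in Python: only a string that appears as the value of @context is treated as a remote context and dereferenced (with a "recursive context inclusion" guard, matching the cyclical-reference check in the remote-context paragraph quoted at the top of the issue), whereas a term value such as "as" is merely recorded as an IRI mapping. `resolve_context` and the in-memory `fake_loader` are hypothetical stand-ins for a real document loader; arrays of contexts and the rest of Context Processing are omitted.

```python
from typing import Callable, Dict, List, Optional, Set, Union

Context = Union[str, Dict[str, object]]
Loader = Callable[[str], Dict[str, object]]


def resolve_context(context: Context, load: Loader,
                    seen: Optional[Set[str]] = None) -> Dict[str, object]:
    seen = set() if seen is None else seen
    if isinstance(context, str):
        # A string @context value is a remote context reference: dereference it.
        if context in seen:
            raise ValueError("recursive context inclusion: " + context)
        seen.add(context)
        document = load(context)
        return resolve_context(document["@context"], load, seen)
    # A context object: term values (including "as") are IRI mappings, not
    # further contexts, so nothing else is fetched.
    return dict(context)


fetched: List[str] = []


def fake_loader(url: str) -> Dict[str, object]:
    # Hypothetical in-memory loader that records every URL it is asked for.
    fetched.append(url)
    return {"@context": {"as": "https://www.w3.org/ns/activitystreams#",
                         "Person": "as:Person"}}


terms = resolve_context("https://www.w3.org/ns/activitystreams", fake_loader)
print(fetched)  # only the @context URL itself is dereferenced
```

In a real processor the loader is where HTTP fetching and caching happen; the namespace IRI bound to "as" never reaches it.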

cjslep commented 5 years ago

...but a namespace is a type of term in the JSON-LD spec (6.10).

And the API processing spec recurses on terms.

And a compact IRI is:

A compact IRI has the form of prefix:suffix and is used as a way of expressing an IRI without needing to define separate term definitions for each IRI contained within a common vocabulary identified by prefix.

I take this to mean that the common vocabulary (the suffix) is recursed on once, at the prefix's term definition, so that later uses of that prefix (for example, encountering as:Person) do not define a separate term definition for that specific subtype (continuing the example: trying to define the term https://www.w3.org/ns/activitystreams#Person).

cjslep commented 5 years ago

I see the light now.

In my own tools I had modified the IRI Expansion algorithm to actually expand the IRI's vocabulary. When the algorithm behaves as specified, however, that expansion is skipped entirely. For my use case this skipping is problematic, and I had forgotten that I had modified it. I am sorry.

The key is the setup in step 4 of Create Term Definition:

4) Remove any existing term definition for term in active context.

Later, step 13.2 is reached (because of the @id transformation, and because the term "as" does not equal its value "https://www.w3.org/ns/activitystreams#"):

13.2) Otherwise, set the IRI mapping of definition to the result of using the IRI Expansion algorithm, passing active context, the value associated with the @id key for value, true for vocab, false for document relative, local context, and defined.

Going through the IRI Expansion algorithm steps:

  1. "If value is a keyword or null..." Nope! (value = "https://www.w3.org/ns/activitystreams#")
  2. "If local context is not null, it contains a key that equals value, and..." Nope! ("https://www.w3.org/ns/activitystreams#" is not a key in the @context)
  3. "If vocab is true and the active context has a term definition for value..." Nope! (It was removed in step 4 of Create Term Definition.)
  4. "If value contains a colon (:)..." Yep! But the value is returned immediately: splitting at the first colon gives the prefix "https" and a suffix beginning with "//", so it is already an absolute IRI.
This means the skipping happens whenever a shorthand compact-IRI prefix is defined. That is fine for the Expansion and Compaction algorithms, but for anything trying to understand the documents beyond shuffling around their textual representation, not fetching these vocabularies is untenable.

I guess the TL;DR lesson for me is that for text-level transformations (Expansion and Compaction), Context Processing & Co algorithms are sufficient. But for semantic understanding (such as native type code generation) Context Processing & Co algorithms are insufficient.

Since it appears I'm firmly alone in the latter camp, I am closing this bug.

nightpool commented 5 years ago

yes, contexts themselves do not provide semantic understanding. they have to be coupled with a schema definition to do so.