w3c / activitystreams

Activity Streams 2.0
https://www.w3.org/TR/activitystreams-core/

Spec does not clarify non-functional natural language values when mapped #443

Open cjslep opened 6 years ago

cjslep commented 6 years ago

Please Describe the Issue:

In https://www.w3.org/TR/activitystreams-core/#naturalLanguageValues the language-mapped forms are exemplified:

Accordingly, in the JSON serialization, the terms "name", "summary", and "content" represent the JSON string forms; and the terms "nameMap", "summaryMap", and "contentMap" for represent the object forms.

An example provided is Example 22:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Object",
  "nameMap": {
    "en": "This is the title",
    "fr": "C'est le titre",
    "es": "Este es el título"
  }
}

However, according to https://www.w3.org/TR/activitystreams-vocabulary/#properties none of the above properties are marked 'functional': name, summary, and content. Thus, having multiple values for these properties is valid.

Therefore, the following message is within spec:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Object",
  "name": [ "This is the title", "This is another title" ]
}

However, the spec does not describe how this should be handled in map form, if at all. Two options that would handle it include:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Object",
  "nameMap": {
    "en": [ "This is the title", "This is another title" ],
    "fr": [ "C'est le titre", "C'est un autre titre" ],
    "es": [ "Este es el título", "Este es otro título" ]
  }
}

and

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Object",
  "nameMap": [
    {
      "en": "This is the title",
      "fr": "C'est le titre",
      "es": "Este es el título"
    },
    {
      "en": "This is another title",
      "fr": "C'est un autre titre",
      "es": "Este es otro título"
    }
  ]
}

A third implementation could ignore these forms altogether as "unhandled", and all three could claim to follow the spec because it gives no guidance.
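To make the ambiguity concrete, here is a minimal Python sketch of what a consumer would have to do to accept both proposed shapes. The helper name `normalize_language_map` and the canonical `{language: [strings]}` target are assumptions for illustration; the spec defines neither.

```python
from typing import Any, Dict, List


def normalize_language_map(value: Any) -> Dict[str, List[str]]:
    """Collapse either proposed map shape into {language: [strings]}."""
    # Second option: a list of language maps, one per value. Merge them in order.
    maps = value if isinstance(value, list) else [value]
    merged: Dict[str, List[str]] = {}
    for language_map in maps:
        if not isinstance(language_map, dict):
            raise TypeError("expected a language map (JSON object)")
        for lang, text in language_map.items():
            # First option: a language key may carry a list of strings.
            texts = text if isinstance(text, list) else [text]
            merged.setdefault(lang, []).extend(texts)
    return merged


option_a = {
    "en": ["This is the title", "This is another title"],
    "fr": ["C'est le titre", "C'est un autre titre"],
}
option_b = [
    {"en": "This is the title", "fr": "C'est le titre"},
    {"en": "This is another title", "fr": "C'est un autre titre"},
]
```

Both `option_a` and `option_b` normalize to the same dictionary, which shows the two shapes carry the same information and differ only in layout.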

sebilasse commented 6 years ago

[ edit: this became clear from the 437 comment ] I see, you referred to this document https://www.w3.org/TR/activitystreams-vocabulary/#dfn-content

When I was initially reading the vocabulary document I was also unaware that e.g. content and contentMap are mutually exclusive and that there is a special "und" property for unavailable languages.

sebilasse commented 6 years ago

@cjslep please also note that for content / contentMap it gets worse:

How should mediaType behave with multiple content? mediaType is marked functional! https://www.w3.org/TR/activitystreams-vocabulary/#dfn-mediatype

This does not even allow me to mix e.g. HTML and Markdown content …

gobengo commented 6 years ago

@sebilasse

How should mediaType behave with multiple content?

Is this what you mean?

{
  "type": "Note",
  "mediaType": "text/plain",
  "content": [
    "<!doctype html>some html",
    "{}"
  ]
}

How do we interpret the multiple values of content?

I do believe this is a good bug. I think the quickest thing we could do is at least to add the following to the description of 'mediaType':

If `content` or `contentMap` have multiple values, then the meaning of a single `mediaType` value is undefined.
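A consumer could detect the case this wording describes with a check like the following. This is a rough sketch under assumptions: `has_ambiguous_media_type` is a hypothetical helper, and it only inspects the plain `content` property, not the map form.

```python
from typing import Any, Dict


def has_ambiguous_media_type(obj: Dict[str, Any]) -> bool:
    """True when one mediaType would have to describe several content values."""
    content = obj.get("content")
    # A JSON array with more than one entry means multiple content values.
    multiple = isinstance(content, list) and len(content) > 1
    return multiple and "mediaType" in obj


note = {
    "type": "Note",
    "mediaType": "text/plain",
    "content": ["<!doctype html>some html", "{}"],
}
```

For the `note` above the check returns `True`; what a consumer does next (reject, warn, or fall back to a default media type) is exactly the part the spec leaves open.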

Separately...

Or, honestly, just allow Links in the range of content, so that this is allowed:

{
  "type": "Link",
  "href": "data:application/json;charset=utf-8;base64,e30="
}

And then deprecate 'mediaType' on Objects.
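As a sketch of why a Link could replace 'mediaType' here: a `data:` URL already carries its own media type and encoding, so a consumer can recover both from the `href` alone. The helper `read_link_content` is hypothetical; it relies on Python's standard `urllib.request.urlopen`, which handles `data:` URLs without any network access.

```python
from typing import Any, Dict, Tuple
from urllib.request import urlopen


def read_link_content(link: Dict[str, Any]) -> Tuple[str, bytes]:
    """Return (media type, decoded bytes) for a Link whose href is a data: URL."""
    href = link["href"]
    if not href.startswith("data:"):
        raise ValueError("this sketch only handles data: URLs")
    with urlopen(href) as response:
        # The media type comes from the URL itself, not from a separate property.
        return response.headers.get_content_type(), response.read()


link = {
    "type": "Link",
    "href": "data:application/json;charset=utf-8;base64,e30=",
}
```

Calling `read_link_content(link)` yields the media type `application/json` together with the decoded payload `{}`.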

Could use editorial feedback @cwebber

sebilasse commented 6 years ago

@gobengo This is exactly what I meant.

I would go for the "best solution" 😁. If multiple content items are provided, each one should have its own content encoding and media type.

See e.g. how the JSON Schema spec deals with it: http://json-schema.org/latest/json-schema-validation.html#rfc.section.8.3

The default could be

{
    "content": "foo",
    "encoding": "8bit",
    "mediaType": "text/html"
}

but it could also be an image

{
    "content": "bar",
    "encoding": "base64",
    "mediaType": "image/png"
}

where the encoding (contentEncoding in JSON Schema) can be one of the RFC 2045 values: "7bit" | "8bit" | "binary" | "quoted-printable" | "base64" | ietf-token | x-token
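A consumer of such per-item objects might decode them roughly as follows. This is only a sketch of the proposal: the `encoding` property is not part of the Activity Streams vocabulary, `decode_content` is a hypothetical helper, and the ietf-token and x-token extensions are not handled.

```python
import base64
import quopri
from typing import Any, Dict


def decode_content(item: Dict[str, Any]) -> bytes:
    """Decode one content item according to its (proposed) encoding property."""
    encoding = item.get("encoding", "8bit")  # "8bit" is the suggested default
    raw = item["content"]
    if encoding == "base64":
        return base64.b64decode(raw)
    if encoding == "quoted-printable":
        return quopri.decodestring(raw.encode("utf-8"))
    if encoding in ("7bit", "8bit", "binary"):
        # Identity encodings: the string is the content itself.
        return raw.encode("utf-8")
    raise ValueError(f"unsupported encoding: {encoding}")
```

The `mediaType` of each item would then tell the consumer how to interpret the returned bytes, so mixing HTML and Markdown items in one object becomes unambiguous.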


@cjslep fyi: Made JSON Schemas https://github.com/redaktor/ActivityPubSchema

gobengo commented 6 years ago

I think we want the same thing. I think there's work to be done to clarify what the 'range' of 'content' should be. Object is probably fine, but might be weirdly broad. Perhaps an extension should define a Content type and related StringContent. Or take a look at oa:TextualBody

evanp commented 1 year ago

The Vocabulary document does not specify that these properties are "functional", but it does refer to the properties in the singular as part of the definitions. For example,

None of the examples have multiple values for these properties, and there is no guidance on how consumers should handle multiple values here.

I think the resolution for this problem is to include the Functional flag for these properties in the ERRATA, and to document a best practice for dealing with multiple values if found.
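One possible shape for such a best practice, offered purely as an assumption since the thread does not settle on one, is to treat the property as functional when consuming and keep only the first value. `first_value` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def first_value(obj: Dict[str, Any], prop: str) -> Optional[str]:
    """Read a property as if it were functional, keeping the first value only."""
    value = obj.get(prop)
    if isinstance(value, list):
        # Multiple values found: fall back to the first one, or None if empty.
        return value[0] if value else None
    return value


doc = {
    "type": "Object",
    "name": ["This is the title", "This is another title"],
}
```

With this rule, `first_value(doc, "name")` gives "This is the title" and the second title is silently dropped, which is the trade-off any such best practice would need to document.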