How should it be interpreted when a URI value is received?

w3c / activitystreams

Activity Streams 2.0

https://www.w3.org/TR/activitystreams-core/

How should it be interpreted when a URI value is received? #498

Open alexcastano opened 5 years ago

alexcastano commented 5 years ago

Please Indicate One:

[ ] Editorial
[X] Question
[X] Feedback
[ ] Blocking Issue
[ ] Non-Blocking Issue

In the core spec, section 4.2 Link says:

https://www.w3.org/TR/activitystreams-core/#link

For example, all Objects can contain an image property whose value describes a graphical representation of the containing object. This property will typically be used to provide the URL to an image (e.g. JPEG, GIF or PNG) resource that can be displayed to the user.

However, in the image property definition, the range is limited to: Image | Link

My concern is with Example 11:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Application",
  "id": "http://example.org/application/123",
  "name": "Exampletron 3000",
  "image": "http://example.org/application/123.png"
}

I don't know whether http://example.org/application/123.png is just a Link that I can use directly, or the id of an Image object. In the latter case, I should fetch the full object to use its url property:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "http://example.org/application/123.png",
  "type": "Image",
  "name": "Cat Jumping on Wagon",
  "url": [
    {
      "type": "Link",
      "href": "http://example.org/image.jpeg",
      "mediaType": "image/jpeg"
    },
    {
      "type": "Link",
      "href": "http://example.org/image.png",
      "mediaType": "image/png"
    }
  ]
}

I think, though I'm not sure, that a URI provided where an Object or a Link should appear usually means an Object whose id is that URI; i.e.:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": "http://www.test.example/martin",
  "object": "http://example.org/foo.jpg"
}

is equivalent to:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": {
    "type": "Person",
    "id": "http://www.test.example/martin",
    "name": "Martin Smith",
    "url": "http://example.org/martin",
    "image": {
      "type": "Link",
      "href": "http://example.org/martin/image.jpg",
      "mediaType": "image/jpeg"
    }
  },
  "object": {
    "type": "Image",
    "id": "http://example.org/foo.jpg",
    "url": "http://example.org/foo.jpg"
  }
}

Is this supposition right? Could it mean something else? Should implementations behave this way by default when receiving these kinds of messages?

I find situations where only a URI is received as a value confusing. I'm trying to create a generic ActivityStreams/ActivityPub implementation, and this is a blocker for my architecture.

Thank you for your time.

cjslep commented 5 years ago

In general, an IRI for that property should be interpreted like any other valid JSON-LD value for that property. For example, the actor property has a range of Object or Link and is not a functional property (it can have multiple values). Dereferencing should be able to handle receiving a JSON-LD representation of:

A single Object (or another type that extends Object)
A single Link (or another type that extends Link)
Multiple Objects (or multiple types that extend Object; they need not all be the same type)
Multiple Links (or multiple types that extend Link; they need not all be the same type)
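The four cases above can be sketched in Python. This is a minimal illustration, not part of any spec or library: it assumes the dereferenced payload has already been parsed into Python dicts, lists and strings, and the LINK_TYPES set and both function names are hypothetical. Only the core vocabulary's Link and its subtype Mention are treated as Links; any other node counts as an Object, so extension types would need extra entries.

```python
from typing import Any

# Assumption: the only Link types recognised are the core vocabulary's Link and
# its subtype Mention; every other node is treated as an Object.
LINK_TYPES = {"Link", "Mention"}


def kind_of(node: Any) -> str:
    """Classify one value as "IRI", "Link" or "Object"."""
    if isinstance(node, str):
        return "IRI"  # still an unresolved reference that may need another fetch
    types = node.get("type", [])
    if isinstance(types, str):
        types = [types]  # "type" may be a single string or an array
    return "Link" if any(t in LINK_TYPES for t in types) else "Object"


def classify(value: Any) -> list[str]:
    """Accept a single value or an array, since actor is not functional."""
    values = value if isinstance(value, list) else [value]
    return [kind_of(v) for v in values]


print(classify([{"type": "Person", "id": "http://sally.example.org"},
                {"type": "Mention", "href": "http://john.example.org"}]))
# ['Object', 'Link']
```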

EDIT: I accidentally pressed Enter too early. I'll circle back to this; to be continued...

cjslep commented 5 years ago

In this case, for image, you'll be able to distinguish between the general IRI case and the specific ActivityStreams Link case, because image will either literally be an IRI or be an ActivityStreams Link:

{
  // Is an IRI
  "image": "http://example.com/image.png"
}

{
  // Is a Link
  "image": {
    "type": "Link",
    "href": "http://example.com/image.png"
  }
}

So whenever you get an IRI for anything as a property value, treat it literally as an IRI.
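A hedged Python sketch of that distinction (the function name is hypothetical): a bare string is kept as an IRI, while an embedded Link carries its target in its href property.

```python
from typing import Any, Union


def describe_image(value: Union[str, dict[str, Any]]) -> str:
    """Hypothetical helper: say which of the two shapes an "image" value has."""
    if isinstance(value, str):
        # Literally an IRI: no embedded type or metadata to inspect.
        return "IRI: " + value
    if value.get("type") == "Link":
        # An embedded ActivityStreams Link: its target lives in "href".
        return "Link: " + value["href"]
    return "other node of type " + repr(value.get("type"))
```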

EDIT: See Example 10 in the JSON-LD spec.

Edit 2:

This makes your last example's supposition false:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": "http://www.test.example/martin",
  "object": "http://example.org/foo.jpg"
}

This is invalid, because the IRI for the object property does not resolve to:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": {
    "type": "Person",
    "id": "http://www.test.example/martin",
    "name": "Martin Smith",
    "url": "http://example.org/martin",
    "image": {
      "type": "Link",
      "href": "http://example.org/martin/image.jpg",
      "mediaType": "image/jpeg"
    }
  },
  "object": {
    "type": "Image",
    "id": "http://example.org/foo.jpg",
    "url": "http://example.org/foo.jpg"
  }
}

Instead, it incorrectly resolves to the image hosted at "http://example.org/foo.jpg".

Edit 3: This brings us full circle to my first (premature) post: when resolving IRIs instead of embedded JSON arrays/objects, you need to be prepared to handle every possibility allowed by the range in the spec.

alexcastano commented 5 years ago

Hello @cjslep, thanks for your response. I'm working on a general-purpose ActivityPub/ActivityStreams library too, in Elixir in my case :) At the same time, I'm building MoodleNet. Everything is in an alpha state: https://gitlab.com/OpenCoop/CommonsPub/Server

You said:

So whenever you get an IRI for anything as a property value, treat it literally as an IRI.

Well, I don't think it is that easy. If I receive:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Sally followed John",
  "type": "Follow",
  "actor": "http://sally.example.org",
  "object": "http://john.example.org"
}

I need to know that http://sally.example.org and http://john.example.org are both Actors and I have to update the database accordingly, send the new activities as well, etc.

I think (so maybe I'm wrong here :) that it is valid to send just the id instead of the full object in every case. In fact, it is the spec that decides whether to send just the ID, a smaller set of the object's data, or the full object data.

So, if the above is true, an implementation cannot know in advance whether the URI is just a URI or an Object ID, and it has to make a new request for more information about the object when necessary. This happens all the time; I just used the image as an example because I think it's the easiest to understand.

If my server receives:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Application",
  "id": "http://example.org/application/123",
  "name": "Exampletron 3000",
  "image": "http://example.org/application/123.png"
}

And I want to show a picture of this Application, i.e. <img src="what_should_i_put_here">. An image can be a Link, and here https://www.w3.org/TR/activitystreams-core/#link the spec uses the href directly. But it could also be an Image, so it could be an ID. In that latter case my server should make a GET to that ID and fetch the url property of this Image object.

So I think that using an IRI value to represent a Link could be wrong, and the full Link representation should be used instead:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Application",
  "id": "http://example.org/application/123",
  "name": "Exampletron 3000",
  "image": {
    "type": "Link",
    "href": "http://example.org/application/123.png"
  }
}

alexcastano commented 5 years ago

In fact, the image property cannot be just a simple IRI: https://www.w3.org/TR/activitystreams-vocabulary/#dfn-image-term On the other hand, the url property can: https://www.w3.org/TR/activitystreams-vocabulary/#dfn-url

cjslep commented 5 years ago

If my server receives:

Your server won't receive that, because it's not a valid ActivityStreams object.

Using your example:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Application",
  "id": "http://example.org/application/123",
  "name": "Exampletron 3000",
  "image": "http://example.org/application/123.png" // <<< This is not valid! It is not permitted to be an image URL!
}

You cannot put the actual image URL as the "image" property. The JSON-LD Spec permits this property to be an IRI (always). Building on that, the ActivityStreams spec gives the image property a range of Image or Link. If you get a link literal for a property that doesn't have xsd:anyURI as its range then it will always be an IRI referencing the @id of another ActivityStreams payload. So you go back to the logic you correctly understand -- making the additional fetch to the server to resolve the question "what is this IRI's actual "type" and how do I interpret it?":

Your server will receive:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Application",
  "id": "http://example.org/application/123",
  "name": "Exampletron 3000",
  "image": "http://example.org/application/myimage" // <<< This is valid, it is referring to an ActivityStreams' @id, but no one knows what its actual "type" is yet.
}

When you then fetch it from the server, you'll get anything permitted by the range in the spec (so an Image or a Link). A GET could return:

Option 1: Image

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "http://example.org/application/myimage", // <<< This should match what you just fetched from the previous @id
  "type": "Image",
  "name": "Cat Jumping on Wagon",
  "url": "http://example.org/image.jpeg" // <<< Here is the actual image URL!
}

Option 2: Link

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Link",
  "id": "http://example.org/application/myimage", // <<< This should match what you just fetched from the previous @id
  "href": "http://example.org/image.jpeg" // <<< Here is the actual image URL!
}

Do not confuse the JSON-LD concept of IRI, which is a special reference within the linked data structure, and the value of a property which happens to be a URL.
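The two-step resolution described above could look roughly like the following Python sketch. It assumes cjslep's reading (a bare string in image is the @id of an Image or Link), and the fetch callable is a hypothetical stand-in for an HTTP GET that returns the parsed ActivityStreams document; it is injected so the logic can be exercised without a network. Only a single image value with a string type is handled.

```python
from typing import Any, Callable, Optional

Json = dict[str, Any]


def _href_of(url_value: Any) -> Optional[str]:
    """An Image's url may be a plain string, a Link, or a list of either."""
    if isinstance(url_value, str):
        return url_value
    if isinstance(url_value, dict):
        return url_value.get("href")
    if isinstance(url_value, list):
        for item in url_value:
            href = _href_of(item)
            if href:
                return href  # take the first usable entry
    return None


def image_src(image_value: Any, fetch: Callable[[str], Json]) -> Optional[str]:
    """Return a URL suitable for <img src>, or None if nothing usable is found."""
    # A bare string is treated as an @id and dereferenced; embedded nodes are used as-is.
    node = fetch(image_value) if isinstance(image_value, str) else image_value
    if node.get("type") == "Link":
        return node.get("href")  # Option 2: the Link's href is the image URL
    if node.get("type") == "Image":
        return _href_of(node.get("url"))  # Option 1: the Image's url
    return None
```

Keeping fetch as a parameter also makes it easy to put a cache in front of the network call later.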

alexcastano commented 5 years ago

First of all, thank you very much for taking the time to respond. I really appreciate it :)

I have to confess that my knowledge of JSON-LD is very limited.

First thing:

Using your example:

It is not my example; it is an ActivityStreams example. I think this is the source of our confusion. Look at Examples 11 and 12 here: https://www.w3.org/TR/activitystreams-core/#link

The spec says this is valid, and you're saying it is invalid, so we agree: I only wrote that if this were valid, we'd have a problem. I think you completely understand my point.

So I think we are saying the same thing: the spec is wrong :joy:

alexcastano commented 5 years ago

So the image property can be (possibly with multiple values):

An ID of a Link or an ID of an Image.
The representation of the Link or the Image itself.

In no case could it be the Link.href directly :+1:

cjslep commented 5 years ago

Holy cow, I didn't even notice that. You're totally right, thanks for setting me straight!

nightpool commented 5 years ago

Hey, just to follow up on this:

Example 11 says:

To reference a single image without any additional metadata, a direct association can be expressed as a JSON string containing an absolute IRI.

And gives the example:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Application",
  "id": "http://example.org/application/123",
  "name": "Exampletron 3000",
  "image": "http://example.org/application/123.png"
}

This is the first example of a Link ever given in the document.

I think this should definitely be understood as a valid link, considering that it's the prototypical link in the entire spec.

cjslep commented 5 years ago

Yes, but I would be careful with the word 'link' as it is not specific enough.

The earlier confusion was about whether to interpret it as an xsd:anyURI (pointing to anything), a Link (in the ActivityStreams sense, pointing to an AS thing), or a valid IRI (in the JSON-LD sense, pointing to a JSON-LD thing) when doing "generic" processing (for Alex and the RDF folks at home).

That passage in the spec seems to be a very narrow exception to the general rule, one I had missed, and I thank Alex for being patient and explaining it to me.

alexcastano commented 5 years ago

@nightpool Sorry, but I could not understand what you mean:

I think this should definitely be understood as a valid link, considering that it's the prototypical link in the entire spec

If it is valid, how should a generic implementation resolve the dilemma I tried to explain in https://github.com/w3c/activitystreams/issues/498#issuecomment-445337273 ?

What do you mean by "prototypical link in the entire spec"?

Thank you for your time

gobengo commented 5 years ago

@alexcastano here's how I think of it:

When you see { "image": "http://url.com" }, you're seeing a "reference" (i.e. a "Link href") to that resource. That's it. Upon dereferencing that URL (with whatever Content-Type you Accept), you may actually end up at a PNG, a JPEG, a JSON file, a binary blob, or an infinite series of redirects. It could be anything. But it's a pointer to a to-be-interrogated "resource", not the id of an Image or Link itself.

But it could also be an Image, so it could be an ID. In that latter case my server should make a GET to that ID and fetch the url property of this Image object.

Not exactly. And that 'should' is, I'm pretty sure, not aligned with any SHOULD I know of.

If you have { image: "url1" }, you basically have { type: "Link", href: "url1" }. You do not necessarily have { id: "url1" }, or even something of type Image. It will be very common to dereference "url1" and get a 404, a 500, or some random HTML file, because the web is weird. Now, what's cool is that you can still build web apps that just do <img src="url" />, because the end user's user agent will lazily dereference that with some kind of Accept: image/* header, which is most likely to resolve to a resource that can be rendered as an image (but that's no more valid than any other serialization of the resource at that URL, e.g. a textual/audio description for those who can't see).

So what do you "send"? Your conclusion is actually consistent with what I've pointed out here (even though I disagree with interpreting "url1" as an id). So if you want to re-serialize it as a Link object, that's fine; it's just more characters. Semantically it's the same. Even better, IMO, and what I'd recommend, is either to leave it exactly as you found it and just forward the original, or to attempt to dereference the URL and replace it with some representation of that resource (like an Image), but only if you can find something at that URL with mediaType image/*.
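Under gobengo's reading, re-serializing a bare string as a Link is a purely mechanical step. The Python sketch below illustrates that interpretation only (the function name is hypothetical); it is the opposite of treating the string as an @id, so a consumer has to choose one reading.

```python
from typing import Any


def as_link(value: Any) -> Any:
    """Hypothetical normaliser: a bare string is shorthand for a Link whose
    href is that string; embedded nodes pass through unchanged."""
    if isinstance(value, str):
        return {"type": "Link", "href": value}
    if isinstance(value, list):
        return [as_link(v) for v in value]  # non-functional properties
    return value


print(as_link("url1"))  # {'type': 'Link', 'href': 'url1'}
```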

Practically speaking, it might make sense for an AP implementation to explicitly store a map of URL to cached metadata about what can be dereferenced at each URL (whether you can get an AS2 object at that URL, what content types are available, etc.). This would make it less costly to dereference the same URLs repeatedly, since any given social web object could link to dozens of other resources. It would also allow you to do the following when encountering an .image value like { image: "url1" }:

Interpret the string as a Link with an href. That isn't very useful on its own, so move on to validating other properties, saving, etc. When rendering the object now, just include an <img src="url1" onerror="removeImage" /> or nothing for now.
BUT also add a task to a job queue to DereferenceUrl { url: "url1" }.
When that job is processed, request the URL (with HEAD, "Accept: */*", etc.) and save the results as a DereferencedUrl that is totally separate from the original object-with-image you encountered.
From now on, when you read the original object with { image: "url1" }, you can also look up (or JOIN to) the DereferencedUrl without making another HTTP request and see whether the response can be converted to an Image or something similar (does that DereferencedUrl respond with 200, does it support content type image/*, is there height/width/aspect-ratio metadata we can store?). If so, you can replace the Link reference with the object you've created from the DereferencedUrl.
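The cache-and-queue workflow above might be sketched in memory as follows. Everything here is hypothetical: UrlCache stands in for a real job queue plus a DereferencedUrl table, and probe stands in for whatever HTTP client performs the HEAD request, returning a status code and a content type. A production version would persist the table and run jobs asynchronously.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DereferencedUrl:
    url: str
    status: int
    content_type: Optional[str]


class UrlCache:
    """Hypothetical in-memory stand-in for a job queue plus a DereferencedUrl table."""

    def __init__(self, probe: Callable[[str], tuple[int, Optional[str]]]):
        self.probe = probe  # e.g. performs a HEAD request with "Accept: */*"
        self.queue: deque = deque()
        self.results: dict[str, DereferencedUrl] = {}

    def enqueue(self, url: str) -> None:
        """Schedule a DereferenceUrl job instead of fetching while ingesting."""
        if url not in self.results and url not in self.queue:
            self.queue.append(url)

    def run_jobs(self) -> None:
        """Process pending jobs; results are stored apart from the original object."""
        while self.queue:
            url = self.queue.popleft()
            status, content_type = self.probe(url)
            self.results[url] = DereferencedUrl(url, status, content_type)

    def image_for(self, value: str) -> dict:
        """Start from Link-with-href; upgrade to an Image only when the cached
        probe saw a 200 with an image/* content type (no new HTTP request)."""
        info = self.results.get(value)
        if info and info.status == 200 and (info.content_type or "").startswith("image/"):
            return {"type": "Image", "url": value, "mediaType": info.content_type}
        return {"type": "Link", "href": value}
```

Until run_jobs has processed a URL, image_for keeps returning the plain Link, which matches the "render something cheap now, enrich later" flow described above.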

alexcastano commented 5 years ago

Hello @gobengo, thanks for taking the time to answer my question.

Not exactly. And that 'should' is, I'm pretty sure, not aligned with any SHOULD I know of.

My knowledge of JSON-LD is very limited, but AFAIK you can embed an object or just refer to it by its ID. The image property, defined in ActivityStreams, has the range Image or Link. This means that an IRI MUST represent the ID of an Image or a Link. It is important to remember that Links also have an id property.

@cjslep seems to agree with me in this aspect:

You cannot put the actual image URL as the "image" property. The JSON-LD Spec permits this property to be an IRI (always). Building on that, the ActivityStreams spec gives the image property a range of Image or Link. If you get a link literal for a property that doesn't have xsd:anyURI as its range then it will always be an IRI referencing the @id of another ActivityStreams payload.

So if the above is true (again, I'm not an expert on JSON-LD), the following sentence would be wrong:

If you have { image: "url1" }, you basically have { type: "Link", href: "url1" }

And the specification says the same thing, so it would be wrong as well. That is the reason I opened this issue.

To reinforce this idea, we can look at the recurring examples in the spec:

Example 63

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Sally offered the Foo object",
  "type": "Offer",
  "actor": "http://sally.example.org",
  "object": "http://example.org/foo"
}

Example 64

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Sally offered the Foo object",
  "type": "Offer",
  "actor": {
    "type": "Person",
    "id": "http://sally.example.org",
    "summary": "Sally"
  },
  "object": "http://example.org/foo"
}

The difference between Examples 63 and 64 is how the actor property is represented: the first uses just the id, the second embeds the object. So if you cannot substitute an Object with its ID, I think the specification should explain this clearly, because the examples confused me on this point.

On the other hand, if substituting an object with its id is allowed, and substituting a Link with its href is also allowed, then I see many things that can go wrong or make a generic implementation incredibly difficult; e.g., a Link could be substituted by either its id or its href.

To complete my response, I'd like to point out that requesting an Image's ID may or may not return an image:

{
  "type": "Image",
  "id": "https://example.org/this_is_the_id_url",
  "url": "https://example.org/this_is_where_the_png_resides"
}

So you cannot just put the ID like src of an <img>.

In closing, I just want to emphasize that my solution to this problem is simply to not allow a Link to be replaced by its href.

DeadWisdom commented 1 year ago

After some research, I came across this section of the JSON-LD 1.1 Processing Algorithms and API:

If active property has a type mapping in the active context set to @id or @vocab, and the value is a string, a map with a single entry @id whose value is the result of using the IRI Expansion algorithm on value is returned.
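A much-simplified Python sketch of that rule follows. It assumes a hand-written slice of the ActivityStreams context (the as prefix and a handful of terms assumed to be coerced with "@type": "@id"); a real processor would use a JSON-LD library and also handle @base, @vocab and relative IRIs, all of which this sketch ignores.

```python
from typing import Any

# Assumption: a tiny hand-written slice of the ActivityStreams context.
PREFIXES = {"as": "https://www.w3.org/ns/activitystreams#"}
ID_COERCED_TERMS = {"actor", "object", "image", "tag", "url"}  # "@type": "@id"


def expand_iri(value: str) -> str:
    """Simplified IRI expansion: only compact IRIs (prefix:suffix) are expanded;
    absolute IRIs such as https://... are returned unchanged."""
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in PREFIXES and not suffix.startswith("//"):
        return PREFIXES[prefix] + suffix
    return value


def expand_value(term: str, value: Any) -> Any:
    """A string under an @id-coerced term becomes a map with a single @id entry."""
    if term in ID_COERCED_TERMS and isinstance(value, str):
        return {"@id": expand_iri(value)}
    return value
```

For example, expand_value("tag", "as:Activity") returns {"@id": "https://www.w3.org/ns/activitystreams#Activity"}, while a term without @id coercion, such as name, keeps its plain string.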

Here's the way I'm interpreting all this:

Activity Streams is inconsistent with regard to referencing things. It leans on JSON-LD far too heavily not to explain JSON-LD concepts like compaction and expansion. But many examples are quite sus. As far as JSON-LD is concerned, string values of terms typed as @id should be expanded to {"id": value}. So, in general, a string value should be considered a bare reference to an Object if the term is defined as @id in the context, or as Object | Link in the documentation, and the receiving server should dereference the object to get the real version, as it should for every object reference it receives.

However there are a few exceptions...

The "url" term is annoyingly defined as @id in the actual JSON-LD context, but in the documentation it's "xsd:anyURI | Link". This is a conflict and the source of our trouble. Further, ActivityPub adds a number of links (followers, liked, inbox, etc.) that are documented as "links" but defined in the context as @id. Strings for these terms should all be interpreted as plain IRI strings, NOT objects. In essence, this is a bug in the current ActivityStreams JSON-LD context.

Further, for terms whose range includes a Document type, which seems to be ONLY "image" (Image | Link), a string could be interpreted as the "url" of the resource, because the image term is really trying to represent a resource on the web.

TL;DR: When you get a string for an Object | Link term, you can choose either to interpret it as a bare Object with no type, just an id ({"id": value}), or to keep passing it along as an IRI. The point is to treat it as opaquely as possible and pass it to your processing logic, which can interpret the reference as needed with better context. Especially when documents are involved (like an image), it is reasonable for the processor to interpret it as the URL of a web resource.

p.s.

Interestingly, string values of @id-typed terms should technically be expanded according to the active context, so if your object looked like:

{ "@id": "https://example.com/thing/222",
  "@type": "Something",
  "tag": "as:Activity" }

It should be interpreted as:

{ "@id": "https://example.com/thing/222",
  "@type": "Something",
  "tag": "https://www.w3.org/ns/activitystreams#Activity" }

You could certainly use this to your advantage for things like the "tag" term. However, I doubt many servers handle it this way.

bumblefudge commented 10 months ago

I think the usage of bare id URLs to stand in for whole objects may have been modeled by some examples as an internal representation-- passing objects with bare URLs for images to other servers seems like bad form? Just speculating as to what unspoken assumptions got lost in translation here...

evanp commented 10 months ago

So, in issue triage, we worked on a primer page on this issue:

https://www.w3.org/wiki/Activity_Streams/Primer/URLs_as_values

It includes guidance for publishers (don't use bare URLs if it could be confusing) and for consumers (use heuristics to guess between an Object ID and a Link href, Link id being a very rare property).

mattheimer commented 5 months ago

In the primer page it says:

In ActivityPub, if the ID is not an HTTPS URL to an AS2 object, assume that this is a Link with url property equal to the string.

I think that was meant to be "a Link with a href property".
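The primer's ActivityPub heuristic, read with href in place of url, might be approximated as in the following Python sketch. It is not the primer's own code: content_type_of is a hypothetical callable returning the Content-Type obtained by fetching the URL with an ActivityStreams Accept header, and, as a simplification, any application/ld+json response is accepted without checking its profile parameter.

```python
from typing import Callable, Optional
from urllib.parse import urlparse

# Simplification: media types accepted as "an AS2 object" (profile not checked).
AS2_TYPES = ("application/activity+json", "application/ld+json")


def interpret_bare_url(value: str,
                       content_type_of: Callable[[str], Optional[str]]) -> dict:
    """Guess between an Object id and a Link href for a bare string value."""
    if urlparse(value).scheme == "https":
        # Strip parameters such as charset or profile before comparing.
        ctype = (content_type_of(value) or "").split(";")[0].strip().lower()
        if ctype in AS2_TYPES:
            return {"id": value}  # an HTTPS URL to an AS2 object: treat as its id
    return {"type": "Link", "href": value}  # otherwise: a Link with this href
```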