w3c / json-ld-syntax

JSON-LD 1.1 Specification
https://w3c.github.io/json-ld-syntax/

HTTP parameters for specifying context or frame #8

Closed gkellogg closed 5 years ago

gkellogg commented 6 years ago

When requesting JSON-LD from an HTTP endpoint, it would be useful to provide a reference to a context or frame which should be used by the server to put the results into the proper format.

RubenVerborgh commented 6 years ago

We should align this with the work in the W3C DXWG (e.g., https://github.com/w3c/dxwg/issues/261)

ajs6f commented 6 years ago

Six kinds of +1. In the field this issue comes up for me at least every three months or so! :grin: The potential for DOS is real, but the need is pressing.

azaroth42 commented 6 years ago

Propose to discuss this at TPAC in a joint session with DXWG.

ajs6f commented 6 years ago

@azaroth42 Is there a DXWG issue to which we can link this?

azaroth42 commented 6 years ago

https://github.com/w3c/dxwg/issues/261 :)

iherman commented 5 years ago

This issue was discussed in a meeting.

gkellogg commented 5 years ago

Something to resolve:

If the Accept header includes a JSON-LD media type with a context or frame profile that the server won't support (e.g., because of a black/white list), I presume it would skip that entry and go to the next highest-priority media type it supports (possibly application/ld+json without a profile or without a specific context/frame), returning status 415 if none is found.

This could mean not returning any JSON-LD if profile cannot be satisfied.

Alternatively, the server should satisfy the request as best it can without consideration of the profile parameter.

ajs6f commented 5 years ago

I'm not sure if I'm getting at your point, @gkellogg, but my understanding of HTTP (based on the old RFC and possibly out of date) is that the media type alone is what Accept selects on-- the profile can't change that choice. So once application/ld+json is the choice, whatever happens with the profile can't change that, and therefore we can select from various options, but tossing to the next media type should be the last. But I welcome correction!

Conal-Tuohy commented 5 years ago

I think it's correct that the server is supposed to consider the entire value of the Accept header, including any media type parameters (profiles, etc) in deciding which is the best representation of the resource. RFC7231 gives an analogous example of selecting among various kinds of plain text:

https://tools.ietf.org/html/rfc7231#page-39

I think it would be good practice for clients to specify both a profile and also "unprofiled" JSON-LD (if they can actually accept that), e.g. application/ld+json, application/ld+json;profile=foo. It should not be necessary for a client to specify a lower q value for generic application/ld+json than for application/ld+json;profile=foo, because the server ought to prefer the "more specific" value anyway.

If a client sends Accept: application/ld+json;profile=foo without also application/ld+json, the server could either return a 406, or could return a 200 with its own "best guess" content type (as it would be entitled to do if the client had not sent an Accept header at all). In such a case, it would make sense, to me, to return JSON-LD rather than some other RDF media type.
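A minimal sketch of the selection logic described above, assuming a hypothetical server that knows which profile URIs each stored representation conforms to. The names `parse_accept`, `negotiate`, and the `strict` flag are illustrative and not taken from any specification or library. q values are ignored here to keep the focus on specificity, and the parsing is deliberately naive.

```python
def parse_accept(header):
    """Naively split an Accept header into (media_type, params) pairs.

    Assumes no commas or semicolons occur inside quoted parameter values.
    """
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        if not pieces[0]:
            continue
        params = {}
        for piece in pieces[1:]:
            name, _, value = piece.partition("=")
            params[name.strip().lower()] = value.strip().strip('"')
        entries.append((pieces[0].lower(), params))
    return entries


def negotiate(header, offers, strict=True):
    """Pick an offer for the given Accept header.

    `offers` is a list of (media_type, profiles) pairs, where `profiles`
    is the set of profile URIs that representation conforms to.
    Returns (status, offer); offer is None when status is 406.
    """
    best = None  # (specificity, offer)
    for media_type, params in parse_accept(header):
        wanted = set(params.get("profile", "").split())
        for offer in offers:
            offer_type, offer_profiles = offer
            if offer_type != media_type or not wanted <= offer_profiles:
                continue
            # A profiled Accept entry counts as more specific than an unprofiled one.
            specificity = len(wanted)
            if best is None or specificity > best[0]:
                best = (specificity, offer)
    if best is not None:
        return 200, best[1]
    if strict or not offers:
        return 406, None
    # Lenient policy: fall back to the server's own "best guess" representation.
    return 200, offers[0]
```

With `strict=True` the server answers 406 when only a profiled variant was requested and none is available; with `strict=False` it returns its default representation instead, which corresponds to the two options listed above.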

ajs6f commented 5 years ago

Wow, I got that completely backwards! Thanks for the correction, @Conal-Tuohy.

Given that many clients will be expecting to use compacted JSON-LD as JSON, I think a 4xx is by far the best answer here. Non-compacted forms are going to be more surprising and troublesome than helpful to such clients.

gkellogg commented 5 years ago

Better than status 415 is status 406 "Not Acceptable":

The target resource does not have a current representation that would be acceptable to the user agent, according to the proactive negotiation header fields received in the request, and the server is unwilling to supply a default representation.

iherman commented 5 years ago

This issue was discussed in a meeting.

azaroth42 commented 5 years ago

I don't understand the resolution. The profile parameter can ONLY use:

A non-empty list of space-separated URIs

According to the IANA registration of application/ld+json
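To make the constraint concrete, here is a small sketch of how a profile value could be split into its URIs. The helper name `parse_profile` is hypothetical, and requiring a URI scheme on every token is an assumption about what counts as a URI here, not something the registration text quoted above spells out.

```python
from urllib.parse import urlsplit


def parse_profile(value):
    """Split a profile parameter value into its list of URIs.

    Raises ValueError for an empty list or for a token without a URI scheme.
    """
    uris = value.strip().strip('"').split()
    if not uris:
        raise ValueError("profile must be a non-empty list of URIs")
    for uri in uris:
        # Assumption: each token must at least carry a scheme such as "http".
        if not urlsplit(uri).scheme:
            raise ValueError(f"not an absolute URI: {uri!r}")
    return uris
```

For example, `parse_profile('"http://www.w3.org/ns/json-ld#compacted http://www.w3.org/ns/anno.jsonld"')` yields a two-element list, while an empty string raises `ValueError`.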

BigBlueHat commented 5 years ago

I don't understand the resolution. The profile parameter can ONLY use:

A non-empty list of space-separated URIs

According to the IANA registration of application/ld+json

Exactly. 😃 The original proposal was to use them as URLs: extracting them from the profile value, dereferencing them, and using them when parsing the data document.

It was determined that clients requesting processing constraint documents (i.e. frames, contexts) was a potentially viable feature request, but that the profile media parameter was the wrong vehicle for that.

@gkellogg mentioned using Link instead and has added that approach to the work-in-progress PR https://github.com/w3c/json-ld-api/pull/56

There will certainly need to be discussion about the use of Link (per-process), but overall it feels more correct than using profile.

I'd propose we change the title of this issue (s/parameters/header) to match reality. That sound OK?

azaroth42 commented 5 years ago

I'm :-1: on the resolution as I understand it. It conflates descriptive metadata and protocol transactions unnecessarily by applying rules intended for responses to the functionality of requests.

Each entry in the Accept header is treated individually. It is perfectly reasonable to ask for json-ld or rdf/xml (syntax change); for json-ld or plain text (semantics are lost); or for two different flavors of json-ld. Those different flavors could be different contexts, which already imply an ontology selection according to that context's definition. If the context ever changes, then the semantics have changed, which, according to the 100% strict reading, would not be acceptable, and thus we would already be in an error state without introducing any further usage.

I'm also :-1: on changing parameter to header. Link headers are not a functional way to request content negotiable resources, as they lack all of the prioritization of q values and can't be grouped together with the media type they apply to. For example, say I want schema.org as JSON-LD, or simple Dublin Core as XML, because those are the two representation formats I have implemented. If the profile is separated from the media type, it would not be possible to determine that schema goes with json-ld, and DC goes with XML. Ergo, the profile MUST be a parameter and not in a separate header. And thus we either accept that it can go in profile, or we introduce a new parameter that quacks an awful lot like profile.
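The pairing argument can be illustrated with a short sketch, assuming a naive parser and illustrative profile URIs (neither `https://schema.org/` nor `http://purl.org/dc/elements/1.1/` is claimed to be a registered profile, and whether a media type accepts a profile parameter depends on its registration). The function name `ranked_preferences` is hypothetical.

```python
def ranked_preferences(header):
    """Return (media_type, profile, q) tuples, highest q first.

    Naive parsing: assumes no commas or semicolons inside quoted values.
    """
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type, profile, q = pieces[0].lower(), None, 1.0
        for piece in pieces[1:]:
            name, _, value = piece.partition("=")
            name, value = name.strip().lower(), value.strip().strip('"')
            if name == "profile":
                profile = value
            elif name == "q":
                q = float(value)
        prefs.append((media_type, profile, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda pref: pref[2], reverse=True)


# Illustrative header: schema.org as JSON-LD preferred, Dublin Core as XML second.
header = (
    'application/xml;profile="http://purl.org/dc/elements/1.1/";q=0.8, '
    'application/ld+json;profile="https://schema.org/"'
)
for media_type, profile, q in ranked_preferences(header):
    print(media_type, profile, q)
```

Because each profile travels inside the same Accept entry as its media type, the pairing and the q value survive parsing; a separate header would need some other mechanism to carry that association.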

BigBlueHat commented 5 years ago

@azaroth42 this has moved beyond content negotiation--which is I think where things are getting tangled and why it probably needs its own issue(s).

First, using profile with the media type parameter is still (and will remain!) a viable way to content negotiate for a resource:

Accept: application/ld+json;profile="http://www.w3.org/ns/anno.jsonld"

and

Accept: application/ld+json, application/ld+json;profile="http://www.w3.org/ns/anno.jsonld"

still request that the server send back something that matches one of those (possibly profiled) types.

However, this feature request included dereferencing the requested profile URL and processing that to generate the content, which--per a stack of RFCs--goes against the intended use of the profile parameter as simply bearing identifiers and leaving it up to the server to determine the proper response for the request (i.e. profile should not be used to create a processing instruction).

Consequently, Link (which is a viable header on both requests and responses) was proposed by @gkellogg as we were wrapping up the call with the promise of future exploration and follow-on discussion about that particular approach (and the feature in general).
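As a rough sketch of what such a request might look like: the code below builds (but does not send) a request carrying a Link header. The relation `http://www.w3.org/ns/json-ld#context` is the one JSON-LD defines for Link headers on responses; reusing it on a request is an assumption made here for illustration, since the approach was still under discussion. The function name `build_request` and the resource URL are hypothetical.

```python
from urllib.request import Request

# Link relation JSON-LD defines for responses; its use on requests is assumed here.
CONTEXT_REL = "http://www.w3.org/ns/json-ld#context"


def build_request(resource_url, context_url):
    """Build a GET request asking for JSON-LD and naming a context via Link."""
    request = Request(resource_url)
    request.add_header("Accept", "application/ld+json")
    request.add_header(
        "Link",
        f'<{context_url}>; rel="{CONTEXT_REL}"; type="application/ld+json"',
    )
    return request


request = build_request("http://example.org/doc", "http://www.w3.org/ns/anno.jsonld")
print(request.get_header("Link"))
```

The Accept header stays a plain content-negotiation signal, while the Link header carries the explicit processing request.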

There was not (and is not to my knowledge) any intention to replace or remove the use of profile, just to make sure it's not misused as a processing instruction.

Any clearer? 😄

gkellogg commented 5 years ago

@BigBlueHat is correct, the profile parameter remains, and is used with a registered URI. We define four such URIs, and certainly other specs can define their own profiles and register them.

After closer inspection of the RFCs it seemed that our use of the profile parameter to also reference the context or frame to use was outside of the usage pattern prescribed for such parameters. Since we already have the Link header, used for responses, using it for requests is reasonable.

However, your point about not being able to use a specific context URL as part of content negotiation is well taken, although it's not clear how a server would distinguish between different context or frame URLs to make a decision, without first dereferencing them, or using an internal registry.

I'm not sure how we could allow for content negotiation on a context URL given the RFCs available to us.

azaroth42 commented 5 years ago

However, this feature request included dereferencing the requested profile URL and processing that to generate the content

I don't see that anywhere in this issue, nor the referenced original issue. Can you provide a link?

BigBlueHat commented 5 years ago

I don't see that anywhere in this issue, nor the referenced original issue. Can you provide a link?

Hrm. It may have first come up at our TPAC conversations minuted above: https://github.com/w3c/json-ld-syntax/issues/8#issuecomment-433597582

Which is why this is marked as having security concerns. There's no security concern if these are just opaque identifiers.

azaroth42 commented 5 years ago

It came up as a concern, certainly. Don't do that then, if there is a concern :) Just use a whitelisted set of profiles. IOW, there is no issue here about URI vs URL that isn't already addressed by an expanded security considerations section.
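A minimal sketch of that approach, assuming the server keeps a fixed set of profile URIs it is willing to honor. The set contents and the name `supported_profiles` are illustrative only.

```python
# Illustrative allowlist; a real deployment would choose its own entries.
ALLOWED_PROFILES = frozenset({"http://www.w3.org/ns/anno.jsonld"})


def supported_profiles(profile_value, allowed=ALLOWED_PROFILES):
    """Return the requested profile URIs if all are allowlisted, else None.

    URIs are compared as opaque strings and are never dereferenced.
    """
    requested = profile_value.strip().strip('"').split()
    if requested and all(uri in allowed for uri in requested):
        return requested
    return None
```

Since nothing is fetched on behalf of the client, an unknown URI costs the server only a set lookup.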

gkellogg commented 5 years ago

From RFC 6906 3.0

Profiles are identified by URI. However, as is the case with, for example, XML namespace URIs, the URI in this case only serves as an identifier, meaning that the presence of a specific URI has to be sufficient for a client to assert that a resource representation conforms to a profile.

This implies that an opaque URL is not appropriate for use in the profile, as, unless specifically registered, it cannot be dereferenced to affect the representation of the resource; at least, that was my takeaway from our discussion. This implies that our resolution, which @BigBlueHat mentioned from TPAC, violates the stated purpose of the profile parameter.

azaroth42 commented 5 years ago

The next sentence, however, clarifies that:

profiles MAY be defined in a way that the URIs do identify retrievable profile description and thus can be accessed by clients by dereferencing the profile URI

So I continue to disagree that there's anything wrong with our usage.

BigBlueHat commented 5 years ago

For profiles intended for use in environments where clients may encounter unknown profile URIs, profile maintainers SHOULD consider to make the profile URI dereferencable and provide useful documentation at that URI.

@azaroth42 from what I can tell from the rest of the surrounding examples (mostly about podcasts), the intent is that at most a "dereferencable URI" (aka a URL) value for profile is meant to return documentation (for machines or humans). Most of the examples point to using human-friendly HTML documentation URLs as the identifier (however dereferencable) of the "profile" of the format.

The use case of applying a profile seems a bit different than asking if the server has a resource conforming to a profile--hence the processing instruction comments from earlier.

BigBlueHat commented 5 years ago

As an example, if I send Accept: application/ld+json;profile="http://www.w3.org/ns/anno.jsonld" to a server, I would not expect the server to take whatever JSON(-LD) it had for the resource and attempt to apply that context URL to it.

However, it would be my expectation that the response would 406 if the only thing on disc were JSON(-LD) which did not conform to that profile (since I only asked for the profiled variant).

The distinction seems important from a server implementation perspective, and having a means to intentionally request or initiate such processing does seem useful...but seems best signaled by something less implicitly inert.

iherman commented 5 years ago

This issue was discussed in a meeting.

ajs6f commented 5 years ago

I'm coming in very late on this, but wrt a Link header, could we rely on previous work here? I understand there could be some issues with q-value ordering (or rather, the lack thereof).