Specify the realisation order of precedence for conflicting profile negotiation situations

nicholascar commented 6 years ago

It is clearly specified that QSA has precedence over HTTP if both are supported; however, there is no proposed mechanism for specifying the relative precedence of other URL-based implementations. Can this be left to implementations to specify or must the situation be handled in this specification?

larsgsvensson commented 6 years ago

What do you suggest?

nicholascar commented 6 years ago

I suggest a principle and then we have to work out the details. The principle is: the more deliberately implemented realisation should win.

So, QSA over HTTP (as QSA is a fixed thing, HTTP could be auto-set by a server).

I can’t yet decide about QSA v. REST. In the REST test I’m now doing, there are more fixed ways of specifying a profile but no option for preferences, unlike HTTP & QSA. So perhaps REST over QSA.

I will add test examples here to debate.

larsgsvensson commented 6 years ago

Can you elaborate a bit on what you mean by REST in this case? To me QSA is a REST implementation (you simply ask for a specific resource and get a representation of the current state of that resource).

nicholascar commented 6 years ago

Sure: there's the possibility to use RESTful resource URIs without QSAs, for example you could implement "list profiles" for /a/resource with /a/resource/profile/ etc. as opposed to the QSA /a/resource?_profile=list and perhaps implement "get resource by profile" with /a/resource/profile/TOKEN.

So a simple pattern of RESOURCE_URI /profile/ PROFILE_TOKEN and perhaps RESOURCE_URI /profile/ PROFILE_TOKEN /mediatype/ MEDIATYPE_TOKEN if: a) MEDIATYPE_TOKENs don't generate invalid URIs and b) other conneg dimensions such as language can be catered for too, perhaps `RESOURCE_URI /profile/ PROFILE_TOKEN /mediatype/ MEDIATYPE_TOKEN /(language|lang)/ LANGUAGE_TOKEN`.

The main thing to sort out with this URI-only approach is to order the /dimension/dimension_token pairs. Should it be:

RESOURCE_URI / [PROFILE_PAIR] / [MEDIATYPE_PAIR] / [LANGUAGE_PAIR] / [X_PAIR] ...

or perhaps another order? The order should perhaps follow processing order so that would imply:

RESOURCE_URI / [MEDIATYPE_PAIR] / [PROFILE_PAIR] ...

But I don't know enough about language, time (I think Memento says it's first?) etc.
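To make the URI-only pattern concrete, here is a minimal sketch of how a server might peel /dimension/token pairs off the end of a path. The function name, the `DIMENSIONS` table and the token values are assumptions for illustration, not anything proposed in the spec. Because it walks from the right, it accepts the pairs in any order, which sidesteps the ordering question rather than answering it.

```python
# Hypothetical dimension keywords; the thread only suggests /profile/,
# /mediatype/ and /language/ (or /lang/).
DIMENSIONS = {
    "profile": "profile",
    "mediatype": "mediatype",
    "language": "language",
    "lang": "language",
}


def split_conneg_path(path):
    """Split a path into (resource_uri, {dimension: token}).

    Trailing /dimension/token pairs are removed from the right, so the
    pairs may appear in any order after the resource URI.
    """
    segments = [s for s in path.split("/") if s]
    choices = {}
    # Keep peeling while the last two segments look like a known pair.
    while len(segments) >= 2 and segments[-2] in DIMENSIONS:
        token = segments.pop()
        dimension = DIMENSIONS[segments.pop()]
        # If a dimension is repeated, the left-most pair is processed
        # last and therefore wins.
        choices[dimension] = token
    return "/" + "/".join(segments), choices
```

For example, `split_conneg_path("/a/resource/profile/dcat/mediatype/ttl/lang/en")` separates `/a/resource` from the three choices. A resource whose own path ends in something like `/profile/x` would be misread, which is one of the limits of a purely hierarchical scheme.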

Obviously this purely hierarchical approach is limited compared to HTTP & QSA as it's hard to see sensible URIs doing the equivalent to:

?_profile=xxx&_mediatype=text/turtle,application/ld+json,application/rdf+xml

But loads of sites use, for example, /en/ for English representations of things so I think it's sensible to cater for this.
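For comparison, a rough sketch of reading the QSA form above into ordered preference lists. It assumes the comma-separated values are listed most preferred first, and the function name is made up for illustration. It deliberately avoids `urllib.parse.parse_qs`, which decodes `+` as a space and would turn `application/ld+json` into `application/ld json`.

```python
from urllib.parse import unquote, urlsplit

# Assumed mapping from QSA keys (as used in this thread) to dimensions.
QSA_KEYS = {"_profile": "profile", "_mediatype": "mediatype"}


def parse_qsa_preferences(url):
    """Return {'profile': [...], 'mediatype': [...]}, most preferred first."""
    prefs = {"profile": [], "mediatype": []}
    for pair in urlsplit(url).query.split("&"):
        key, _, value = pair.partition("=")
        if key in QSA_KEYS:
            # unquote() handles %-escapes but leaves "+" untouched.
            tokens = [t.strip() for t in unquote(value).split(",")]
            prefs[QSA_KEYS[key]].extend(t for t in tokens if t)
    return prefs
```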

agreiner commented 6 years ago

I really don't think we should be specifying this stuff. This is system architecture that should be left to system designers.

larsgsvensson commented 6 years ago

@agreiner scripsit:

> I really don't think we should be specifying this stuff. This is system architecture that should be left to system designers.

I see your point. My gut feeling is that usually a very specific URL trumps anything that HTTP headers say (so if you ask for http://example.com/a/resource.html.en but set the headers to prefer text/turtle and French, you would get HTML and English anyway).

Can you think of other cases?

agreiner commented 6 years ago

I think that you get data-prof1.csv if you request /data/prof-1.csv or /data.csv?profile=prof1 or whatever, unless the owner happened to set up content negotiation for /data/prof-1.csv or data.csv. Why anyone would set that up is beyond me, and seems such an unusual edge case that I think it should just follow the rules of currently available content negotiation. I see no reason to request a departure from that standard.

nicholascar commented 6 years ago

@agreiner I don't understand "departure from the standard". Currently there is no standard for conflicting profile negotiation situations since there never has been the ability to request things on a profile basis.

If we look to Media Type as an indication of what profile handling could look like, then, as Lars said, blah.csv with Accept: text/turtle should probably return CSV, since a deliberate human choice has been made for CSV, as opposed to the less visible, machine-set, text/turtle.

The triple conflict would be:

/a/resource.csv?_mediatype=text/turtle with Accept: text/xml. Here, I suggest, .csv should win over Accept: text/xml but _mediatype should win over .csv (a Query String Arg being even more deliberately set than a file-extension-like URI).
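The triple conflict can be sketched as a short resolution function. This is only an illustration of the suggested order (QSA, then file-like ending, then Accept header). The extension table and function name are assumptions, and Accept parsing is simplified to "first listed type, q-values ignored".

```python
import posixpath
from urllib.parse import unquote, urlsplit

# Hypothetical extension-to-media-type table for the sketch.
EXTENSIONS = {
    ".csv": "text/csv",
    ".ttl": "text/turtle",
    ".xml": "text/xml",
    ".html": "text/html",
}


def resolve_media_type(url, accept=None):
    """Return (media_type, source) using QSA > file-like ending > HTTP."""
    parts = urlsplit(url)
    # 1. Query String Argument: the most deliberately set.
    for pair in parts.query.split("&"):
        key, _, value = pair.partition("=")
        if key == "_mediatype" and value:
            return unquote(value).split(",")[0].strip(), "qsa"
    # 2. File-extension-like ending of the path.
    ext = posixpath.splitext(parts.path)[1].lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext], "extension"
    # 3. Accept header: first listed type only (a simplification).
    if accept:
        first = accept.split(",")[0].split(";")[0].strip()
        if first:
            return first, "http"
    return None, None
```

With the request above, `resolve_media_type("/a/resource.csv?_mediatype=text/turtle", "text/xml")` picks `text/turtle` from the QSA. Dropping the query string makes `.csv` win over the header.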

agreiner commented 5 years ago

@nicholascar I meant that we should follow the approach used in content negotiation by media type when someone requests a resource with a specific media type in the URI while having accept headers in place. Of course, that would only come into play when the site admin has set up alternative representations for the URI in question, which strikes me as a very odd thing to do.

nicholascar commented 5 years ago

Normative guidance on servers' handling of HTTP header conflicts is given in Section Hypertext Transfer Protocol Headers.

No guidance is yet given in either the HTTP or QSA Functional Profiles' sections to indicate how to resolve QSA + HTTP conflicting instructions. General guidance must be given as to how to resolve Functional Profile clashes. I suspect the server can just tell you which FP it is adhering to in its response and then this will be a general solution.

rob-metalinkage commented 5 years ago

I think QSA should override Headers - clients tend to set Headers as defaults - but human agency to specify a profile explicitly via QSA in either data references or as choice in UI should override software defaults.

nicholascar commented 5 years ago

@rob-metalinkage can you suggest a specific addition to the document?

larsgsvensson commented 5 years ago

@rob-metalinkage scripsit:

> I think QSA should override Headers - clients tend to set Headers as defaults - but human agency to specify a profile explicitly via QSA in either data references or as choice in UI should override software defaults.

I certainly support this view and would go even further to say that any request URI (not only "human agency") trumps the HTTP headers sent with that request. So if I ask for http://example.org/foo.html and my headers say Accept: text/turtle I still expect to get the content of foo.html and not of foo.ttl. The only other option for the server would be to answer with 406 saying that the media type of foo.html isn't compatible with the Accept header I sent, but that seems overly restrictive to me.

agreiner commented 5 years ago

I'm not convinced that it's possible to assume a real conflict. Negotiation should only happen when a web server is configured to supply different versions of a resource depending on the header content. If someone wishes to configure their web server to differentiate multiple possible responses to a URL that happens to have a query string in it, why not let them do that?

rob-metalinkage commented 5 years ago

It only applies if the query string has a query parameter with the specific semantics of specifying the profiles the client wishes (if the server chooses to do QSA, or already implements an API that does). Any other URL elements will get ignored.

nicholascar commented 5 years ago

In pyLDAPI, the order is indeed (most important first):

QSA > HTTP

For Conneg (by Media Type) implemented by the Australian Government Linked Data WG, we have:

QSA > file-like endings > HTTP

due to the assumption that if someone specifies _format=text/turtle, it should override .xml which should override Accept: text/html.

rob-metalinkage commented 5 years ago

Seeing no disagreement over primacy of QSA over HTTP headers, decision is ACTION-373 - Provide text for order of precedence (#505)

agreiner commented 5 years ago

I think this deserves some more thought. If we expect web server software to deal with conneg by AP in headers, it would fall to its makers to enable it to determine that a given request uses query strings and override any configuration that would allow for content negotiation on that URL. But would we want web server software to inspect every query string and determine that? What would be the consequences for performance? Is there precedent for the server to override configuration based on the specifics of a request like that?

nicholascar commented 5 years ago

> Is there precedent for the server to override configuration based on the specifics of a request like that?

If you ask for a response from an OAI-PMH server, you will get it in XML, regardless of any Accept headers you put in. So yes.

There must be lots of APIs that ignore HTTP Accept (and Accept-Language) etc.

rob-metalinkage commented 5 years ago

And of course we only provide a mechanism for servers that do behave this way - there is nothing to force all servers to do things this way - they can just implement HTTP profile of conneg-by-ap using URIs if they like - but at least this spec now provides a way to find out what they do actually support.

agreiner commented 5 years ago

Yes, I think most APIs ignore content negotiation, but the question is what to do when there is server configuration for HTTP-header-based conneg and a web application that accepts query strings for the same resources. As the server, I'm thinking of web server software like Apache or nginx, not the web application. These handle content negotiation by directives in the configuration file. There would only ever be a conflict if someone creates a directive to use the headers for content negotiation and also accepts a query string for requesting data by profile. That seems obviously a developer error, and it would be most helpful if the web server handled it in a way that allowed the developer to guess what was going on, following the principle of least surprise.

If a user sends a request with profile=foo that isn't used by an API (a user error in sending the request), but there is a directive to negotiate on headers, I would expect the server to do HTTP-header content negotiation. In this case, there is no logic anywhere to handle the profile=foo.

If a user sends a request with a profile=foo that is indeed used by an API, and there is HTTP-header-based negotiation configured for the resource (almost certainly a misconfiguration), it would make sense from the user point of view for the server to ignore the header info and use the query string, but the server cannot know whether the query string is actually used or not until it has passed the request on to the API. It just passes the parameters on and lets the web application do the parsing. It seems unprecedented for a general-purpose web server like Apache to parse query strings and behave differently depending on whether a specific key exists. And even if it did, it would have no logic for what to do with profile=foo, because that is specific to the web application.

rob-metalinkage commented 5 years ago

That's a good analysis of an aspect of this issue - I would say that it would be appropriate for the server to respect the query string - and return a response saying that profile is not found.

In general I think you are correct if you are saying it would be a misconfiguration to attempt to support negotiation in the HTTP server layer if the application wished to support QSA - the server should be configured to pass those headers to the application - that's certainly the way my implementation will operate.

Do you think this needs to be explicitly stated though? It may be worthwhile stating that normal HTTP error responses should be used according to the reason a profile may not be available (including 404 but also access denied etc) - and if really necessary citing this as an example in the precedence section?
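As a rough sketch of the behaviour discussed here, at the web application layer rather than in Apache or nginx: a QSA profile, if present, is respected and an unknown token yields 404 instead of silently falling back to header negotiation. The profile tokens, the default, and the choice to serve the default when no header preference matches are all assumptions for illustration.

```python
# Hypothetical profile tokens offered for one resource.
AVAILABLE_PROFILES = {"prof1", "prof2"}
DEFAULT_PROFILE = "prof1"


def choose_profile(qsa_profile=None, header_profiles=()):
    """Return (http_status, profile_token) for a request.

    qsa_profile: token from the query string, or None if absent.
    header_profiles: tokens from HTTP headers, most preferred first.
    """
    if qsa_profile is not None:
        # QSA takes precedence; headers are not consulted at all.
        if qsa_profile in AVAILABLE_PROFILES:
            return 200, qsa_profile
        return 404, None
    # No QSA: fall back to header-based negotiation.
    for candidate in header_profiles:
        if candidate in AVAILABLE_PROFILES:
            return 200, candidate
    # Assumption: serve the default when nothing matches.
    return 200, DEFAULT_PROFILE
```

So `choose_profile("foo", ["prof2"])` gives a 404 even though the header preference could have been satisfied, which matches the "respect the query string" position above.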

larsgsvensson commented 5 years ago

I think the operative word here is "Web application". Admitting that I've only once configured an Apache server to use content negotiation (several years ago), my understanding at that time was that that configuration -- serving one specific path -- was only used for serving content from the file system (i. e. foo.html.en vs foo.html.fr vs foo.ttl). In all other cases Apache worked as a reverse proxy forwarding the request to the correct web application. Those web applications had full freedom to handle any combination of request URI (which would include QSA) and HTTP headers. So my take would be that if Apache or nginx implement conneg-by-ap, they will only implement HTTP header negotiation and not care about QSA. Given that Apache doesn't implement e. g. Memento, I'm not sure they will implement conneg-by-ap either...

nicholascar commented 5 years ago

I think @larsgsvensson is correct with the statement about what Apache etc. could implement.

I've configured many Apache servers for conneg, both for local file systems (conneg to serve HTML or RDF from different files, for example) and also as proxies to applications written in Python or applications on whole other servers. In those cases, Apache is just acting as a pass-through splitting requests, usually, on some path segment. The pass-through must convey all relevant Headers & QSAs to the application which then handles the response.

I do use Apache to funnel multiple possible ways of requesting a resource using conneg into a single way that an application understands, though, in one server's case, e.g.:

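# Turtle requested via QSA or Accept header: redirect using the QSA the application understands.
RewriteCond %{QUERY_STRING} ^_format=text/turtle$ [OR]
RewriteCond %{HTTP:Accept} text/turtle [NC]
RewriteRule ^/dataset/asgs2011/(.*)/$                   http://asgsld.net/2011/$1/?_format=text/turtle [R=302,L]
# Turtle requested via a file-like ending.
RewriteRule ^/dataset/asgs2011/(.*)/index\.ttl$         http://asgsld.net/2011/$1/?_format=text/turtle [R=302,L]
# Anything else: redirect without forcing a format.
RewriteRule ^/dataset/asgs2011/(.*)/$                   http://asgsld.net/2011/$1/ [R=302,L]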

So here you can request Turtle by either using the QSA _format=text/turtle or Accept: text/turtle or .ttl at the end of the resource. All result in a call to the Application of ?_format=text/turtle since the application can only handle QSAs.

So logic similar to this in Apache could be used by developers for Conneg by P.
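For readers less familiar with mod_rewrite, the same funnelling logic can be approximated in Python. This is a sketch of the decision order only, not a replacement for the Apache rules: the function name is made up, and the pass-through of any other query string on the fallback redirect is not modelled.

```python
import re

TARGET = "http://asgsld.net/2011/"
DIR_RE = re.compile(r"^/dataset/asgs2011/(.*)/$")
TTL_RE = re.compile(r"^/dataset/asgs2011/(.*)/index\.ttl$")


def redirect_target(path, query="", accept=""):
    """Return the redirect URL for a request, or None if no rule applies."""
    # QSA or Accept header asks for Turtle (header match is case-insensitive).
    wants_turtle = (
        query == "_format=text/turtle" or "text/turtle" in accept.lower()
    )
    match = DIR_RE.match(path)
    if match and wants_turtle:
        return f"{TARGET}{match.group(1)}/?_format=text/turtle"
    # A file-like ending also asks for Turtle.
    match = TTL_RE.match(path)
    if match:
        return f"{TARGET}{match.group(1)}/?_format=text/turtle"
    # Fallback: redirect without forcing a format.
    match = DIR_RE.match(path)
    if match:
        return f"{TARGET}{match.group(1)}/"
    return None
```

All three ways of asking for Turtle end up at the same `?_format=text/turtle` URL, so the application behind the proxy only needs to understand QSAs.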

nicholascar commented 5 years ago

The discussion here is now about implementation, not specification. I'm satisfied that the original issue has been dealt with by the addition of text to ED Section 6.5, Order of Precedence for Implementation Profiles.

agreiner commented 5 years ago

I think it's okay to go with the QSA approach over HTTP headers, since handling query strings would have to be done by a separate web app, so the web server would know to hand it off without parsing anyway, and returning a 404 for a client request that assumes the wrong approach seems reasonable. Since we are allowing for alternative realizations, how does the order of precedence work with as-yet-unspecified ones? Maybe the only real conflict is between HTTP headers and web apps. When you say that "a client MAY specify conflicting choices via different mechanisms", I think you don't mean the RFC2119 version of "may" here. Maybe "In the case where a client requests one profile via one mechanism and another profile via a different mechanism, the order of precedence . . ."

nicholascar commented 5 years ago

I don't think the use of MAY or the alternative suggested makes any difference!

I think the sentence:

> If other functional profiles of the abstract model are defined each MUST specify order of precedence.

is problematic since we can't force future Functional Profiles to respect other FPs' claims to order of precedence. Also, there is no canonical mechanism for order of precedence so we can't test for MUST.

After discussion in the subgroup, we feel that more discussion is needed about this issue so we will list it in the 3PWD.

rob-metalinkage commented 4 years ago

Have tweaked wording according to @agreiner's point about MAY, removed MUST and added a link to this issue to help solicit feedback to check this.

w3c / dx-connegp #2