opengeospatial / NamingAuthority

Primary repo for the OGC Naming Authority

Return actual definition in negotiated media type #109

Closed jerstlouis closed 1 year ago

jerstlouis commented 3 years ago

When text/html or a specific format is not requested in the Accept: header, the common definition of the resource (e.g. JSON, GML) should be returned.

In particular this is desired for registered TileMatrixSets, where the JSON object would be returned.

This would make it practical for clients, rather than being hidden behind a "See also" link.

ghobona commented 3 years ago

@rob-metalinkage This is potentially another use case for Content Negotiation.

rob-metalinkage commented 3 years ago

There is a good deal of flexibility, the def server already supports Content-negotiation.

However, the DefServer can't a) work with information it doesn't have or b) work in different ways for the same type of case.

So it's up to the SWG to define exactly what it wants and then the NA to decide how the DefServer should behave; then I can implement the desired behaviour.

In order to achieve this:
a) the "common definition" would need to be provided by the SWG and linked to the resources in question (e.g. a spreadsheet with URI and resource URL)
b) the SWG would need to commit to a governance and maintenance regime to ensure these resources remain accessible
c) the exact behavioural requirements should be specified
d) if these don't fit into current DefServer behaviour, then a discussion about options should be had and the results brought to the NA for discussion
e) the agreed behaviour should be prototyped and tested (we maintain versions of the def server to support this)
f) the behaviour and supporting data should be deployed to the live Def server

Many SWGs deal with b) for things like XML schemas, but it's less than clear for all cases where this stands, so it's helpful to record it.

The behaviour of having a heterogeneous resource delivered if Accept: is not specified (or is */* like browsers tend to do) means either:
1) adding special rules to specific paths for redirect or proxy of resources
2) adding an extra data-driven redirect layer for everything to handle special cases

the other piece of information available is the client. But really the correct mechanism is for clients to set the Accept: header if they care what form the resource is accessed in - is there really a strong enough reason not to do this? If there are common code libraries that are poorly behaved, can they be fixed before we hack behaviours on the server side?

If the issue is simple display of the canonical form - then I see no reason we can't add a special property to link to canonical representation and show that if present in a box in the HTML view.

I've added a new label "UseCaseNeeded" for these types of questions that need wider discussion before action.

jerstlouis commented 3 years ago

@rob-metalinkage Speaking for our OGC API - Tiles use case for the TileMatrixSets registry:

cportele commented 3 years ago

I think, we should not force the definition server (or any server) into supporting a "default representation" for resources. As RFC 7231 says: "A request without any Accept header field implies that the user agent will accept any media type in response."

In Features we had a discussion about this and decided to stick to the HTTP rules. I think the definition server is no different.

dr-shorthair commented 3 years ago

Agree with @cportele - adding local rules is a downward spiral

jerstlouis commented 3 years ago

@cportele OK, so this means that clients expecting a JSON representation must ensure to include an Accept: application/json header. But still, we could make the suggestion that the definition server returns this common encoding by default, as many OGC APIs do?
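For reference, a minimal sketch of what this looks like on the client side, using only Python's standard library. The helper name `build_request` is made up for illustration, and the request is only constructed here, not sent:

```python
import urllib.request

# Canonical URI discussed in this thread.
TMS_URI = "http://www.opengis.net/def/tilematrixset/OGC/1.0/WebMercatorQuad"


def build_request(uri, media_type="application/json"):
    """Return a GET request that states the wanted media type explicitly."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


request = build_request(TMS_URI)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))

# Sending it needs network access, e.g.:
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```

A client that omits the header leaves the choice of representation entirely to the server, which is the situation the RFC 7231 quote above describes.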

cportele commented 3 years ago

@jerstlouis - No OGC API standard (that I know) suggests that JSON should be a default encoding. And we should not create an expectation that servers will provide a certain default encoding. This is even more the case in the definition server where some SWG may prefer JSON as the "default" for their resources and others will prefer Turtle, etc.

rob-metalinkage commented 3 years ago

Suggestion - a few things we can try without creating too much special case logic in the server:
a) embedding the JSON in a default HTML response as something like a <data> tag (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/data)
b) just making the canonical form visible if present in the UI and telling users how to access it properly using Accept:
c) rel=canonical to the expected "default encoding"

jerstlouis commented 3 years ago

@rob-metalinkage The "default encoding" is not so important, as @cportele points out.

My suggestion was just to select such a default encoding based on the SWG's preferred encoding for their resources, without creating an expectation.

But the essence of this issue is that right now even with a request for the canonical URL, a 303 is being returned with no content instead of a 200 with the actual definition in whichever media type is negotiated. Any representations supporting links could include links to alternate representations in the response body, as well as in the response header. For the TileMatrixSets use case at least we would very much like the actual definition to be returned as a 200 in the body of the response.

ghobona commented 3 years ago

All, please remember that the Definitions Server is intended to enable discovery of definitions, management of the definitions and access to those definitions. We use SKOS because it facilitates all three of these functions. Once discovered, client applications can then request their domain-specific encodings.

jerstlouis commented 3 years ago

@ghobona @rob-metalinkage Does that invalidate the use of the canonical URI as the request URL for getting back the actual definition of a TileMatrixSet response as application/json (assuming an Accept: application/json header)?

I thought we had agreed that this would be possible as part of discussion relating to those URIs (e.g. https://github.com/opengeospatial/ogcapi-tiles/issues/47 ).

If it is not possible then we should certainly always require a tileMatrixSetDefinition as suggested at https://github.com/opengeospatial/2D-Tile-Matrix-Set/issues/34 , and if an implementation wants to link to the definition provided by the definition server, then it should link to the actual URL where the JSON definition of the TileMatrixSet is found (however odd that URL might look).

However, I would find it extremely impractical if the definition server could not directly return the actual definition (personal opinion).

But it already does return an actual definition e.g. for CRS:

http://www.opengis.net/def/crs/OGC/0/UnixTime

So I don't believe this is the case? This returns a 200 with an XML definition, as expected!

ghobona commented 3 years ago

Actual definitions can be returned through Content Negotiation. @rob-metalinkage explained this earlier in this thread. See https://github.com/opengeospatial/NamingAuthority/issues/109#issuecomment-852718098

The approach of retrieving GML-encoded CRS definitions from the canonical URIs is legacy and will be changed eventually when resources permit. There is actually a request from another OGC member to return the CRS definitions in WKT CRS, which is also an OGC standard. This highlights the need for default encodings to facilitate discovery, and Content Negotiation to enable access to other encodings.

jerstlouis commented 3 years ago

@ghobona yes I fully agree Content Negotiation to enable access to other encodings, and a preferred encoding is all great. A JSON encoding of CRS (CRS SWG now looking at PROJJSON as a starting point for that) would be great as well.

However I am confused by:
A) Right now, the definition server does NOT return the definition for TMS (e.g. http://www.opengis.net/def/tilematrixset/OGC/1.0/WebMercatorQuad returns a 303, not a 200)
B) @rob-metalinkage's reply here https://github.com/opengeospatial/NamingAuthority/issues/112#issuecomment-852838287 saying "returning 200 not an option". Maybe he meant if a non-canonical URI is used? However my example in that case was a canonical URI (like the WebMercatorQuad above)

So maybe the issue is just that returning the definition is not yet implemented for TileMatrixSets, and Content Negotiation to support multiple types is a feature to be implemented.

rob-metalinkage commented 3 years ago

so the task is to work out how to get from:

http://www.opengis.net/def/tilematrixset/OGC/1.0/WebMercatorQuad

to

http://schemas.opengis.net/tms/1.0/xml/examples/WebMercatorQuad.xml

when XML is asked for

I think I can write a special view that can be used for all cases - that inspects the data and generates a response as a proxy (200 response - after the 303 to the script) - that's why 200 is not an option - it breaks HTTP protocol architecture to deny using redirects appropriately
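The lookup step described here can be sketched as a small data-driven rule table. This is an illustration only, not the DefServer's actual implementation: the `REDIRECTS` table, the `redirect_for` helper and the 404/406 fallbacks are assumptions, while the target URLs are the ones quoted in this thread.

```python
# Hypothetical rule table: canonical URI -> {media type: representation URL}.
REDIRECTS = {
    "http://www.opengis.net/def/tilematrixset/OGC/1.0/WebMercatorQuad": {
        "application/xml": "http://schemas.opengis.net/tms/1.0/xml/examples/WebMercatorQuad.xml",
        "application/json": "http://schemas.opengis.net/tms/1.0/json/examples/WebMercatorQuad.json",
    },
}


def redirect_for(uri, media_type):
    """Return (status, location): 303 when a rule matches, otherwise 404 or 406."""
    rules = REDIRECTS.get(uri)
    if rules is None:
        return 404, None  # unknown definition (assumed behaviour)
    location = rules.get(media_type)
    if location is None:
        return 406, None  # known definition, no such representation (assumed)
    return 303, location


status, location = redirect_for(
    "http://www.opengis.net/def/tilematrixset/OGC/1.0/WebMercatorQuad",
    "application/xml",
)
print(status, location)
```

Whether the server then answers with the 303 itself or proxies the target and returns a 200 is exactly the design question under discussion; the table is the same in both cases.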

jerstlouis commented 3 years ago

@rob-metalinkage In my opinion, the real resource for the TileMatrixSet definition is its canonical URI, and it may support multiple representations (using content negotiation). Therefore returning a 303 for http://www.opengis.net/def/tilematrixset/OGC/1.0/WebMercatorQuad is not appropriate, and would be greatly confusing as implementors will easily mix up URIs with the example URLs, since whenever they browse to the URI they will be redirected there.

Just like http://www.opengis.net/def/crs/EPSG/0/4326 is the real resource for the EPSG:4326 definition.

In my opinion an examples/ folder should not be authoritative, while the canonical URI from the definition server is.

If the definition server uses a script / proxy internally to populate the response from a directory containing definitions in multiple encodings somewhere that is great, but that should not be visible to the client. It should also be possible to add new definitions without updating the (TileMatrixSet 1.0/2.0) standard itself.

dr-shorthair commented 3 years ago

AFAIK the issue here is that http://www.opengis.net/def/ is a redirection service. Redirection is a ubiquitous pattern for PID hosting. And HTTP redirection is managed through 3** codes. So I don't think it is feasible to insist that every request for a valid http://www.opengis.net/def/ will generate a 200.

jerstlouis commented 3 years ago

@dr-shorthair Is it a redirection service or a definition service? I personally believe the latter is much more valuable / useful. If the official definition is managed by this registry, where else is it re-directing to? http://www.opengis.net/def/crs/EPSG/0/4326 does not redirect, it returns a definition with a 200 HTTP code.

For the TileMatrixSets register, it is a definition service that we need, just like I believe is also the case for CRSes.

rob-metalinkage commented 3 years ago

You are correct that http://www.opengis.net/def/crs/EPSG/0/4326 does not redirect. That's a bug in the legacy system before the definitions server gets involved - it should be redirecting, not proxying remote content. That's a bug because it means we can never offer anything but the particular representation, and there is already a plan to improve the CRS definitions offering with more flexible representation options.

"definition" does not imply "only canonical form of definition" - the design is to use the standard HTTP protocol to support delivering definitions in the multiple forms potentially needed. We do need to do a lot more work to flesh out availability of canonical forms in particular.

for example, for the definition of a conceptual data model element there are many possible valid forms

see https://w3id.org/demeter/agri/agriIntervention?_profile=alt&_mediatype=text/html

the "best form" is likely to change over time for many things - so explicit options with redirects is the one pattern that will keep on working.

jerstlouis commented 3 years ago

@rob-metalinkage

> That's a bug because it means we can never offer anything but the particular representation, and there is already a plan to improve the CRS definitions offering with more flexible representation options.

I don't understand this. Content negotiation with the Accept: header supports multiple representations and that is heavily used in OGC APIs and works perfectly.

e.g. the following cURL commands both return a 200 definition of the WebMercatorQuad TMS, in JSON or HTML representations:

curl -i -H "Accept: application/json" https://maps.ecere.com/ogcapi/tileMatrixSets/WebMercatorQuad

curl -i -H "Accept: text/html" https://maps.ecere.com/ogcapi/tileMatrixSets/WebMercatorQuad

and we could add support for XML as well.

Re-direction is not required to return multiple representation of a definition, and in my opinion re-direction severely hampers the definitions server's usefulness.
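A simplified sketch of how a server could pick a representation from the Accept: header and answer directly. The function names are hypothetical, and the matching is deliberately reduced: it honours q-values and wildcards but ignores media type parameters and the specificity ordering that RFC 7231 describes.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media_range, q))
    # sorted() is stable, so ties keep the order the client sent them in.
    return sorted(ranges, key=lambda item: item[1], reverse=True)


def negotiate(header, available):
    """Pick the first available media type matching the client's preferences."""
    # No Accept header: any media type is acceptable, so use the server's first.
    if not header or not header.strip():
        return available[0]
    for media_range, q in parse_accept(header):
        if q <= 0:
            continue
        for media_type in available:
            main_type = media_type.split("/")[0]
            if media_range in ("*/*", media_type, main_type + "/*"):
                return media_type
    return None  # a caller would answer 406 Not Acceptable


available = ["application/json", "text/html"]
print(negotiate("application/json", available))
print(negotiate("text/html,application/xhtml+xml,*/*;q=0.8", available))
```

With this approach the order of `available` acts as the server-side preference whenever the client sends no header or only `*/*`.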

> explicit options with redirects is the one pattern that will keep on working.

If an explicit media type is requested, that should keep on working as well.

@joanma747 please share your opinion on this topic.

rob-metalinkage commented 3 years ago

Forcing 200 as the only possible response has the following drawbacks, any of which is sufficient to rethink that:
1) it's an arbitrary limitation on the use of the HTTP protocol, which supports redirections already
2) it only allows one information model per encoding, whereas the general case is that encodings are not hard to translate, but control over and flexibility for the information model is far more important
3) it introduces a set of security and access control concerns over proxying content vs. leaving the resource provider to make the call
4) it would involve major redevelopment of current capabilities
5) it would be an arbitrary change to current capabilities motivated by a single perspective
6) we don't have an articulation of why an immediate 200 response is necessary, when redirects are used widely elsewhere. It's not clear that this proposal isn't an "expected solution" rather than an actual problem statement.

I'd start with 6 if you want to develop a change proposal.

jerstlouis commented 3 years ago

About 2, I am hoping negotiation by profile (https://github.com/opengeospatial/ogcapi-common/issues/8) would support multiple information models per encoding.

About 3, in the many contexts where the resource provider is the OGC Naming Authority itself, does that still apply?

About 6, I will try to better re-phrase my concerns as actual problem statements:

Thank you for considering these suggestions.

rob-metalinkage commented 3 years ago

Thanks Jerome.

2 - yes - content negotiation by profile does apply there to provide a canonical mechanism for discovery and reporting what forms are available

3 - We already have externally hosted content and plans to interlink across other definition domains (ISO etc.), so it's important not to assume proxying is safe. This may be relaxed in future, but at the moment that landscape is still emerging and it's hard to say.

The other points come down to the actual requirements:
a) issue #95 is specifically about options to make citation easier.
b) cache-control headers should be used and clients should cache if accessing canonical definitions is a performance concern - again, let's not try to re-invent the Web and Internet
c) what is returned is the issue - there are two parts to this answer:

Apropos the latter (this is getting to be a long logic chain I realise) the data currently available at

https://github.com/opengeospatial/NamingAuthority/blob/master/definitions/conceptschemes/tms.ttl

does have a set of rdfs:seeAlso links to XML

If the SWG wants to ask for an update to this, then happy to help. Do you want me to generate a straw man for an improved representation of that linkage and for how we can redirect to the XML automatically if XML is requested? I would be willing to bring that suggestion to the NamingAuthority for sign-off and implement if everyone is happy.

jerstlouis commented 3 years ago

Thank you @rob-metalinkage

> This will give you TTL and JSON options to get the "actual definition".

The JSON definition that is really needed is an actual TileMatrixSet definition, corresponding to the XML one, like e.g. http://schemas.opengis.net/tms/1.0/json/examples/WebMercatorQuad.json .

The TTL does not contain an actual TMS definition, and I don't know if you meant a JSON equivalent of that TTL there which just provides links.

> Hopefully this addresses the problem, which is the content being redirected to not the 303 itself I think ?

It is correct that the bigger problem is that it redirects to Turtle rather than the actual JSON or XML definition. If the 303 would be to the JSON or XML based on the negotiated media type, that would be a big step ahead.

I assumed those URIs to be the actual authoritative resources for those TileMatrixSets. Maybe the SWG needs to set up another online directory not attached to a particular version of the standard (and not identified as examples), which would be the authoritative home of the TileMatrixSets register. But then couldn't we just bypass the redirecting definition server and use that directly as the URIs for TileMatrixSets (assuming it also supported content negotiation)?

> the SWG can provide a mapping for particular paths to the location of canonical XML resources and we can set up redirects.

With OGC API - Tiles, the JSON representation is becoming a lot more important than the XML, but we still have both XML & JSON encodings. The latest versions of both encodings for the defined TileMatrixSets are at:

https://github.com/opengeospatial/2D-Tile-Matrix-Set/tree/master/schemas/tms/2.0/json/examples/tilematrixset

https://github.com/opengeospatial/2D-Tile-Matrix-Set/tree/master/schemas/tms/2.0/xml/examples/tilematrixset

But the UTM one (UTM31WGS84Quad.xml) is an example from the UTM family for Zone 31, and should be expanded to one for each of the 60 UTM zones (that might require some considerable effort from the SWG). So there are currently 69 registered TileMatrixSets. (I am arguing that WGS1984Quad.json and WorldCRS84Quad.json are equivalent and only the latter should be registered).

The encodings for 2DTMS 2.0 standard are not yet completely finalized, so it might be best to wait a few more weeks before doing any updates.

If the definition server is going to simply redirect somewhere as opposed to host the content directly, we probably need a better place than the versioned standard schema examples as the authoritative location of registered tile matrix sets. Something that does not depend on releasing a new version of the standard to register new tile matrix sets. Where should that be and how should that be managed? A directory on the GitHub repo? Somewhere on opengis.net (e.g. https://schemas.opengis.net/tms/registry ?) Where will CRS be redirected to if the plan is to make them redirect as well?

Thank you!

joanma747 commented 3 years ago

What is fundamental for us is that we are able to get the description of a TileMatrixSet from the NA directly in a single URL. So we do not want to get an RDF representation of the concept. We would like to get the actual definition of the TileMatrixSet directly: exactly what we get from http://schemas.opengis.net/tms/1.0/xml/examples/WebMercatorQuad.xml

Is that possible now?

If we have to add a parameter to the URL, that should not be a problem. Example: http://www.opengis.net/def/tilematrixset/OGC/1.0/WebMercatorQuad?query=SameAs[2]

rob-metalinkage commented 3 years ago

OK - I will experiment with setting up an example redirection rule.

Looking at the ttl file, i see some only have JSON, some only have XML. It would be simpler if they all had both JSON and XML. I see they both seem to be present at http://schemas.opengis.net/tms/1.0/json/examples/ - so is it ok to offer both?

I think it would be ok to override a default JSON view of the RDF content (SKOS) with the specific JSON as the JSON encoding of the SKOS profile is always available. (e.g. http://www.opengis.net/def/status/valid?_profile=concept&_mediatype=application/json)

This is the first case for this so probably needs a policy decision to do this. Regardless, we would need to specify a URI for a formal profile for the TMS info model it would also be available as... that would be

http://www.opengis.net/spec/tilematrixset/1.0/conf/xml-tilematrixset2d

(and I would generate a formal RDF profile definition for this - we are currently looking at specs to do this systematically but didn't expect this Use Case to be so immediately needed - let's prioritise TMS :-) )

If there is a great deal of variance and it's something that will need updating, I'll need to explore either:
a) giving the SWG the format to define custom redirection rules (I can load these via a JSON format - they are a little complex though, as we need to define rules, descriptive profiles as well as bindings to paths), or
b) regenerating a redirection rule programmatically

rob-metalinkage commented 3 years ago

Actually I'll need a little bit more input on how the various conformance classes map to the various examples please.. i see there are a few different options..

rob-metalinkage commented 3 years ago

OK folks - I've implemented a little experiment

On defs-dev I've set up custom rules..

http://defs-dev.opengis.net/def/tilematrixset/OGC/1.0/WebMercatorQuad?_profile=alt

that explicitly list options

but I've also set up a default option if no profile is specified and json or xml is requested.

curl -i -H "Accept:application/json" http://defs-dev.opengis.net/def/tilematrixset/OGC/1.0/WebMercatorQuad

I think I should probably make the alts view provide some more detail about availability of default profiles for different encodings - that's a bit of trickier back-end coding - if this is to be a supported policy I'll add it to the development backlog.

rob-metalinkage commented 3 years ago

Of course - I haven't yet worked out how to handle different versions too - but it should be possible with some regex cleverness and another set of rules.

rob-metalinkage commented 3 years ago

Finally - if you are happy with this option would you be prepared to bring it to the NamingAuthority as a proposal (and provide some feedback about the ability and willingness to support emerging needs :-) )

joanma747 commented 3 years ago

> Looking at the ttl file, i see some only have JSON, some only have XML. It would be simpler if they all had both JSON and XML.

This is an "accident". All will be available in JSON and XML so it will be symmetrical.

> curl -i -H "Accept:application/json" http://defs-dev.opengis.net/def/tilematrixset/OGC/1.0/WebMercatorQuad

Works for me! Thanks!

jerstlouis commented 3 years ago

@joanma747 If it is decided that the definition server can only do 303 redirection, where should it redirect to?

An examples/ folder associated with a particular version of the standard does not feel right, since those would be registered tile matrix sets and not examples, and it prevents registering additional tile matrix sets in between versions. Also, we should probably expand the UTM family to all 60 TMS at some point.

jerstlouis commented 3 years ago

@rob-metalinkage

> Actually I'll need a little bit more input on how the various conformance classes map to the various examples please.. i see there are a few different options..

For the (not yet published) latest version:

There are two conceptual conformance classes:

http://www.opengis.net/spec/tms/2.0/req/tilematrixset2d -- which is the basic TileMatrixSet
http://www.opengis.net/spec/tms/2.0/req/variablematrixwidth -- which adds support for variable widths

In the examples, only two Tile Matrix Sets use the variable width capability:

Then there are corresponding per-encoding conformance classes:

There are currently 11 examples on the GitHub repo, but as I mentioned the UTM one should eventually be expanded into 60 separate TileMatrixSets, and the WorldCRS84Quad and WGS1984Quad should probably both share the WorldCRS84Quad URI. All should have both JSON and XML encodings.
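One way to reconcile these numbers with the figure of 69 registered TileMatrixSets mentioned earlier in the thread is the small calculation below. The breakdown is an assumption read off the comments, not an official count:

```python
# Counts taken from the comments in this thread; the breakdown is an assumption.
examples_on_github = 11  # example TileMatrixSets currently in the repo
utm_example = 1          # UTM31WGS84Quad stands in for the whole UTM family
utm_zones = 60           # one TileMatrixSet per UTM zone once expanded
duplicate_of_crs84 = 1   # WGS1984Quad would share the WorldCRS84Quad URI

registered = examples_on_github - utm_example + utm_zones - duplicate_of_crs84
print(registered)  # 69
```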

These are the schemas for the XML and JSON encodings (including support for variable widths)

Also if possible, we would prefer the JSON representation to be selected by default. XML is dead, long live JSON ;)

rob-metalinkage commented 3 years ago

I've also been doing some work to make sure that ConceptSchemes offer the full entailed SKOS versions as TTL and RDF now (they were minimal metadata for the ConceptScheme root object before - now it is the full contents).

list at https://www.opengis.net/def/status?_profile=alt and example for TTL at https://www.opengis.net/def/status?_profile=conceptscheme&_mediatype=text/turtle

(still yet to work out how to distinguish longer paths from terms without having to access the data at runtime to do the redirect)
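The `_profile` and `_mediatype` query parameters used in these links can be assembled programmatically. A minimal sketch with a hypothetical helper name (`with_view`); it only builds the URL and assumes the server accepts the parameters exactly as they appear in the links above:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_view(uri, profile=None, mediatype=None):
    """Append _profile / _mediatype query parameters to a definition URI."""
    parts = urlsplit(uri)
    query = parse_qsl(parts.query)  # keep any parameters already present
    if profile is not None:
        query.append(("_profile", profile))
    if mediatype is not None:
        query.append(("_mediatype", mediatype))
    # safe="/" keeps media types readable (text/turtle rather than text%2Fturtle).
    return urlunsplit(parts._replace(query=urlencode(query, safe="/")))


print(with_view("https://www.opengis.net/def/status", profile="alt"))
print(with_view("https://www.opengis.net/def/status", "conceptscheme", "text/turtle"))
```

This is the explicit alternative to the Accept: header: the requested view travels in the URL, so it can be pasted into a browser or a citation.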

ghobona commented 3 years ago

The PR https://github.com/opengeospatial/NamingAuthority/pull/127 adds a section describing the role of Content Negotiation.

joanma747 commented 3 years ago

This is our suggestion for offering the TMS "files" to the NA to harvest:

https://github.com/opengeospatial/2D-Tile-Matrix-Set/issues/37

jerstlouis commented 2 years ago

The Tiles SWG has set up a TileMatrixSet registry holding all definitions of registered TileMatrixSet at https://github.com/opengeospatial/2D-Tile-Matrix-Set/tree/master/registry/, including both JSON and XML encodings, with the intent that the OGC definition server could return the content of those definitions directly based on content negotiation.

Note that these follow the logical model and encodings defined in 2DTMS 2.0. The TileMatrixSet identifiers for these however should still maintain the http://www.opengis.net/def/tilematrixset/OGC/1.0/ (1.0) identifier part, as the conceptual definition for these tile matrix sets themselves did not change since 2DTMS 1.0 (and should never need to change). Unversioned identifiers could also be used (as discussed in #143), and would probably be preferable.

jerstlouis commented 1 year ago

Re-iterating the use case:

Re-direction for the TileMatrixSet registry is problematic for several reasons:

rob-metalinkage commented 1 year ago

the citation behaviour is something to consider. the UI does provide a too-subtle link to the canonical URI - we will be building a citation support widget to replace this.

the other reasons don't seem to add up however:

if there is nowhere to redirect to, there is no content to proxy either
curl is not a "client" and supports redirection with the right options

rob-metalinkage commented 1 year ago

Content disposition requirements addressed in #203. Leave this issue for any further discussion about redirection behaviours - but closing it for now as the content access needs to be addressed instead.

jerstlouis commented 1 year ago

@rob-metalinkage

> if there is nowhere to redirect to, there is no content to proxy either

Not necessarily, if the idea is that the OGC definition server hosts the content directly in its database, while groups such as the Tiles SWG can provide that content to load (through the GitHub repository TileMatrixSet registry). Is that what #203 will be about?

> curl is not a "client"

I beg to differ :) The c in cURL stands for Client URL. It has been called a client and is used in countless Testbed client demonstrations :)

> and supports redirection with the right options

Yes, but I am trying to highlight the fact that re-direction introduces extra pain (e.g., figuring out that -L is the command-line switch to follow redirections), when what you would really like is to get the actual definition in the first place (especially considering that the URL you are fetching from is the canonical URL for that definition).
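To illustrate the difference, here is a self-contained sketch that runs a throwaway local server (not the OGC definitions server) returning a 303 for a canonical path. The paths and the JSON body are placeholders. A single request sees only the 303, much like plain curl, while a client that follows redirects, much like curl -L, ends up at the hosted file:

```python
import http.client
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical local paths standing in for a canonical URI and its hosted file.
CANONICAL_PATH = "/def/tilematrixset/WebMercatorQuad"
HOSTED_PATH = "/registry/WebMercatorQuad.json"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == CANONICAL_PATH:
            self.send_response(303)
            self.send_header("Location", HOSTED_PATH)
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == HOSTED_PATH:
            body = json.dumps({"id": "WebMercatorQuad"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# One request, no redirect handling: the 303 is all the caller sees.
conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("GET", CANONICAL_PATH, headers={"Accept": "application/json"})
first = conn.getresponse()
first_status, first_location = first.status, first.getheader("Location")
first.read()
conn.close()

# urllib follows the redirect by default; proxies are disabled for the local demo.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
request = urllib.request.Request(
    f"http://127.0.0.1:{port}{CANONICAL_PATH}",
    headers={"Accept": "application/json"},
)
with opener.open(request) as response:
    final_status = response.status
    final_url = response.geturl()
    definition = json.load(response)

server.shutdown()
server.server_close()

print(first_status, first_location)
print(final_status, final_url, definition)
```

Note that the final URL reported by the redirect-following client is the hosted path rather than the canonical one, which is the source of confusion described above.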

> the UI does provide a too-subtle link to the canonical URI - we will be building a citation support widget to replace this.

It is not only about the subtlety / visibility of the canonical URI, but about a very strong expectation from users of the definition server that the only valid reason for the URI pasted in the browser address bar to change is to correct it to the canonical one.