opengeospatial / ogcapi-common

OGC API - Common provides those elements shared by most or all of the OGC API standards to ensure consistency across the family.
https://ogcapi.ogc.org/common

Slashes (and other characters) in names/ids #34

Closed hylkevds closed 3 years ago

hylkevds commented 5 years ago

Ontology people like to use URLs as identifiers, but I can't find anything specifying how to encode identifiers in URLs, or what the allowed characters are. I'm guessing the GET /collections/{collectionId}/items/{featureId} pattern is going to be confused if I make a collection named collections/collections/items/items and an item named items/items. This would make: GET /collections/collections/collections/items/items/items/items/items This is a silly and extreme example, but it illustrates the problem well.
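To make the ambiguity concrete, here is a minimal Python sketch, assuming plain string substitution into the template versus percent-encoding each id as a single path segment with the standard library's `urllib.parse.quote`. The helper names `naive_path` and `encoded_path` are hypothetical, made up for this illustration.

```python
from urllib.parse import quote

# Template from the issue: GET /collections/{collectionId}/items/{featureId}
def naive_path(collection_id, feature_id):
    # Plain substitution: every "/" inside an id becomes a path separator.
    return f"/collections/{collection_id}/items/{feature_id}"

def encoded_path(collection_id, feature_id):
    # safe="" makes quote() encode "/" as %2F (by default it leaves "/" alone),
    # so each id stays one path segment.
    return (f"/collections/{quote(collection_id, safe='')}"
            f"/items/{quote(feature_id, safe='')}")

cid = "collections/collections/items/items"
fid = "items/items"
print(naive_path(cid, fid))
# /collections/collections/collections/items/items/items/items/items
print(encoded_path(cid, fid))
# /collections/collections%2Fcollections%2Fitems%2Fitems/items/items%2Fitems
```

The naive path has eight segments and cannot be split back into the two ids; the encoded one keeps the four segments the template expects.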

cmheazel commented 5 years ago

@hylkevds Take a look at RFC 6570. URI Templates are widely used, and they are the path syntax we use in the spec. If your identifier is not legal under RFC 6570, then we cannot use it. If your identifier is legal but looks like a hack, then we probably shouldn't use it. Finally, keep in mind that the URL will be stored in a buffer, and buffers are not of infinite length. If our URLs grow too long, we will start having problems with truncation and buffer overflow. These problems are easily avoided.

hylkevds commented 5 years ago

RFC 6570 will happily let you expand templates into URLs that are ambiguous, or will be rejected by many web servers. So pointing to RFC 6570 is not enough to ensure valid URLs. Limits on what "we" consider valid identifiers should be made explicit.
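Both points can be seen in RFC 6570 itself: simple string expansion (`{var}`) percent-encodes everything except unreserved characters, so "/" becomes %2F, while reserved expansion (`{+var}`) passes reserved characters such as "/" through unchanged. The sketch below is a deliberately tiny, hypothetical expander that supports only those two expression types (not a full RFC 6570 implementation), built on Python's `urllib.parse.quote`.

```python
import re
from urllib.parse import quote

# RFC 3986 reserved characters; {+var} lets these through unencoded.
RESERVED = ":/?#[]@!$&'()*+,;="
PCT_TRIPLET = re.compile(r"(%[0-9A-Fa-f]{2})")

def _reserved_expand(value):
    # Keep existing %XX triplets; encode anything that is neither unreserved
    # nor reserved (e.g. space -> %20, a lone "%" -> %25).
    parts = PCT_TRIPLET.split(value)
    return "".join(p if PCT_TRIPLET.fullmatch(p) else quote(p, safe=RESERVED)
                   for p in parts)

def expand(template, variables):
    # Hypothetical helper: handles only "{name}" and "{+name}" expressions.
    def repl(match):
        operator, name = match.group(1), match.group(2)
        value = variables[name]
        if operator == "+":
            return _reserved_expand(value)
        # Simple expansion: only unreserved characters survive, so "/" -> %2F.
        return quote(value, safe="")
    return re.sub(r"\{(\+?)(\w+)\}", repl, template)

template = "/collections/{collectionId}/items/{featureId}"
ids = {"collectionId": "collections/collections/items/items",
       "featureId": "items/items"}
print(expand(template, ids))
# /collections/collections%2Fcollections%2Fitems%2Fitems/items/items%2Fitems
print(expand(template.replace("{", "{+"), ids))
# /collections/collections/collections/items/items/items/items/items
```

With `{var}` the ids stay unambiguous; switching the same template to `{+var}` reproduces exactly the ambiguous URL from the opening comment, which is why conformance to RFC 6570 alone does not settle the question.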

m-mohr commented 4 years ago

What was the resolution on this? It is marked as OBE, but I can't find anything documented here.

ghobona commented 4 years ago

2020-08-24 SWG Telecon.

The issue was discussed previously in the OGC-NA.

@ghobona To capture the OGC-NA decision and close this issue.

ghobona commented 4 years ago

OGC-NA addressed the issue in https://github.com/opengeospatial/NamingAuthority/issues/55#issuecomment-643399279

Any restrictions on identifiers or paths to resources should be designed and applied by the individual OGC API standards.

m-mohr commented 4 years ago

@ghobona Well, then it seems that closing this is not correct, because the OGC-NA didn't decide anything on this issue, and it still needs to be solved for Common Part 2 (Collections). I guess the answer is that "/" should not be used, but we have existing dataset IDs with slashes. How do we expose them via OGC APIs? I don't think changing IDs is an option.

ghobona commented 4 years ago

@m-mohr I see what you mean. Although the issue is out of scope for the OGC-NA, a technical solution still has not been identified.

I agree, the issue should remain open.

cportele commented 4 years ago

Can someone clarify what the open issue is? Slashes can be used; they just have to be encoded as %2F. That is, if one really wants to use slashes in an id too, and not just in the name/title.
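The %2F approach only works if the server splits the path on literal "/" before it percent-decodes each segment. A minimal Python sketch of the two possible orders, using `urllib.parse.quote`/`unquote`; the id `dataset/v1` and both function names are hypothetical.

```python
from typing import List
from urllib.parse import quote, unquote

def split_then_decode(path: str) -> List[str]:
    # Split on literal "/" first, then decode each segment:
    # an encoded %2F stays inside the segment it belongs to.
    return [unquote(segment) for segment in path.strip("/").split("/")]

def decode_then_split(path: str) -> List[str]:
    # Decoding first turns %2F back into "/" before splitting,
    # so an id that contains a slash is torn into two segments.
    return unquote(path).strip("/").split("/")

dataset_id = "dataset/v1"  # hypothetical id containing a slash
path = "/collections/" + quote(dataset_id, safe="") + "/items"
print(path)                     # /collections/dataset%2Fv1/items
print(split_then_decode(path))  # ['collections', 'dataset/v1', 'items']
print(decode_then_split(path))  # ['collections', 'dataset', 'v1', 'items']
```

Servers or frameworks that decode the whole path before routing behave like `decode_then_split`, which is one reason encoded slashes can still cause trouble in practice.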

ghobona commented 4 years ago

@m-mohr I think the approach suggested by @cportele in https://github.com/opengeospatial/oapi_common/issues/34#issuecomment-679969763 solves the issue. Could you please confirm?

m-mohr commented 4 years ago

@ghobona Indeed, I somehow thought using %2F would be discouraged by the HTTP standard, as it is with dots for example, but it's not. On the other hand, there seems to be no mention of ID encoding in the standard, which I guess would clarify this. Another issue might be that OpenAPI doesn't allow slashes in path parameters (see https://github.com/OAI/OpenAPI-Specification/issues/892), and I'm not sure whether percent-encoding solves that. So basically that's what is confusing me, and that's why I asked for clarification.

joanma747 commented 4 years ago

Having a "/" in an identifier of something that can end up in a path parameter seems like a terrible idea

BUT

Amazingly, it seems that %2F works in browsers.

For example:
- http://www.creaf.cat/research goes to a web page of my institution.
- http://www.creaf.cat/research/mediterranean-basin goes to a web page of my institution.
- But http://www.creaf.cat%2Fresearch is treated as a Google search term in Chrome, Edge and Firefox.
- And http://www.creaf.cat/research%2Fmediterranean-basin returns a 404.

So it seems that @cportele is right: the escape works in practice, and %2F is not interpreted as a /.

In my own test CGI application on IIS, though, what happens starts to worry me a bit: http://joanma.uab.es/cgi-bin/cgi_temp.cgi?kk%2Fkk

As you can see, the application receives the query part as its first argument transformed to "kk/kk", while the environment variable keeps it untransformed: QUERY_STRING=kk%2Fkk
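This behaviour looks consistent with the CGI 1.1 "search-string" rule (RFC 3875, section 4.4): when a GET query contains no unencoded "=", the server should split it on "+" into words, URL-decode each word and pass the words as command-line arguments, while QUERY_STRING itself is left encoded. A minimal Python sketch of that rule follows; `cgi_search_args` is a hypothetical name, and whether IIS follows exactly this logic is an assumption.

```python
from urllib.parse import unquote

def cgi_search_args(query_string):
    # Sketch of the RFC 3875 "search-string" rule: only queries without an
    # unencoded "=" are turned into command-line arguments.
    if "=" in query_string:
        return []
    # Split on "+" into words, then URL-decode each word separately.
    return [unquote(word) for word in query_string.split("+")]

query = "kk%2Fkk"
print("QUERY_STRING=" + query)  # QUERY_STRING=kk%2Fkk (left encoded)
print(cgi_search_args(query))   # ['kk/kk'] (decoded for argv)
```

So the same request legitimately appears decoded in one place and encoded in the other; a script that reads the arguments can no longer tell an encoded "/" from a literal one.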


Then we have another question: what does a slash in an id mean? Imagine a collection called "day/night" and a collection called "transport/roads". In the first case the slash is a character in an id; in the second it seems to be a hierarchical collection name. Do we have to escape the first but not the second?
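One way to read the "day/night" versus "transport/roads" distinction is through routing. In the hypothetical Python sketch below, a route for `/collections/{collectionId}/items` is matched against the raw, still-encoded path: an escaped slash stays inside a single id, while an unescaped slash adds a path level that this single-segment route does not accept. The route regex and `collection_id` helper are illustrative assumptions, not part of any OGC API standard.

```python
import re
from urllib.parse import quote, unquote

# Hypothetical route for /collections/{collectionId}/items, matched against
# the raw (still percent-encoded) path so only a literal "/" separates segments.
ITEMS_ROUTE = re.compile(r"^/collections/([^/]+)/items$")

def collection_id(raw_path):
    match = ITEMS_ROUTE.match(raw_path)
    # Decode only after the segment has been isolated.
    return unquote(match.group(1)) if match else None

# "day/night": the slash is part of the id, so it is escaped as %2F.
print(collection_id("/collections/" + quote("day/night", safe="") + "/items"))
# "transport/roads": an unescaped slash adds a path level, so the
# single-segment route no longer matches (prints None).
print(collection_id("/collections/transport/roads/items"))
```

Under this reading, escaping is what marks a slash as data; a hierarchical name would need its own route pattern.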


cportele commented 4 years ago

@m-mohr Strictly, ID or URI encoding in general does not have to be discussed in the standard as the rules from the normatively referenced RFCs apply, but an informative mention might help (which is why we added a sentence in Features).

The OAS issue that you cite is about something else (supporting path parameters that are not just a single path segment). Nothing in OAS prohibits the use of slashes in parameter values AFAIK.

ghobona commented 3 years ago

At today's OGC-NA meeting (2020-12-02), we did not reach agreement on whether this issue is out of scope for the OGC-NA. As a result, the next step will be to establish the rules for slashes and other characters in names/ids.

bradh commented 3 years ago

@ghobona Not sure I follow. Did not reach agreement? Or decided it was out-of-scope?

Who owns the next step? Is it on OGC-NA or on the SWG?

ghobona commented 3 years ago

The proposal was to rule that the issue is out of scope. We could not reach agreement on the proposal. Therefore, the next step is for the OGC-NA to propose the rules for slashes and other characters in names/ids. So the OGC-NA owns the next step.

jerstlouis commented 3 years ago

It seems to me that since IDs cannot contain slashes, URL-encoded slashes as part of the path component would be very confusing (path separators vs. path character inside an ID) and a recommendation should discourage them.

I would be very interested in clarifying the rules around the use of ":", and exactly where/when it needs to be URL-encoded, as that is what we have currently proposed for hierarchical collections (https://github.com/opengeospatial/ogcapi-common/issues/11#issuecomment-677947387).
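For the ":" question, RFC 3986 is fairly specific: a path segment is made of `pchar` (unreserved, pct-encoded, sub-delims, ":" and "@"), so ":" may appear unencoded in a path. The one exception is the first segment of a relative-path reference, where "this:that" would be read as a scheme and must be written as "./this:that" (or with ":" encoded as %3A). A minimal Python sketch, assuming `urllib.parse`; `encode_segment` and the id "transport:roads" are hypothetical.

```python
from urllib.parse import quote, urljoin, urlsplit

# RFC 3986 pchar = unreserved / pct-encoded / sub-delims / ":" / "@".
# quote() never encodes letters, digits and "_.-~", so marking the
# sub-delims plus ":" and "@" as safe yields a valid path segment.
PCHAR_SAFE = "!$&'()*+,;=:@"

def encode_segment(identifier):
    return quote(identifier, safe=PCHAR_SAFE)

print(encode_segment("transport:roads"))  # transport:roads (":" kept)
print(encode_segment("day/night"))        # day%2Fnight ("/" encoded)

# The exception: as the first segment of a relative reference,
# "transport:roads" is parsed as scheme "transport".
base = "https://example.org/collections/"
print(urlsplit("transport:roads").scheme)   # transport
print(urljoin(base, "transport:roads"))     # transport:roads
print(urljoin(base, "./transport:roads"))   # https://example.org/collections/transport:roads
```

In absolute URLs and absolute-path references, a ":" inside a collection id therefore needs no encoding; only relative links whose first segment contains ":" need the "./" prefix or %3A.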

cmheazel commented 3 years ago

January 25 SWG: This is a useful discussion, but for the most part it is not normative. Move this discussion to the Users Guide and reference that section from the Standard. Change the label to "guide" after the link is added to Common. Add this note to the link: "Note: the id can be anything but the resulting URI must conform to the RFC requirements for encoding."

cmheazel commented 3 years ago

Added note to section 6.2 "OGC Web API standards may include a community-defined identifier as part of a URI (ex. image id or feature id). Definition of the format of those identifiers is out of scope for these standards. Implementors should take care that these identifiers are properly encoded (see RFC 3986) in the URIs for all hosted resources."

The link to the Users Guide was already in place.

cmheazel commented 3 years ago

Feb 1 - closed - NOTUC