Attribution as links instead of markup

IvanSanchez commented 2 years ago

I'm writing this during the 2022-03-08/10 joint OGC+OSGeo sprint.

After doing a basic JS client implementation, I've become slightly concerned about the attribution field on both the list of collections, and the metadata of a collection:

It is not clear whether attribution belongs to Part 1 (Core) or Part 2 (Geospatial Data). As I understand it, there can be an attribution field in either or both of the list of collections (Part 1) and the metadata of a specific collection (Part 2).

Therefore, it's not clear what the behaviour of a client should be - shall a client display attribution about the service, or only about the collections being displayed/used?

Allowing attribution only in part 2, or clarifying how the attribution in part 1 would be displayed would solve this concern.
Allowing servers to use any kind of HTML markup in the attribution field might be problematic. I see three specific problems: <script> elements, tracking pixels, and parsing.

In a best-case scenario, an OGC API server might push some invisible tracking pixels to clients (an image element pointing to an invisible 1x1 pixel). Even if well-intentioned, this opens a rabbit hole of cookies and EU GDPR concerns.

Another concern is clients with no means of parsing HTML and building a document model from it. Using a specific markup format will force clients to include a parser for that markup language. A similar concern would arise if another markup language were used, e.g. Markdown.

The worst-case scenario would be an attacker gaining control of an OGC API server and changing the attribution field to include a <script> reference. That would be an attack vector to inject malware into any client. My concern is that there's no well-intentioned scenario where allowing <script> as part of attribution might be useful.

In order to solve the second concern, I would like to suggest moving attribution to the links, e.g. instead of:

{
  "id": "1234567890",
  "title": "Example Collection Description Response",
  "description": "This is an example of a Collection Description in JSON format",
  "attribution": "<a href='https://www.ign.es' rel=' '>IGN</a> <a href='https://www.govdata.de/dl-de/by-2-0'>(c)</a>",

it might be better to add new attribution and license relationship roles:

{
  "id": "1234567890",
  "title": "Example Collection Description Response",
  "description": "This is an example of a Collection Description in JSON format",
  "links": [
    { "href": "https://www.ign.es", "rel": "attribution", "type": "text/html", "title": "IGN" },
    { "href": "https://creativecommons.org/licenses/by/4.0/", "rel": "license", "type": "text/html", "title": "CC-by 4.0" },
/* ...etc... */
  ]
}

Multiple attributions are possible by having more than one link with the attribution relationship role. Clients would then be able to format those as needed (concatenate with commas, display a list, etc.).
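As a rough illustration of the client-side formatting described above, here is a minimal Python sketch. It assumes the proposed (not registered) "attribution" relation from this issue; the function name format_attribution and the second provider link on example.org are hypothetical, added only to show the multiple-attribution case.

```python
def format_attribution(collection, rel="attribution"):
    """Join the titles of all links carrying the given relation with commas.

    Falls back to the href when a link has no title.
    """
    links = collection.get("links", [])
    titles = [
        link.get("title") or link["href"]
        for link in links
        if link.get("rel") == rel
    ]
    return ", ".join(titles)


collection = {
    "id": "1234567890",
    "links": [
        {"href": "https://www.ign.es", "rel": "attribution",
         "type": "text/html", "title": "IGN"},
        # Hypothetical second provider, not part of the original example.
        {"href": "https://example.org/provider", "rel": "attribution",
         "type": "text/html", "title": "Example Provider"},
        {"href": "https://creativecommons.org/licenses/by/4.0/", "rel": "license",
         "type": "text/html", "title": "CC-by 4.0"},
    ],
}

print(format_attribution(collection))             # IGN, Example Provider
print(format_attribution(collection, "license"))  # CC-by 4.0
```

A client that prefers a list over a comma-separated string could return the titles unjoined instead.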

Attribution without actual links would be possible by having an empty href.

This might be against the spirit of https://github.com/opengeospatial/ogcapi-common/issues/296 , but would provide a solution for https://github.com/opengeospatial/ogcapi-common/issues/301 .

cportele commented 2 years ago

Regarding the second topic: Your concerns are valid, but I do not think that moving the attribution to the links is a solution.

A minor obstacle is that there is no registered link relation type "attribution". There are "copyright" and "license", but attribution is something else. Of course, OGC could use "http://www.opengis.net/def/rel/ogc/1.0/attribution" for the link relation.
Some data publishers have really detailed and lengthy requirements for how to write the attribution. This can go beyond just a list of links.
An empty or missing "href" is not an option. "href" is mandatory and must be provided - it is a link after all. Leaving it empty (href="") does not work either, as this is a reference to the current document (see RFC 3986).
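The point about empty references can be checked with Python's standard URL resolution. This is only a small sketch; the base URL on example.org is a made-up stand-in for the collection document's own address.

```python
from urllib.parse import urljoin

# Hypothetical URL of the collection document that contains the link.
base = "https://example.org/collections/1234567890"

# An empty reference resolves to the base document itself,
# so href="" would point back at the collection, not at "nothing".
resolved = urljoin(base, "")
print(resolved == base)  # True
```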

I think that we probably should deprecate the use of any markup in the attribution field. The attribution requirements usually also have rules for printed material, often writing the full URI as text, which can lead to long and hard to read attributions, but I guess that is still the better option. So, I would leave the attribution as a string member that should be displayed as is.
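One way a client could "display as is" when its output is an HTML page is to escape the string before inserting it, so that any markup sent by the server shows up as literal text. A minimal sketch using Python's standard library; the function name attribution_as_text is hypothetical.

```python
from html import escape


def attribution_as_text(attribution):
    """Escape the attribution so it is rendered as literal text, never as markup."""
    return escape(attribution)


raw = "<a href='https://www.ign.es'>IGN</a> <script>alert(1)</script>"
safe = attribution_as_text(raw)
print(safe)
```

Clients that render to something other than HTML (a canvas, a native label) would not need the escaping step at all, which is part of the appeal of a plain string.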

IvanSanchez commented 2 years ago

Some data publishers have really detailed and lengthy requirements how to write the attribution.

Yes, but there needs to be some kind of compromise between the data publishers and the developers of client applications (in the same way that there are compromises between the developers of different implementations). A specification like this already constrains the ways that data is published, and it shouldn't be a big deal to constrain (within reasonable limits) the way that the attribution is described.

A use case I'm also interested in is logos/logotypes, e.g. when visiting https://explore.osmaps.com, a small logo image is always shown within the map viewport.

I would go as far as suggesting a logo relationship role, something like:

{
  "id": "Mastermap",
  "attribution": "Ordnance Survey",
  "links": [
    { "title": "Ordnance Survey", "href": "https://www.ordnancesurvey.co.uk/", "rel": "attribution", "type": "text/html"},
    { "title": "Crown Copyright", "href": "https://www.ordnancesurvey.co.uk/business-government/licensing-agreements/copyright-acknowledgements", "rel": "license", "type": "text/html" },
    { "href": "https://explore.osmaps.com/images/logo/osmaps-logo-dark.svg", "rel": "logo", "type": "image/svg+xml" },
/* ...etc... */
  ]
}

Maybe even a logo for dark backgrounds and another for light backgrounds (but suggest a maximum size, and let the graphic designers worry about margins and blurred backgrounds for contrast). But anything more complex than this, including any kind of markup, is too cumbersome for developing clients, IMO.

An empty or missing "href" is not an option.

OK, I'll agree that requiring URIs for attribution is reasonable.

I still think that moving the attribution to the links makes sense. I agree that the attribution field should be a plain-text string instead of markup (HTML, Markdown, LaTeX, or anything else), and the spec should encourage client implementations to display attribution links if possible, and the plain-text attribution by default.
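The "links if possible, plain text by default" behaviour could look roughly like the following Python sketch. It assumes the proposed "attribution" relation from this thread; the function name attribution_entries and the (label, href) tuple shape are hypothetical choices for illustration.

```python
def attribution_entries(collection):
    """Return (label, href) pairs for display.

    Prefers links with rel "attribution"; otherwise falls back to the
    plain-text attribution string, with href set to None.
    """
    links = [
        link for link in collection.get("links", [])
        if link.get("rel") == "attribution" and link.get("href")
    ]
    if links:
        return [(link.get("title") or link["href"], link["href"]) for link in links]
    text = collection.get("attribution")
    return [(text, None)] if text else []


with_links = {
    "id": "Mastermap",
    "attribution": "Ordnance Survey",
    "links": [
        {"title": "Ordnance Survey", "href": "https://www.ordnancesurvey.co.uk/",
         "rel": "attribution", "type": "text/html"},
    ],
}
text_only = {"id": "Mastermap", "attribution": "Ordnance Survey"}

print(attribution_entries(with_links))  # [('Ordnance Survey', 'https://www.ordnancesurvey.co.uk/')]
print(attribution_entries(text_only))   # [('Ordnance Survey', None)]
```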

m-mohr commented 6 months ago

The STAC community wants to align with OGC API - Common, but we also fear the use of HTML for the reasons above. In STAC we have had good experiences with CommonMark. Could it be an option to switch from HTML to CommonMark (with HTML rendering disabled)? It also allows creating links, for example, but maybe with a recommendation to use only bold, italic and links or so?

PS: I also found the description in OGC API - Common rather ambiguous as also indicated in #290.

jerstlouis commented 6 months ago

@m-mohr I'm not sure I fully understand the concerns.

Recommending that markup in attribution be limited to bold, italic and links makes sense. But does it really make a difference whether HTML or CommonMark (this is the first time I hear of it) is used?

The expectation is definitely NOT that a client would blindly run the attribution HTML code as-is, including <script> and so on, and we could definitely have a very clear warning against doing that in the standard. So in my opinion it does not really matter whether CommonMark or HTML is used, and I would certainly prefer HTML myself, because we do not know anything at all about CommonMark, but we already have an HTML parser/renderer in our code base.

@cportele any opinion on the above?

Regarding #290 this seems to concern primarily Part 1 which is already published. Any ambiguity related to Part 2 specifically? I assume that after Part 2 is released, we will consider improving Part 1 and consider the feedback in #290 for a corrigendum or minor revision with improvements (there are several open issues tagged as Part 1).

m-mohr commented 6 months ago

CommonMark is a well-standardized subset (I think) of "Markdown", which itself is not really standardized at all, with all the interoperability/compatibility issues that come with that.

but we already have an HTML parser/renderer in our code base

Yeah, we have CommonMark instead ;-)

The concern is that sanitizing HTML seems like a much more complex, security-related and often forgotten task compared to rendering CommonMark, which is still pretty human-readable even if it's not parsed.
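To give a feel for what sanitizing involves, here is a rough Python sketch of an allow-list filter that keeps only bold, italic and http(s) links, as discussed above. It is an illustration under assumptions, not a vetted sanitizer: the names AttributionSanitizer and sanitize_attribution are hypothetical, it does not balance stray closing tags, and a production client should rely on a maintained sanitizing library instead.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

ALLOWED_TAGS = {"a", "b", "strong", "i", "em"}
DROPPED_WITH_CONTENT = {"script", "style"}  # remove the tag and its body


class AttributionSanitizer(HTMLParser):
    """Allow-list filter: keeps bold, italic and http(s) links only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags (img, iframe, ...) are dropped, their text kept
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if urlsplit(href).scheme in ("http", "https"):
                self.out.append('<a href="%s" rel="noopener noreferrer">' % escape(href))
            else:
                self.out.append("<a>")  # e.g. javascript: URLs lose their href
        else:
            self.out.append("<%s>" % tag)  # all other attributes are discarded

    def handle_endtag(self, tag):
        if tag in DROPPED_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))


def sanitize_attribution(markup):
    parser = AttributionSanitizer()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


dirty = ("<a href='https://www.ign.es'>IGN</a> <script>alert(1)</script>"
         "<img src='https://tracker.example/pixel.gif'>"
         "<a href='javascript:alert(2)'>click</a>")
clean = sanitize_attribution(dirty)
print(clean)
```

Even this small filter has to make decisions about URL schemes, attributes and nested content, which is the kind of detail that is easy to forget in a client.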

opengeospatial / ogcapi-common

Attribution as links instead of markup #303