12-DEC-2022: Discussed in the SWG. @pvretano will implement the following:
This was a very quick turnaround, thanks.
I'm confused on point 1: Why replace language with languages? I think both should exist:
language
+ all alternate links for the set of available hreflang's)

I agree with all other points and will align to use alternate instead of self.
Example from STAC:
{
"stac_version": "1.0.0",
"stac_extensions": [
"https://stac-extensions.github.io/language/v1.0.0/schema.json"
],
"type": "Feature",
"id": "item",
"bbox": [...],
"geometry": {
"type": "Polygon",
"coordinates": [...]
},
"properties": {
"datetime": "2020-12-11T22:38:32Z",
"example": "An example product",
"languages": [
"de",
"en"
],
"language": "en"
},
"links": [
{
"href": "https://raw.githubusercontent.com/stac-extensions/language/main/examples/item.json",
"rel": "self",
"hreflang": "en"
},
{
"href": "https://raw.githubusercontent.com/stac-extensions/language/main/examples/de/item.json",
"rel": "alternate",
"hreflang": "de"
},
{
"href": "catalog.json",
"rel": "parent",
"title": "Example STAC Catalog",
"hreflang": "en"
},
{
"href": "catalog.json",
"rel": "root",
"title": "Example STAC Catalog",
"hreflang": "en"
}
],
"assets": {
"data": {
"href": "https://cloud.example.com/examples/file.tif"
},
"metadata": {
"href": "https://cloud.example.com/examples/metadata.xml",
"type": "application/xml",
"hreflang": "en"
},
"metadata_de": {
"href": "https://cloud.example.com/examples/metatdata_DE.xml",
"type": "application/xml",
"hreflang": "de"
}
}
}
FYI: In a CDB 2.0 datastore, there is a mandatory element 'language' (aka dct:language, PT_Locale) whose content is based on BCP 47 (RFC 5646). From the language perspective, OGC API - Records, the STAC API and CDB 2.0 are consistent.
FYI: In the Testbed-18 ER Secure and Async Catalog (OGC 22-018) section 2.2.2, there is also the following note:
NOTE INSPIRE requires the Discovery Service to advertise the default language in the CSW GetCapabilities response. Proposing a similar mechanism to advertise the default language is further work. Possible approaches include:
@pvretano Can you confirm that https://github.com/opengeospatial/ogcapi-records/issues/195#issuecomment-1346718568 makes sense to you, too? I'd like to release this behavior into STAC soon and it would be really great to have this aligned between Records and STAC!
Here's the corresponding STAC extension: https://github.com/stac-extensions/language#fields-for-catalogs-collections-and-item-properties
@m-mohr looking at it today. Will update the comment once I have reviewed it.
Thanks @pvretano. While you are at it, do you think it makes sense to allow more than just the language codes in languages?
So for example instead of just "languages": ["de", "en-US", "gr"]
we could also think about providing a bit more information, which could be helpful for clients. For example:
"languages": [
{ "code": "de", "name": "German", "native": "Deutsch", "dir": "ltr" },
{ "code": "en-US", "name": "English (US)", "native": "English (US)", "dir": "ltr" },
{ "code": "gr", "name": "Greek", "native": "Ελληνικά", "dir": "ltr" }
]
Only code would be required.
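To make the "only code is required" rule concrete, here is a minimal Python sketch of a validator for a single entry of the proposed languages array. The field names come from the example above; the function name and the rule that dir is "ltr" or "rtl" (borrowed from HTML text direction) are assumptions for illustration, not part of STAC or Records.

```python
from typing import Any

# Hypothetical validator for one entry of the proposed "languages" array.
# Only "code" is required; "name", "native" and "dir" are optional.
OPTIONAL_TEXT_FIELDS = ("name", "native")
ALLOWED_DIRS = ("ltr", "rtl")  # assumption: text direction values as in HTML


def validate_language_object(entry: Any) -> list:
    """Return a list of problems found in the entry (an empty list means valid)."""
    if not isinstance(entry, dict):
        return ["entry must be an object"]
    problems = []
    code = entry.get("code")
    if not isinstance(code, str) or not code:
        problems.append("'code' is required and must be a non-empty string")
    for field in OPTIONAL_TEXT_FIELDS:
        if field in entry and not isinstance(entry[field], str):
            problems.append(f"'{field}' must be a string")
    if "dir" in entry and entry["dir"] not in ALLOWED_DIRS:
        problems.append("'dir' must be 'ltr' or 'rtl'")
    return problems


# Usage: a full entry, the minimal form, and an invalid entry without "code".
for item in [
    {"code": "de", "name": "German", "native": "Deutsch", "dir": "ltr"},
    {"code": "en-US"},
    {"name": "English"},
]:
    print(item, validate_language_object(item))
```

Keeping everything except code optional means a plain list of codes can be upgraded to objects without breaking clients that only read code.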
@m-mohr my original comment was perhaps not as clear as it should have been because it did not distinguish clearly the language of the resource versus the language of the record.
The previous "language" tag was meant to encode the language of the resource that the record describes (if there was an associated language). So, changing it to an array allows a set of languages to be associated with the resource (e.g. the resource described by the record is available in English, German, Greek, etc.).
The language of the record itself (i.e. the language in which the record is presented to the client) is requested using the "Accept-Language" header when the record is retrieved. That language, however, is currently not explicitly encoded in the record with a specific tag. Rather, a "rel=self" link with an "hreflang" attribute can be included to indicate the language of the retrieved record. Additional links with "rel=alternate" and "hreflang" attributes can point to additional language representations of the record.
Does this all make sense?
I am mocking up an example record with language information which I will add to the issue later today.
I like the encoding of "languages" that you present above, so I will use that.
If you think there would be value in explicitly encoding the language of the record in the record itself, then I would not be opposed to reintroducing the "language" tag for that purpose ...
Thank you, @pvretano. This clarifies what the current difference between STAC and Records is.
First and foremost, it is 100% clear and aligned between STAC and Records that in an API context content negotiation is used to request specific languages and report the language of a response. We are also aligned with regard to the hreflang property. Unfortunately, there are also static catalogs, both in STAC and Records. Here, content negotiation is often not available, so we need an alternative. Also, it is often useful to replicate important headers (e.g. the content language) in the body, because if you store a response to a (local) file, you lose the (language) headers, but it could still be useful to have this information. Thus, my aim was to find a solution that works without headers for static catalogs and can also be useful in the context of APIs.
For the language you may want to encode multiple things:
To encode the language of a resource we use the hreflang property in links and assets. Now the difference comes up: STAC additionally has a language property for the current language of the metadata and a languages property for all available languages of the metadata. This is probably due to the fact that in STAC the metadata language is often not the language of the resource (imagery usually doesn't have a language).

In theory, you are right: we don't need these properties at all, because it could all be handled through hreflang in links. The self link + hreflang could describe the language of the metadata, alternate links + hreflang could describe other available languages, and the link to the data file (resource) + hreflang could describe the language(s) of the resource.
This is pretty cumbersome though as you'd need to wade through links to figure this out. Also, in STAC self links are not required as catalogs can be portable and the location may not be known upfront. Also, I'm not overly happy with overloading "alternate" for alternative languages, alternative media types, alternative ... (but that's a different discussion). In the end, the language and languages properties are often just a "summary" and for convenience.
Still, I think it would be good to declare this directly without having to look through links with hreflangs.
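For comparison, this is roughly what a client has to do when the language information is only available through links. The following Python sketch assumes the link structure of the STAC example above (rel, hreflang); the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def languages_from_links(links: List[dict]) -> Tuple[Optional[str], List[str]]:
    """Derive (metadata language, available metadata languages) from links.

    The metadata language is the hreflang of the "self" link; the available
    languages are the hreflangs of the "self" and "alternate" links. Other
    relation types (parent, root, ...) are ignored.
    """
    current = None
    available = set()
    for link in links:
        lang = link.get("hreflang")
        if lang is None:
            continue
        if link.get("rel") == "self":
            current = lang
            available.add(lang)
        elif link.get("rel") == "alternate":
            available.add(lang)
    return current, sorted(available)


# The links of the STAC example item above (hrefs simplified to relative paths).
links = [
    {"href": "item.json", "rel": "self", "hreflang": "en"},
    {"href": "de/item.json", "rel": "alternate", "hreflang": "de"},
    {"href": "catalog.json", "rel": "parent", "hreflang": "en"},
    {"href": "catalog.json", "rel": "root", "hreflang": "en"},
]
print(languages_from_links(links))  # ('en', ['de', 'en'])
```

The result matches "language": "en" and "languages": ["de", "en"] in the example. For a portable catalog without a self link, the current language comes back as None, which is the gap that an explicit language property closes.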
Ultimately, we could also allow for a very verbose solution:
- language (shall be equal to the hreflang in the self link; just the language code)
- languages (shall correspond to the hreflangs of the alternate + self links, but may contain additional properties)
- resourceLanguages (shall correspond to the hreflangs of the resource links, but may contain additional properties)

While "language" and "languages" could be aligned between Records and STAC, I'm not so sure about "resourceLanguages". STAC doesn't need that in many cases, and I wasn't able to come up with a good name that describes both cases (assetLanguages vs. resourceLanguages), so we may just have different properties here that don't conflict but share the same structure (as described above). An alternative could be recordLanguage, recordLanguages and languages, but then we'd be less aligned between STAC and Records because "record" doesn't fit into the STAC terminology. So I'd prefer the first variant, but I'm happy to discuss other ideas and alternatives.
What do you think? Would you be open to that?
@m-mohr just to make sure I understand ...
- language is the language of the record in hand and is equal to the hreflang value of the self link, if it exists and has an hreflang specified
- languages is the list of other languages that the record can be requested in; if there are alternate links in the record with hreflang attributes, the hreflang values must exist in this languages list
- resourceLanguages is the list of languages in which the resource being described by the record is available
- the languages and resourceLanguages properties shall be as you presented in this comment

Is this correct? If yes, then I think I am OK with that. If you verify that my understanding is correct, then I will present to the SWG and report back in this issue. (NOTE: the next SWG meeting is on 23-JAN-2023 ... I hope that is not too late for you.)
Thank you for taking the time, @pvretano. Yes, this is generally correct.
I have one concern, though, about the requirement in the second bullet. You are saying:
"if there are alternate links in the record with hreflang attributes, the hreflang values must exist in this languages list"
I see potential issues here, which I mentioned above, due to the overloading of the alternate relation type (alternate type vs. alternate language). Here's an example of some links that would not be unusual to see in STAC, and I could imagine that it also occurs in Records (although I think you require the type, right?):
Let's say the links are in a metadata document in Greek (i.e. it contains "language": "gr"):
{
"href": "../de/item.json",
"rel": "alternate",
"hreflang": "de"
},
{
"href": "../item.json",
"rel": "alternate",
"hreflang": "en"
},
{
"href": "https://stacindex.org/browser/example/de/item.json?uiLanguage=de",
"rel": "alternate",
"type": "text/html",
"hreflang": "de"
},
{
"href": "https://stacindex.org/browser/example/item.json?uiLanguage=en",
"rel": "alternate",
"type": "text/html",
"hreflang": "en"
},
{
"href": "https://stacindex.org/browser/example/item.json?uiLanguage=fr",
"rel": "alternate",
"type": "text/html",
"hreflang": "fr"
},
{
"href": "https://stacindex.org/browser/example/gr/item.json?uiLanguage=gr",
"rel": "alternate",
"type": "text/html",
"hreflang": "gr"
}
You see that there are more languages available in the UI than for the metadata. I'd expect that languages would be something like the following (i.e. not include French):
"languages": [
{ "code": "de", "name": "German", "native": "Deutsch" },
{ "code": "en", "name": "English", "native": "English" },
{ "code": "gr", "name": "Greek", "native": "Ελληνικά" }
]
So either we make the relationship between languages and the alternate relation type less demanding, or we have to clearly specify the corresponding media types, but that would (at least in STAC) be JSON + GeoJSON (+ links with a missing type, as type is not required in STAC yet).
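The second option (restricting the check to metadata media types) could look like the following Python sketch. The set of media types follows the comment above (JSON, GeoJSON and untyped links); the function name, the exact media type strings and the exact string comparison (no media type parameters) are assumptions.

```python
from typing import Iterable, List, Optional

# Assumption: the media types that count as "metadata" representations.
# Following the comment above: JSON + GeoJSON, plus links without a "type".
METADATA_TYPES = ("application/json", "application/geo+json", None)


def undeclared_languages(links: List[dict], languages: List[dict],
                         metadata_types: Iterable[Optional[str]] = METADATA_TYPES) -> List[str]:
    """Return hreflangs of metadata 'alternate' links that are missing from 'languages'."""
    allowed_types = set(metadata_types)
    declared = {entry["code"] for entry in languages}
    missing = set()
    for link in links:
        if link.get("rel") != "alternate" or "hreflang" not in link:
            continue
        if link.get("type") not in allowed_types:
            continue  # e.g. text/html UI links are not metadata representations
        if link["hreflang"] not in declared:
            missing.add(link["hreflang"])
    return sorted(missing)


links = [
    {"href": "../de/item.json", "rel": "alternate", "hreflang": "de"},
    {"href": "../item.json", "rel": "alternate", "hreflang": "en"},
    {"href": "https://stacindex.org/browser/example/item.json?uiLanguage=fr",
     "rel": "alternate", "type": "text/html", "hreflang": "fr"},
]
languages = [{"code": "de"}, {"code": "en"}]

print(undeclared_languages(links, languages))  # []
# Checking every alternate link, regardless of type, flags the UI-only language:
print(undeclared_languages(links, languages, METADATA_TYPES + ("text/html",)))  # ['fr']
```

The second call shows the problem described above: under the strict rule, a French UI link would force French into languages even though no French metadata exists.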
Thank you for bringing it to the SWG. Jan 23 is fine for me. If it helps I could also join the meeting. I'll also prepare an update for the STAC extension that follows this proposal.
I just had another idea to "merge" resourceLanguages and languages into languages and just add boolean properties as follows:
"languages": [
{ "code": "de", "name": "German", "native": "Deutsch", "record": true, "resource": true },
{ "code": "en", "name": "English", "native": "English", "record": true, "resource": true },
{ "code": "gr", "name": "Greek", "native": "Ελληνικά", "record": true, "resource": false },
{ "code": "fr", "name": "French", "native": "Français", "record": false, "resource": true }
]
I'm not sure whether this is a good idea and whether this mixes separate concerns too much so looking for thoughts of others.
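With the merged form, a client that needs the record languages and the resource languages separately has to split the list again. A minimal Python sketch, assuming the boolean flag names record and resource from the example above and treating a missing flag as false (both the function name and that default are hypothetical):

```python
def split_languages(merged):
    """Split the merged list into (record language codes, resource language codes).

    Assumption: a missing "record" or "resource" flag is treated as false.
    """
    record = [entry["code"] for entry in merged if entry.get("record", False)]
    resource = [entry["code"] for entry in merged if entry.get("resource", False)]
    return record, resource


merged = [
    {"code": "de", "name": "German", "native": "Deutsch", "record": True, "resource": True},
    {"code": "en", "name": "English", "native": "English", "record": True, "resource": True},
    {"code": "fr", "name": "French", "native": "Français", "record": False, "resource": True},
]
print(split_languages(merged))  # (['de', 'en'], ['de', 'en', 'fr'])
```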
@m-mohr my feeling is that it mixes separate concerns too much, but let's give others a chance to chime in with their thoughts ...
Yeah, happy with that, too.
An addition to https://github.com/opengeospatial/ogcapi-records/issues/195#issuecomment-1380306075: Should the languages list contain the current language itself? I'd say it would be good for clients, so it would not just be alternate, but alternate + self.
@m-mohr yes, I suppose the languages list should contain the current language as well, although that is slightly redundant. Perhaps we can get rid of the language tag and simply say the first item in the languages list is the language of the record in hand.
About this comment ... I hadn't considered that, but I would say that the list of languages should include all the available languages independent of their media type representation. If there is a type dependency, that can be represented in the alternate links via the type attribute and/or negotiated between the client and server using the normal HTTP content type and language negotiation handshake. Your thoughts?
@pvretano Interesting idea about putting the current language first. While I like having everything in one place, I don't like that it is not very explicit, and "the average user" may get confused about what the actual language is. It requires good knowledge of the spec. Alternatively, we could remove the current language from languages and, instead of just providing a code for language, use the "language object" from above there as well. Phew... no strong preference right now.
Example:
"language": { "code": "gr", "name": "Greek", "native": "Ελληνικά" },
"languages": [
{ "code": "de", "name": "German", "native": "Deutsch" },
{ "code": "en", "name": "English", "native": "English" }
]
I'm not sure about adding e.g. the "UI languages" to the languages list. It feels a bit weird to me as it mixes separate concerns. For example, I'm currently making STAC Browser multilingual with 6+ planned languages right now, and the metadata only has 2 languages. So the languages list would have 6 entries, and that seems a bit excessive...
(but of course I'm relatively biased right now towards the use case I'm working on)
I updated the STAC extension to reflect what you proposed here: https://github.com/stac-extensions/language
@m-mohr I have no strong preference. However, if I had to pick, I would say: language for the current language, and languages for the list of other available languages. So, the current language is NOT in the list of other languages.
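Under that convention, a producer excludes the current language from languages, and a client that wants every available language simply puts language in front. A minimal Python sketch, assuming the language objects from the earlier example are used for both properties; the helper names are hypothetical.

```python
def language_properties(current: dict, available: list) -> dict:
    """Build "language" (current) and "languages" (all others, current excluded)."""
    others = [lang for lang in available if lang["code"] != current["code"]]
    return {"language": current, "languages": others}


def all_languages(properties: dict) -> list:
    """Client side: recover the full list by putting the current language first."""
    return [properties["language"]] + properties["languages"]


de = {"code": "de", "name": "German", "native": "Deutsch"}
en = {"code": "en", "name": "English", "native": "English"}
props = language_properties(de, [de, en])
print(props["languages"])                              # [{'code': 'en', 'name': 'English', 'native': 'English'}]
print([lang["code"] for lang in all_languages(props)])  # ['de', 'en']
```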
Still, I think that the list of other languages should contain all the available other languages regardless of the representation. The HTML representation is as valid as any other and likely one of the more common representations ... no?
I'll review the STAC extension write up later today ...
The HTML representation is as valid as any other and likely one of the more common representations ... no?
No, not in my eyes. For me, languages is the list of languages in which the source metadata files are available. STAC clients usually only work with the source metadata (JSON) variants, and all others are just spit out or ignored. But I guess I could filter the languages somehow...
@m-mohr I could be wrong about the HTML representation ... I'll present to the SWG and see what the others think.
23-JAN-2023: In STAC, asset language is represented using hreflang in the asset section, and there is a rule that basically says that if a STAC record is requested in a specific language AND the asset has associated languages, only the requested language is represented in the asset section. So, if the STAC item is requested in Greek and there is a "Greek" asset, only that link will be listed in the asset section. Of course, all this only applies to the API; static records would probably include the links to all the available languages.
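The asset rule can be sketched in Python as a filter over the assets object. This is only an illustration under stated assumptions: all language-tagged assets are treated as variants of each other, tags are compared case-insensitively but exactly (no fallback from "de-DE" to "de"), and if no tagged asset matches, all of them are kept, as a static catalog would list them. That fallback and the function name are hypothetical, not part of the STAC rule.

```python
def filter_assets_by_language(assets: dict, requested: str) -> dict:
    """Keep assets without hreflang, plus language-tagged assets matching the request.

    Assumptions: tagged assets are variants of each other, matching is exact
    (case-insensitive), and if nothing matches all tagged assets are kept.
    """
    tagged = [key for key, asset in assets.items() if "hreflang" in asset]
    matching = {key for key in tagged
                if assets[key]["hreflang"].lower() == requested.lower()}
    keep = matching if matching else set(tagged)
    # Preserve the original order of the assets object.
    return {key: asset for key, asset in assets.items()
            if "hreflang" not in asset or key in keep}


# The assets of the STAC example item above.
assets = {
    "data": {"href": "https://cloud.example.com/examples/file.tif"},
    "metadata": {"href": "https://cloud.example.com/examples/metadata.xml",
                 "type": "application/xml", "hreflang": "en"},
    "metadata_de": {"href": "https://cloud.example.com/examples/metatdata_DE.xml",
                    "type": "application/xml", "hreflang": "de"},
}
print(list(filter_assets_by_language(assets, "de")))  # ['data', 'metadata_de']
```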
@m-mohr with regard to the language parameter in the STAC API language proposal, why is it only a single language? Can't its value be the same string as that used for the Accept-Language header, with the same semantics (e.g. `language=de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7,fr;q=0.3`)?
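If the parameter reused the Accept-Language syntax, one parser could serve both the header and the query parameter. A minimal Python sketch of that parsing step (ordering language ranges by q-value) follows; it is a simplified reading of the header syntax, it only looks at the q parameter, and it does not perform the actual matching against available languages. The function name is hypothetical.

```python
def parse_accept_language(value: str) -> list:
    """Return the language ranges of an Accept-Language value, best first.

    Ranges are ordered by descending q-value; ties keep their original order.
    Ranges with q=0 (or an unparsable q) are dropped. Only the "q" parameter
    is considered, and "*" is returned like any other range.
    """
    ranked = []
    for position, part in enumerate(value.split(",")):
        tag, _, params = part.partition(";")
        tag = tag.strip()
        if not tag:
            continue
        q = 1.0
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if q > 0:
            ranked.append((-q, position, tag))
    return [tag for _, _, tag in sorted(ranked)]


print(parse_accept_language("de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7,fr;q=0.3"))
# ['de-DE', 'de', 'en-US', 'en', 'fr']
```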
@pvretano This was just meant as a very simple alternative for "tinkering" in "simpler" environments, e.g. in the browser, where it's not easily possible to send HTTP headers. So I kept it simple. Recently, I've actually thought about removing the parameter altogether and just relying on the header. What do you think? What's the general direction OGC APIs go for? I've often seen e.g. ?f=json in OGC API implementations as an alternative to Accept headers, which would somewhat align with the current specification of ?language=de, it seems.
@m-mohr the usual thinking at OGC is to "recommend" that implementations have a mechanism to mint URLs that need to be embedded, or for situations where the client does not have easy access to HTTP headers. So, take f for example. That is not part of the specification per se. It is just an example of creating URLs where the output format can be specified. I guess it would be the same thing with a language parameter. It would not be "standard" but only a suggestion that implementations create a mechanism for requesting records in a specific language when access to the HTTP headers is not feasible. In all cases, the HTTP way is the normative way.
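Minting such URLs is mostly query-string handling. A minimal Python sketch using the standard urllib.parse module, assuming the parameter is named language (as in the ?language=de example above) and using hypothetical example.com URLs; the helper name is also hypothetical:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_language(url: str, language: str, param: str = "language") -> str:
    """Return url with its language query parameter set (replacing any existing value)."""
    parts = urlsplit(url)
    query = [(key, value) for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != param]
    query.append((param, language))
    return urlunsplit(parts._replace(query=urlencode(query)))


base = "https://example.com/collections/c/items/1?f=json"
# Alternate links for each language, usable where headers cannot be sent.
alternates = [{"rel": "alternate", "hreflang": lang, "href": with_language(base, lang)}
              for lang in ("de", "en")]
print(alternates[0]["href"])  # https://example.com/collections/c/items/1?f=json&language=de
```

Because the parameter is only a convenience, a server would still treat Accept-Language as the normative mechanism and use the parameter only when present.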
@pvretano Then I'd suggest following the same pattern. As I can't find anything about f in the specs (Features, Records), I'd also remove it from the STAC API - Languages extension.
@m-mohr here is the reference to f in Features ... https://docs.opengeospatial.org/is/17-069r4/17-069r4.html#encodings
It's in the NOTE in that section ...
@pvretano Thanks, I did not find that (but "f" is also not an ideal search term ;-) ). So you'd add a similar wording for language or accept-language into Records? Then I'd just refer back to that in the STAC API extension.
@m-mohr yes ... that is my plan.
PR #211 created to align language handling as per this discussion in this issue.
@pvretano Added a comment in the PR, thanks.
01-MAY-2023: Resolved by #211. Closing.
As far as I can see, hreflang is meant to follow RFC 5646 (Language-Tag). For the language property, the format seems undefined. I'd propose to clarify that it uses the same format as hreflang.

Additionally, I'm wondering whether it would be helpful to define a list of available/supported languages, e.g. as a property languages, which is an array of languages.

Also, how should alternative representations in other languages be communicated in (static) catalogs? Maybe multiple self links with different hreflangs?

I'm asking because I'm writing this up for STAC and would like to align as much as possible. See also https://github.com/stac-extensions/language and https://github.com/stac-api-extensions/language
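If language used the same format as hreflang, producers could run a basic syntax check on both. The Python sketch below covers only a simplified subset of the RFC 5646 language tag grammar (a 2 to 3 letter primary language, an optional 4-letter script, an optional 2-letter or 3-digit region). Extlangs, variants, extensions, private-use and grandfathered tags are not covered, and subtags are not checked against the IANA registry. The pattern and function name are assumptions for illustration.

```python
import re

# Simplified subset of the RFC 5646 "langtag" production (see lead-in for what
# is NOT covered). Matching is case-insensitive by construction, as in RFC 5646.
SIMPLE_LANGTAG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z]{4})?(-(?:[A-Za-z]{2}|[0-9]{3}))?")


def looks_like_langtag(value: str) -> bool:
    """Return True if value matches the simplified language tag pattern."""
    return SIMPLE_LANGTAG.fullmatch(value) is not None


for tag in ("de", "en-US", "zh-Hant-TW", "es-419", "english", "en_US"):
    print(tag, looks_like_langtag(tag))
```

A syntax check alone is not enough: for example, it accepts "gr", which is well-formed but is not a registered primary language subtag (the ISO 639-1 code for Greek is "el").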