Allow language based content negotiation

linked-art / linked.art

Development of a specification for linked data in museums, using existing ontologies and frameworks to build usable, understandable APIs

https://linked.art/

Allow language based content negotiation #202

Open azaroth42 opened 5 years ago

azaroth42 commented 5 years ago

Raised by @beaudet on the WG call of 2018-01-30:

If the resource is requested with language based content negotiation (e.g. the HTTP header Accept-Language), should there be any changes to the linguistic content in the response? This could include either the selection of LinguisticObjects or the string values in label.

Is it important to specify, or can we be silent and see what people do?

jeremytubbs commented 5 years ago

+1 for "be silent and see what people do"

azaroth42 commented 4 years ago

Propose that this is out of scope but that implementers shouldn't be forbidden from doing it, given discussion at F2F3.

Or ... "be silent and see what people do" :)

aisaac commented 4 years ago

I think that we shouldn't be fully silent :-) So I'd suggest adding a note saying that the possibility exists, but that the community is not yet ready to recommend a best practice. It's not really out of scope, but it should certainly be postponed.

azaroth42 commented 4 years ago

Potential concern - if you have a lot of languages, the size of the response without conneg could be high, especially for multilingual descriptions rather than names or other short texts.

Web browsers might also send the header by default ... could be good or could be bad! Good - user gets the language they expect. Bad - could be going through a client that can handle multiple languages more effectively than the native web browser and would miss out on data.

Should double check the default case if nothing is requested, and if something is requested and it is not available.

Completeness of response? If you're harvesting the data for a search engine (or other), you would be missing out on a substantial amount of data. Could be misleading.

Shouldn't be silent on this one. Good to not invent a new way to do the same thing (language negotiation).

Proposal:

If Accept-Language is absent, then return all languages. If it's present, then process LinguisticObjects according to the preferences of the header.
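
A minimal sketch of how a server might apply this proposal, not a recommended implementation. The record shape (`lang`, `content`) and the function names are hypothetical, and the fallback when no requested language is available (return everything) is an assumption, since that case is still an open question above.

```python
def parse_accept_language(header):
    """Return the language ranges in a header, highest q-value first."""
    prefs = []
    for part in header.split(","):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        q = 1.0  # a range without an explicit weight defaults to q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if q > 0:
            prefs.append((tag, q))
    # sorted() is stable, so equal weights keep their header order
    return [tag for tag, _ in sorted(prefs, key=lambda p: -p[1])]


def negotiate(statements, accept_language=None):
    """Filter hypothetical {'lang': ..., 'content': ...} records."""
    if not accept_language:
        return list(statements)  # header absent: return all languages
    for wanted in parse_accept_language(accept_language):
        if wanted == "*":
            return list(statements)
        matches = [
            s for s in statements
            if s["lang"].lower() == wanted
            or s["lang"].lower().startswith(wanted + "-")
        ]
        if matches:
            return matches  # best-ranked language that is available
    return list(statements)  # assumption: nothing matched, so return everything


statements = [
    {"lang": "en", "content": "Self-portrait"},
    {"lang": "fr", "content": "Autoportrait"},
    {"lang": "es", "content": "Autorretrato"},
]
french_only = negotiate(statements, "fr-CH, fr;q=0.9, en;q=0.8")
```

Here `fr-CH` is not available, so the next preference, `fr`, is used and only the French record is returned. A harvester that omits the header still receives all three records.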

azaroth42 commented 4 years ago

Related to #186 (which we're working on), we have a conundrum:

There are a LOT of possible languages. There are several possible vocabularies from which to draw language identities (ISO codes, AAT, GlottoLog). The needed languages will be specific to museums' collections and location. This would mean that we should at best list some common languages, but not even recommend which identities be used.

On the other hand, if we don't require the identity of the languages, then clients will not be able to process internationalized/localized data, nor will servers be able to translate between the codes in Accept-Language and the language fields in the data.

Further complicating this, there isn't a complete 1:1 alignment between AAT or GlottoLog and the ISO codes that will come in the headers.

Perhaps we need to recommend core, modern languages along with their respective ISO codes, such that language based content negotiation (either HTTP layer or client layer) can be implemented at all?
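
To make the alignment problem concrete, here is a rough sketch of the lookup a server or client would need if such a core list existed. The table, the function names, and the URIs are all hypothetical placeholders, not real AAT or GlottoLog identifiers.

```python
# Hypothetical alignment table: ISO 639-1 code -> vocabulary URI.
# The URIs are placeholders, not real AAT or GlottoLog identifiers.
LANGUAGE_ALIGNMENT = {
    "en": "https://vocab.example.org/language/english",
    "fr": "https://vocab.example.org/language/french",
    "es": "https://vocab.example.org/language/spanish",
}


def uri_for_tag(tag):
    """Map a header tag such as 'es-MX' to a vocabulary URI, or None."""
    primary = tag.lower().split("-")[0]  # drop region/script subtags
    return LANGUAGE_ALIGNMENT.get(primary)


def tag_for_uri(uri):
    """Map a vocabulary URI found in the data back to a code, or None."""
    for code, known_uri in LANGUAGE_ALIGNMENT.items():
        if known_uri == uri:
            return code
    return None  # language has no code in the table
```

Any language missing from the table resolves to `None` in both directions, which is exactly the gap described above: without a shared identity, neither side can match the header to the data.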

ajs6f commented 4 years ago

GlottoLog seems to have a mapping for ISO-639-3, which might be helpful.

azaroth42 commented 4 years ago

Yes, as does AAT (eg Spanish)

However there are historical and less well known modern languages that do not have ISO codes at all :( This likely wouldn't become a big issue, but with the (very welcome and important) renewed attention to diversity in collections, I am concerned that we would be perpetuating the anglo-/euro-centric biases of our information systems if we were to limit to only languages with codes.

ewg118 commented 4 years ago

Museums may very well have content in languages not represented by ISO codes, but is it likely that the metadata are in something not represented by ISO? A Punic-language inscription has no ISO code, only a GlottoLog one, but you'd want the inscription to be in the LA data regardless of Accept-Language. Personally, I'm for including multiple translations of descriptions in the response by default. If people want to get multilingual labels for entities, they should request the metadata for the concept URI, which ideally might deliver JSON-LD that conforms to LA.

azaroth42 commented 4 years ago

Good point, Ethan! Transcriptions shouldn't be subject to content negotiation, nor references to the language of a work that isn't transcribed. So the sorts of things that might be negotiable are very likely to have codes.
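
As a rough illustration of that distinction, a filter could exempt transcriptions from negotiation entirely. The `kind` field and the record shape are hypothetical simplifications, not the Linked Art model, and `und` stands in for a language without a usable code.

```python
def filter_by_language(records, preferred):
    """Keep every transcription; keep other texts only in a preferred language."""
    wanted = {p.lower() for p in preferred}
    kept = []
    for record in records:
        # Transcriptions are part of the object itself, so never drop them.
        if record["kind"] == "transcription" or record["lang"].lower() in wanted:
            kept.append(record)
    return kept


records = [
    {"kind": "description", "lang": "en", "content": "Votive stele with inscription"},
    {"kind": "description", "lang": "fr", "content": "Stèle votive avec inscription"},
    {"kind": "transcription", "lang": "und", "content": "(inscription text)"},
]
french_view = filter_by_language(records, ["fr"])
```

With `["fr"]` as the preference, the English description is dropped while the French description and the transcription are both kept.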

FWIW, I'm also in favor of client-side language processing rather than protocol-based. E.g., just put all the data you have in there in all the languages, and the client will pick which ones to render ... or the user gets exposed to some new languages that they might not understand, and hopefully grows a bit in their awareness of the world through that process :)

azaroth42 commented 11 months ago

Propose defer