wet-boew / wet-boew-api-standards

Possible requirements for Government of Canada APIs based on the White House standards

suggested language guidelines #8

Closed wardi closed 10 years ago

wardi commented 10 years ago

Returning all languages in one response can be much simpler to implement in systems that aren't designed for multilingual data. It's also simpler for end users to handle: only one request required and no extra parameters to worry about.

LaurentGoderre commented 10 years ago

An application is not likely to consume multiple languages at the same time, and returning all languages would incur extra HTTP traffic for potentially no reason. The requested language should be provided as a parameter.

wardi commented 10 years ago

@LaurentGoderre thank you for looking at this.

These suggestions are based on my adapting COTS software (CKAN) for use with multilingual data for data.gc.ca. This software is like most web software in that it comes with no built-in support for multilingual data.

Both approaches are valid. For data.gc.ca the approach above was the simplest to implement and the easiest for me as a user of the API to consume. Granted, as a user I needed all the data from each record, including both languages. If I had chosen the other approach I would have needed to make twice as many queries and ignore the duplicated non-language data. Also, the resulting software would no longer be interoperable with other web sites because the API would be different.

nschonni commented 10 years ago

The results should likely be unilingual and based on the Accept-Language header or a query parameter, falling back to the default language when neither is provided.
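As an illustration of that fallback order, here is a minimal sketch. The parameter name `lang`, the supported set, the default of `en`, and the choice to let the query parameter win over the header are all assumptions made for the example; nothing in this thread fixes them.

```python
from typing import Mapping

SUPPORTED = ("en", "fr")  # assumption: the two official languages
DEFAULT = "en"            # assumption: the API's default language


def parse_accept_language(header: str) -> list[str]:
    """Return the tags from an Accept-Language header, most preferred first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without an explicit q value counts as q=1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            weighted.append((-quality, position, tag))
    # Sort by descending quality, keeping header order for ties.
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(query: Mapping[str, str], headers: Mapping[str, str]) -> str:
    """Pick one response language: query parameter, then header, then default.

    `headers` is assumed to use the exact key "Accept-Language".
    """
    requested = query.get("lang", "").lower()  # hypothetical parameter name
    candidates = [requested] if requested else []
    candidates += parse_accept_language(headers.get("Accept-Language", ""))
    for tag in candidates:
        primary = tag.split("-")[0]  # "fr-ca" -> "fr"
        if primary in SUPPORTED:
            return primary
    return DEFAULT
```

A call with neither input, such as `choose_language({}, {})`, falls through to the default, which also covers non-browser clients that send no header at all.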

LaurentGoderre commented 10 years ago

@nschonni you could use this, but in the case of a non-browser client it's not guaranteed that these are set. It's not even guaranteed that that language is actually the preference (my Firefox is English).

LaurentGoderre commented 10 years ago

@wardi I suppose it depends on the use case. The end user would rarely request multiple languages. However, in the case of machine-to-machine transactions then yes, all languages would be beneficial.

LaurentGoderre commented 10 years ago

It also comes down to giving the developers the choice. There should be a way to limit to one or more languages or output everything.

chrismajewski commented 10 years ago

We are trying to address this at a metadata level as well.

I've made the case that language is not a mandatory element, seeing that API data may not have any language. The temperature outside is not English or French; if that's all you provide as output you are non-lingual.

Laurent's latest suggestion seems to be leading to a reasonable guideline. Minimum: If there is human language you need to either offer a choice of (at least) either official language or provide both in the same request. Best Practice (or Preferable?): Offer both options by URL parameter and Accept-Language.

We can expand the requirement to state that alternate-language fields must have the same name with the _language extension, but forcing nesting adds a host of new issues. The language is a bit strict; you'll see the remainder of the document tries to be pragmatic for a first standard.

We should also be using/recommending BCP-47 as used in http://googlegeodevelopers.blogspot.ca/2009/10/maps-api-v3-now-speaks-your-language.html and the same shortest (two-letter 'en' and 'fr') codes used in WET 4.0.

The intent is to keep the barrier to entry as low as we can to encourage adoption. This is the White House approach, born of wisdom.

Sam and I are working out a best practice document for the end of October and something more formal for WMC in mid-November.

If we can work out a happy medium I'll make sure we include this word about language in the implementation guide.

wardi commented 10 years ago

@LaurentGoderre We're talking API standards -- it's always machine to machine :-) If you're saying there should be a way to ask for all languages in addition to a single language then I'm happy. I want to avoid APIs that create both more work for me as a user, and more work for me as a developer implementing them.

@chrismajewski IIUC BCP-47 is as inclusive as ISO 639-2 but allows for two-character tags as well? Sounds good to me.

chrismajewski commented 10 years ago

@wardi Yeah, BCP-47, it's pretty slick. They went all HTML 5 and simplified back to common sense instead of bolting on more complex logic. Backwards compatible, prefers simplicity and clarity.

For the GoC only two BCP-47 components are relevant (language and region); the second is pedantic, even though I use it where I can (doesn't hurt).

en en-CA fr fr-CA

There's a nice playground for the standard, all four validate. http://schneegans.de/lv/

And if you want to melt your brain a bit check out http://schneegans.de/lv/?tags=en-GB-boont-r-extended-sequence-x-private
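For the four tags above, a deliberately narrow check is enough. The sketch below is an assumption-laden simplification, not a BCP-47 validator: it accepts only a language subtag with an optional two-letter region, and rejects everything else the standard allows (scripts, variants, extensions, private use).

```python
import re

# Simplified pattern: a 2-3 letter language subtag, optionally followed by a
# 2-letter region subtag. Full BCP-47 permits far more than this.
_SIMPLE_TAG = re.compile(r"^([A-Za-z]{2,3})(?:-([A-Za-z]{2}))?$")


def split_tag(tag: str):
    """Split a simple tag into (language, region); region may be None."""
    match = _SIMPLE_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a simple language or language-region tag: {tag!r}")
    language, region = match.groups()
    # Conventional casing: lowercase language, uppercase region.
    return language.lower(), (region.upper() if region else None)


# The four tags relevant here all reduce to "en" or "fr".
for tag in ("en", "en-CA", "fr", "fr-CA"):
    language, region = split_tag(tag)
    assert language in ("en", "fr")
```

The brain-melting tag from the link above is valid BCP-47 but raises `ValueError` here, which is the point of keeping the accepted set small.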

chrismajewski commented 10 years ago

Good step forward, something to revise from there if required.

Sam is aware of this change; it's in the outline of the API document on its way.

Thanks.

LaurentGoderre commented 10 years ago

@wardi what I meant is user consumption vs batch transfer between systems. If it is meant for user consumption, usually one language is recommended

wardi commented 10 years ago

The Google Maps API example is a weird one. Google Maps license terms dictate that you may only call it from a user's browser, and you must present the information it returns directly to the user. You can't store the information it gives you and combine it on the back end. You can't harvest the information for use later.

Given those restrictions, having language as a parameter makes sense for Google Maps. The only audience for their API is front-end JS developers serving one individual, and the language the individual prefers is known.

That's not the only use for APIs in general, though. The normal use of a GET REST-style API is to provide all the data related to the object or objects requested. Text containing language content is simply data related to those objects. There should be a strong justification for presenting some related data differently than the rest.

The reason my first commit suggested a nested approach is that it is absolutely explicit: "this field contains text data and these are all the translations I have for that data". Every other approach is a bit of a hack. In particular, using language as a parameter for APIs puts real burdens on some users of that API:

  1. What other languages is this record available in? Should we have a standard for declaring those languages and URLs to use and include them when one language version is returned?
  2. If my preferred language doesn't exist for a record will it be returned as missing? Do I need to also request each record in English just in case?
  3. If I'm trying to merge multiple language versions how do I tell which fields contain language text? Can I just assume it's the ones that change when I request a different language?
  4. What do I do if a value that I know isn't text is different when I request the same record in a different language?

The _lang suffix approach doesn't solve number 3 completely, but it does solve the other problems and shifts the burden of number 4 onto the developer of the API instead of the users.
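To make the two shapes concrete, here is a minimal sketch that converts between a nested record and a flat record with language suffixes. The field names (`title`, `magnitude`), the suffix form `title_en`, and the helper names are hypothetical, chosen only for illustration.

```python
LANGUAGES = ("en", "fr")  # assumption: the languages this API carries


def flatten_languages(record: dict) -> dict:
    """Turn {"title": {"en": ..., "fr": ...}} into title_en / title_fr fields."""
    flat = {}
    for key, value in record.items():
        # A non-empty dict keyed only by language codes is treated as text.
        if isinstance(value, dict) and value and set(value) <= set(LANGUAGES):
            for lang, text in value.items():
                flat[f"{key}_{lang}"] = text
        else:
            flat[key] = value
    return flat


def nest_languages(flat: dict) -> dict:
    """Rebuild the nested shape by guessing from the _en / _fr suffixes."""
    nested = {}
    for key, value in flat.items():
        base, sep, suffix = key.rpartition("_")
        if sep and base and suffix in LANGUAGES:
            nested.setdefault(base, {})[suffix] = value
        else:
            nested[key] = value
    return nested


record = {
    "id": "abc",
    "magnitude": 4.2,
    "title": {"en": "Earthquake", "fr": "Tremblement de terre"},
}
flat = flatten_languages(record)
```

In the nested shape the structure itself marks which fields hold translated text. Going the other way, `nest_languages` can only guess from the name, so a non-text field that happens to end in `_en` would be misread; that is the sense in which the suffix approach leaves number 3 only partly solved.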

LaurentGoderre commented 10 years ago

That's why it's best to have the option to limit the data transfer. If you don't add it, return everything.

Sounds good?
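A minimal sketch of "limit if asked, otherwise return everything", assuming a nested response shape in which translated fields are dicts keyed by language code. The function name and the record fields are hypothetical.

```python
LANGUAGES = ("en", "fr")  # assumption: the languages this API carries


def limit_languages(record: dict, languages=None) -> dict:
    """Keep only the requested languages; with no request, keep everything."""
    if not languages:
        return dict(record)  # no limit given: return the full record
    wanted = set(languages)
    limited = {}
    for key, value in record.items():
        # Only fields that look like translation maps are filtered.
        if isinstance(value, dict) and value and set(value) <= set(LANGUAGES):
            limited[key] = {
                lang: text for lang, text in value.items() if lang in wanted
            }
        else:
            limited[key] = value  # non-language data is always returned
    return limited


record = {
    "magnitude": 4.2,
    "title": {"en": "Earthquake", "fr": "Tremblement de terre"},
}
french_only = limit_languages(record, ["fr"])
everything = limit_languages(record)
```

If a requested language is missing for a field, this sketch returns an empty mapping for it rather than silently substituting another language; whether to fall back instead is a policy choice the thread leaves open.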

chrismajewski commented 10 years ago

I think we all agree.

Laurent is right, we should be using language choices at the outset.

Ian is right, we should prefer nested dual language responses if the option exists.

Maybe we need to explain the problem in greater detail and our preferences in the process? Something like this?

"Language elements should be offered by choice using [accept] and [url param] where possible. [ EXAMPLES, using accept and url ]

Where language elements are best both represented in the same request you should still offer the unilingual choice for bandwidth concerns.

When representing two languages in one request nested responses are preferred [ EXAMPLE, nested ]

When nesting isn't possible language should be encoded as follows for consistency. [ EXAMPLE, flat_lang ] "

LaurentGoderre commented 10 years ago

:+1: Also, this is just an issue with JSON. XML has great multilingual support.

chrismajewski commented 10 years ago

You still need to either include it or not in the response, so this is an important discussion. This should also apply to all formats output from APIs, whether required or not. This is another reason to err on the side of leniency until we have a better understanding of people's needs.

And enjoy XML while you have it, it's the old man now. HTML 5, BCP-47, JSON... the era of simple protocols / standards is going to leave the complex behind over time. ;)

wardi commented 10 years ago

@LaurentGoderre I think the same issues still come up with XML. When we were setting up harvesting of the GeoGratis data, which was offered in a "standard" XML format, we still needed to ask to get a version of it that included both languages. But XML does have that lang="foo" attribute convention, which is rather nice. The nested language JSON might be the closest thing JSON has to that.
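To show the parallel, here is a small sketch that reads the standard `xml:lang` attribute with Python's `xml.etree.ElementTree` and produces the nested JSON-style shape. The sample document and its element names are made up for the example; they are not the GeoGratis format.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document.
SAMPLE = """<record>
  <title xml:lang="en">Earthquake</title>
  <title xml:lang="fr">Tremblement de terre</title>
  <magnitude>4.2</magnitude>
</record>"""


def translations(root: ET.Element, tag: str) -> dict:
    """Collect {language: text} for every child <tag> carrying xml:lang."""
    return {
        element.get(XML_LANG): element.text
        for element in root.findall(tag)
        if element.get(XML_LANG)
    }


root = ET.fromstring(SAMPLE)
# The nested-JSON equivalent of the repeated, language-tagged XML elements.
nested = {
    "title": translations(root, "title"),
    "magnitude": float(root.findtext("magnitude")),
}
```

Each language-tagged `<title>` element becomes one key in the nested `title` mapping, which is why the nested form reads as the closest JSON analogue of the attribute convention.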

@chrismajewski Bandwidth wasted because of including extra languages isn't a real problem. The norm is to throw out the majority of the data you receive in an API call and just take the few fields you're interested in. Also, adding a few more bytes (or KB) costs nothing next to the cost of handling even one extra query on the server side.

I think we are all pulling in the same direction here. I'll take another swing at editing this section. I won't add the nested JSON approach back because although the purist in me likes it, it would add complexity for most "simple" use cases.

chrismajewski commented 10 years ago

So many different problems to solve.

You'd really like to offer field limiting/selection, you pick the fields you want in your data back.

We ( Earthquakes Canada ) can't do that, we need to reduce possible URLs for caching reasons.

For languages we are making good choices.

chrismajewski commented 10 years ago

https://github.com/wet-boew/wet-boew-api-standards/issues/18