Define minimum set of supported languages

mlagally commented 2 years ago

There are several complications in rendering text in some languages; see the TD description of multi-language text and the corresponding heuristics / best-effort approach.

The profile should select a minimum set of common languages that are guaranteed to be supported and can be rendered by all compliant consumers.

mlagally commented 2 years ago

A good starting point is: https://www.oracle.com/java/technologies/javase/jdk8-jre8-suported-locales.html

mlagally commented 2 years ago

Proposal is to select the languages that are supported by JDK8.

mlagally commented 2 years ago

Profile call on June 29th: We should check with @aphillips from the i18n team about a common set of locales that works across browsers, node.js, Java, C#.

mlagally commented 2 years ago

A reference on browsers: http://4umi.com/web/html/languagecodes.php

aphillips commented 2 years ago

I think the I18N community and (for different reasons) the WOT community would be unhappy to try to define this for a number of reasons. Can you describe what aspect of the "several complications" you're concerned about?

To start with, content rendering capability generally exceeds the availability of locale data. Thanks to Unicode and advanced rendering support present in most browsers and runtime environments, nearly any language--modern or historical--can be rendered. The primary constraint here is generally the provisioning of fonts. General purpose devices (laptops, phones, tablets, etc.) can generally provide at least fallback support for most scripts using fonts such as Noto. Small devices ("things") that are resource constrained generally want to limit storage and may pick-and-choose which language resources to install, perhaps based on configuration on first use. A device in Japan that supports only Japanese fonts, input, etc. by eschewing Chinese and Korean support is probably fine. Device developers won't thank us for requiring support for languages that they don't intend to serve with a given product.

Unicode's Common Locale Data Repository [CLDR] project is the basis for modern locale data in most operating environments (OSs, browsers, JDKs, node, etc. etc.). The supported locale list in the latest version includes 361 locales at the "full" support level and this number increases over time (twice each year, as new releases ship). A static minimum list cannot track such support and might artificially limit the languages supported by devices--even though the data is widely disseminated. CLDR supports many languages that are not "top of mind" and didn't appear commonly when e.g. JDK8 shipped. For example, the Vunjo (vun) language of Tanzania or the Fulah (ff) language of central Africa both represent languages where intensive effort has greatly improved computing support in recent years. Language communities won't appreciate being excluded from the "required" list (effectively in perpetuity) in spite of having made this investment.

Also, note that locale systems use a "best fit" or fallback mechanism. For example, there is no en-TR (English as used in Türkiye) locale in CLDR, but requesting that locale falls back to en or en-001 data, which produces recognizable results.
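
The fallback idea can be sketched as follows. This is a minimal Python illustration of truncation-based lookup (a simplified form of what RFC 4647 calls "lookup"), not CLDR's actual inheritance rules, so it does not model the en-001 step. The function name `lookup_locale` and the `available` set are assumptions made for the example.

```python
def lookup_locale(requested, available, default="en"):
    """Return the closest available locale by dropping trailing subtags."""
    # Language tags are case-insensitive, so compare in lower case.
    canonical = {tag.lower(): tag for tag in available}
    subtags = requested.replace("_", "-").split("-")
    while subtags:
        candidate = "-".join(subtags).lower()
        if candidate in canonical:
            return canonical[candidate]
        subtags.pop()  # e.g. "en-TR" -> "en"
    # Nothing matched at any level: use the caller-supplied default.
    return default


# Hypothetical set of locales for which data happens to be installed.
available = {"en", "en-001", "ja", "de"}
print(lookup_locale("en-TR", available))
print(lookup_locale("ja-JP", available))
```

A request for a locale with no installed data at all ends at the `default` argument, which is the "recognizable results" case rather than a failure.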

I have added this topic to the I18N teleconference this week, which falls on Thursday at 1500UTC. If you can add more info before then, that would be helpful.

macchiati commented 2 years ago

I agree with everything Addison wrote. To the fonts paragraph, I would also add input methods (keyboards).

The profile should select a minimum set of common languages that are guaranteed to be supported

I'm missing some context here, but in general "supported" is not a binary; there are levels of support. For CLDR we distinguish 3, but there are many more. Does a platform support a language/locale if it doesn't have spell-checking? Etc.

mlagally commented 2 years ago

@aphillips Thanks very much for your comments and thoughts. Let me elaborate on the target scenarios that we are trying to address with this OOTBI profile. The main goal is out-of-the-box interoperability, i.e. to be able to hook up a high number of devices from multiple vendors into a common IoT deployment. There are many ways to do this in a "best effort" approach, and in today's world there are a multitude of cases where the integration of a new device is not easy and needs additional manual (in many cases development) work.

Note that a thing description is not a markup or format language that targets the rendering of text - it primarily describes the structure and interactions of devices (aka things). These have properties (attributes), actions (~function calls) and in some cases a notification mechanism (events).

These structure elements have identifiers and human readable names, that would be used to manage these devices via a common application UI, e.g. on a server or in the cloud. Typical scenarios are the display of an icon with a device name / type for each device on a world map.

When the user clicks on that icon he can look into the device details, i.e. inspect or set property values, trigger actions etc. Devices typically have limited complexity, i.e. somewhere in a ballpark of 10-100 attributes, 0-50 actions, 0-20 events.

These interaction elements have a name and a human readable description, which could be localized, when the author of the thing description has provided an additional set of localized names and descriptions.

In most cases the thing description only contains a single language (the examples in the plug fest were primarily English and Japanese). In the profile spec we are trying to be very pragmatic and address a common set of existing runtime environments and the "out of the box" rendering capabilities of typical runtimes. We try to keep the spec narrow, i.e. select a common subset of languages that is guaranteed to work among all implementers. There are certainly ways to render additional languages that are not part of this minimum set with fallback solutions and heuristics, installing additional fonts etc.; however, "out-of-the-box" implies guaranteed interop without quirks and fallback heuristics.

To be pragmatic - I would assume that if we take the intersection of languages that are supported by the majority of runtime environments "out of the box", we have selected a common subset that can be implemented by all.

Cases that require additional languages can still be covered by regular thing descriptions that are not constrained to a profile, i.e. device manufacturers and application vendors can use anything they consider fit. It is just out of scope of the OOTBI profile.

mlagally commented 2 years ago

@macchiati Please see my previous response for setting the scene. Most devices we describe with thing descriptions are very simple, e.g. an LED lamp, an air conditioner, door sensors, switches, alert lights, factory robots, electric screwdrivers, power generators, pumps.

Many of them do not have a (high-res alphanumeric) display or a typical (QWERTY) keyboard and have no requirement to render multi-language text. This comes into play when these devices are monitored/managed using an application that was designed for (usually trained) personnel.

aphillips commented 2 years ago

@mlagally Thanks for the reply.

I understand what you're doing/trying to do and what the ecosystem is that you're working in.

This comes into play when these devices are monitored/managed using an application that was designed for (usually trained) personnel.

I am somewhat wary of the phrase usually trained because it is sometimes used to give permission to have a crappy customer experience (the "trained personnel" will take English and like it!). I don't require a lot of training to use a light bulb or a thermostat 😊.

I'll point out that the properties (attributes, actions, events and their values) of a Thing should generally be locale-neutral, even though they are serialized on the wire as strings, frequently using English language tokens (on, off, etc.). For many devices, the actual interface is a localized application which does not require the device to send any localized display names up the wire. But obviously there are many devices that are programmable via simple interfaces, hubs, or even from the browser. These require the device to supply any localization.

Let's go back to the beginning. Can you explain what you meant by several complications in rendering text in some languages? What are you trying to accomplish here? Your more recent comment said:

In the profile spec we are trying to be very pragmatic and address a common set of existing runtime environments and the "out of the box" rendering capabilities of typical runtimes.

What "runtime environment" do you have in mind? AFAIK (and to your own point in your comment), it isn't the Thing that does any rendering. It is often a general purpose computing device (PC, phone, tablet, etc.) that is being used to interact with the Thing. These generally have fonts, keyboards, displays, and cetera that are quite capable.

mlagally commented 2 years ago

I'll point out that the properties (attributes, actions, events and their values) of a Thing should generally be locale-neutral, even though they are serialized on the wire as strings, frequently using English language tokens (on, off, etc.). For many devices, the actual interface is a localized application which does not require the device to send any localized display names up the wire. But obviously there are many devices that are programmable via simple interfaces, hubs, or even from the browser. These require the device to supply any localization.

There will be both: a localized application can of course do anything it wants and is not constrained by the profile. However, in addition to the "title" and "description" metadata for properties, actions and events, the Thing Description also contains the multi-language variants "titles" and "descriptions", see https://w3c.github.io/wot-thing-description/#interactionaffordance.
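
For illustration, an interaction affordance carrying both forms might look like the sketch below, written as a Python dict that mirrors the JSON serialization. The property and all of its strings are made up for the example; only the member names `title`, `titles`, `description` and `descriptions` come from the Thing Description, and the helper `available_languages` is hypothetical.

```python
import json

# Hypothetical property affordance; the strings are illustrative only.
status_property = {
    "type": "string",
    "title": "Status",
    "titles": {"en": "Status", "de": "Zustand", "ja": "状態"},
    "description": "Current status of the lamp",
    "descriptions": {
        "en": "Current status of the lamp",
        "de": "Aktueller Zustand der Lampe",
    },
}


def available_languages(affordance):
    """Language tags for which the affordance carries a localized title."""
    return sorted(affordance.get("titles", {}))


print(json.dumps(status_property, ensure_ascii=False, indent=2))
print(available_languages(status_property))
```

An affordance that only has the single-language `title` simply yields an empty list here, which is the common case mentioned above.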

Let's go back to the beginning. Can you explain what you meant by several complications in rendering text in some languages? What are you trying to accomplish here?

If you read the description at: https://w3c.github.io/wot-thing-description/#titles-descriptions-serialization-json there are heuristics and best effort approaches. The profile should go beyond that and contain well defined language that ensures that devices and applications work together across platforms and runtime environments.

Your more recent comment said:

In the profile spec we are trying to be very pragmatic and address a common set of existing runtime environments and the "out of the box" rendering capabilities of typical runtimes.

What "runtime environment" do you have in mind? AFAIK (and to your own point in your comment), it isn't the Thing that does any rendering. It is often a general purpose computing device (PC, phone, tablet, etc.) that is being used to interact with the Thing. These generally have fonts, keyboards, displays, and cetera that are quite capable.

On all these powerful devices you have a runtime environment that has some constraints. There are many runtime environments used in practice. Some of the more common application runtimes are Java, node.js, TypeScript, C#, C++, Python, ...

https://www.oracle.com/java/technologies/javase/jdk8-jre8-suported-locales.html Browser runtimes also have certain constraints: https://seleniumbase.io/help_docs/locale_codes/

We are trying to agree on a common subset of the locale codes that work on all common runtime environments, i.e. where the profile can guarantee that consumers will be able to render the multi-language titles and descriptions. If we don't achieve this goal, we will just have a mandatory title and description in the default language (which in most cases will be English).

benfrancis commented 2 years ago

Thanks @aphillips for your valuable input on this topic.

@aphillips wrote:

What "runtime environment" do you have in mind? AFAIK (and to your own point in your comment), it isn't the Thing that does any rendering. It is often a general purpose computing device (PC, phone, tablet, etc.) that is being used to interact with the Thing. These generally have fonts, keyboards, displays, and cetera that are quite capable.

This is correct.

In my opinion, trying to prescribe a fixed set of locales is at best arbitrary and at worst actively discriminating against people who don't speak one of those languages.

WebThings Gateway is currently translated into 34 different languages. I wouldn't want to arbitrarily prioritise some of those over others, or discourage the community from translating the interface into new languages which don't appear on a prioritised list.

As @macchiati pointed out, support for a certain locale is also not binary. There are levels of support, and layers of potential fallbacks.

A Consumer can negotiate a preferred locale with a Thing (see "language negotiation" in section 6.3.2 of WoT Thing Description). Following that negotiation it should make its best effort to render the strings provided rather than stop processing the Thing Description altogether.
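
A minimal Python sketch of that best-effort behaviour: the Consumer walks its preferred languages in order, and if none of them match it falls back to the default `title` instead of rejecting the Thing Description. The helper name `pick_title` and the sample data are assumptions for illustration; this is not the negotiation procedure defined in the TD specification.

```python
def pick_title(affordance, preferred_languages):
    """Return the best available title, never failing for a missing language."""
    titles = affordance.get("titles", {})
    by_lower = {tag.lower(): text for tag, text in titles.items()}
    for tag in preferred_languages:
        tag = tag.lower()
        primary = tag.split("-")[0]
        # Try the exact tag first, then its primary language subtag.
        for candidate in (tag, primary):
            if candidate in by_lower:
                return by_lower[candidate]
    # No preferred language is available: use the default-language title.
    return affordance.get("title", "")


# Hypothetical affordance with two localized titles.
lamp = {"title": "Lamp", "titles": {"en": "Lamp", "de": "Lampe"}}
print(pick_title(lamp, ["de-AT", "en"]))
print(pick_title(lamp, ["vun"]))
```

The point of the design is that an unsupported language degrades the display text but never stops the Consumer from processing the rest of the description.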

I think this is another example of "best effort compliance" vs. "strict compliance", see #187.

w3c / wot-profile

Define minimum set of supported languages #232