mapbox / locale-utils


Fold into model-un? #1

Open 1ec5 opened 6 years ago

1ec5 commented 6 years ago

model-un has lots and lots of language identifier–related functionality. parseLocaleIntoCodes() is similar to getLanguage() and getCountry() (although there’s no getScript()). bestMatchingLocale() is a lot like getAllLanguagesLike(), too.

On the other hand, this library is really lightweight, which can be a plus for a server application that can’t take on much more in the way of dependencies.

/cc @bsudekum @KaiBot3000

KaiBot3000 commented 6 years ago

These projects are indeed very closely related. We use model-un in both our data pipeline and production, so we have an interest in preserving the current behavior or being involved in any updates to current functions. I'd love to see additions and improvements, though!

bsudekum commented 6 years ago

The only thing I'm worried about is that bestMatchingLocale() here takes two arguments: the input locale and an array of possible locales to choose from. We would have to add this functionality to model-un to remove the need for this repo.
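For illustration, a two-argument matcher of this shape could look roughly like the sketch below. The names pickBestLocale() and splitLocale() are hypothetical, and the ranking (exact match first, then the first candidate sharing the language) is an assumption; the real bestMatchingLocale() may well choose differently.

```javascript
// Hypothetical sketch, not the actual locale-utils implementation.
// Splits a locale code into its language subtag and everything after it,
// lowercased so that comparisons are case-insensitive.
function splitLocale(code) {
  const [language, ...rest] = String(code).toLowerCase().split(/[-_]/);
  return { language, rest: rest.join('-') };
}

// Picks the best candidate for `input` from the `candidates` array.
// Returns the candidate exactly as supplied, or null if no language matches.
function pickBestLocale(input, candidates) {
  const wanted = splitLocale(input);
  let languageOnlyMatch = null;
  for (const candidate of candidates) {
    const have = splitLocale(candidate);
    if (have.language !== wanted.language) continue;
    // Same language and same remaining subtags: exact match, return at once.
    if (have.rest === wanted.rest) return candidate;
    // Otherwise remember the first candidate that shares the language.
    if (languageOnlyMatch === null) languageOnlyMatch = candidate;
  }
  return languageOnlyMatch;
}

pickBestLocale('fr-ca', ['en', 'fr', 'fr-CA']); // 'fr-CA'
pickBestLocale('pt-BR', ['en', 'pt-PT']);       // 'pt-PT'
pickBestLocale('de', ['en', 'fr']);             // null
```

The point is only that the candidate list is an input the caller controls, which is the piece model-un would need to accept.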

apendleton commented 6 years ago

In geocoding we actually kind of abuse model-un at this point, because we deliberately supply slightly not-spec-compliant language code matching. In particular, our matches are case-insensitive, and we instead use position to distinguish between subtag types. So like, per the spec as I understand it and model-un supports it, for the language code fr-FR (French as used in France), the "fr" is identified as French because it's lowercase and the "FR" is identified as France because it's uppercase, where we distinguish them by position (and so treat "fr-fr" as equivalent, where model-un does not).
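As a rough sketch of what position-based, case-insensitive parsing means here (this is not the carmen code; parseByPosition() is a hypothetical name, and using subtag length to tell a script from a region is an assumption borrowed from the BCP 47 subtag shapes):

```javascript
// Hypothetical helper, not the actual carmen or model-un code.
// Classifies subtags by position and length instead of by letter case.
function parseByPosition(code) {
  const parts = String(code).trim().split(/[-_]/).filter(Boolean);
  const result = { language: null, script: null, region: null };
  if (parts.length === 0) return result;

  // The first subtag is always treated as the language.
  result.language = parts[0].toLowerCase();

  for (const part of parts.slice(1)) {
    if (/^[a-z]{4}$/i.test(part) && !result.script && !result.region) {
      // Four letters before any region: script, normalized to title case.
      result.script = part[0].toUpperCase() + part.slice(1).toLowerCase();
    } else if (/^([a-z]{2}|\d{3})$/i.test(part) && !result.region) {
      // Two letters or three digits: region, normalized to upper case.
      result.region = part.toUpperCase();
    }
  }
  return result;
}

parseByPosition('fr-FR');      // { language: 'fr', script: null, region: 'FR' }
parseByPosition('fr-fr');      // same result as above
parseByPosition('zh-hant-tw'); // { language: 'zh', script: 'Hant', region: 'TW' }
```

With this approach "fr-fr", "FR-FR" and "fr-FR" all normalize to the same thing, which is the equivalence described above.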

We achieve this by importing model-un's data but reimplementing its functions with different behavior; see https://github.com/mapbox/carmen/blob/master/lib/util/closest-lang.js#L36-L75

Not sure what that means with respect to your proposal, but just an FYI on what we do now.

1ec5 commented 6 years ago

> In particular, our matches are case-insensitive, and we instead use position to distinguish between subtag types.

The Geocoding and Directions APIs have the same need for a case-insensitive language parameter, so we arrived at the same behavior in this library.

As far as I can tell, folding locale-utils into model-un could be purely additive: a new getScript() function and an optional second parameter on getLanguage() that takes an array of candidate locale codes to select from (still using languages.json for details about each language).