unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org

Duplication in `per`, `times`, and SI Prefixes Data in CLDR: Units Formatting Across Locales #5171

Open younies opened 1 month ago

younies commented 1 month ago

The `per`, `times`, and SI prefix data in CLDR units formatting is heavily duplicated across locales. For example, the English and French data are almost identical, especially in the short and narrow formats.

How to find it:

  1. Review the per, times, and SI prefixes data in units formatting for multiple locales.
  2. Compare the data for English and French (short and narrow format).
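The comparison in the steps above could be sketched roughly as follows. This is only an illustration: `shared_keys` is a hypothetical helper, the maps stand in for whatever structure holds the `per`, `times`, and prefix patterns of one locale, and any pattern strings passed in are the caller's own data rather than anything read from CLDR here.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: returns the keys (for example "per", "times", "kilo")
/// whose pattern string is identical in both locales.
/// Keys come back in sorted order because BTreeMap iterates in key order.
fn shared_keys<'a>(
    a: &BTreeMap<&'a str, &'a str>,
    b: &BTreeMap<&'a str, &'a str>,
) -> Vec<&'a str> {
    let mut shared = Vec::new();
    for (key, pattern) in a {
        // A key counts as duplicated only if the other locale has the same
        // key with a byte-for-byte identical pattern.
        if b.get(key) == Some(pattern) {
            shared.push(*key);
        }
    }
    shared
}
```

Running this over the short and narrow data of two locales gives a quick list of entries that are candidates for sharing.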

Proposed Solution

Consider creating a common repository or structure for storing duplicated data to enhance data management and reduce redundancy.

Impact

sffc commented 3 weeks ago

> Consider creating a common repository or structure for storing duplicated data to enhance data management and reduce redundancy.

We already handle this by keeping data structs small and leaving the rest to datagen-time deduplication.
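As a rough illustration of the general idea (not ICU4X's actual datagen code), deduplication at generation time can be pictured as storing each distinct payload once and pointing every locale at it. `DedupedStore` and `deduplicate` are hypothetical names, and payloads are modelled as plain strings for simplicity.

```rust
use std::collections::HashMap;

/// Hypothetical deduplicated storage: each distinct payload is kept once,
/// and every locale maps to an index into the payload pool.
struct DedupedStore {
    payloads: Vec<String>,
    locale_to_payload: HashMap<String, usize>,
}

/// Builds a `DedupedStore` from (locale, payload) pairs.
fn deduplicate(entries: &[(&str, &str)]) -> DedupedStore {
    let mut payloads: Vec<String> = Vec::new();
    let mut seen: HashMap<&str, usize> = HashMap::new();
    let mut locale_to_payload = HashMap::new();

    for &(locale, payload) in entries {
        // Reuse the existing index if this exact payload was already stored;
        // otherwise append it to the pool and remember its position.
        let index = *seen.entry(payload).or_insert_with(|| {
            payloads.push(payload.to_string());
            payloads.len() - 1
        });
        locale_to_payload.insert(locale.to_string(), index);
    }

    DedupedStore { payloads, locale_to_payload }
}
```

In this model, two locales share storage only when their payloads match exactly, so a smaller struct that holds just one pattern is more likely to deduplicate than a large struct in which any single differing field makes the whole payload unique.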

Each data struct should be independently evaluated to decide the best storage mechanism.

I do not think there is anything to discuss here on the general problem of data deduplication, but we can discuss the best layout for these two patterns specifically.

sffc commented 1 week ago

@younies What do you want to discuss on this? Please add back a discuss or discuss-priority label when it is clearer what you hope to get out of the conversation.