unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org

Split numbering systems data out from decimal symbols #5818

Closed Manishearth closed 6 days ago

Manishearth commented 1 week ago

We only have a couple of numbering systems. DecimalSymbolsV2 should name the numbering system, and we can load its digits from a DecimalDigitsV1 key.
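
To illustrate the proposed split, here is a minimal sketch in plain Rust. It is not the ICU4X data model: `SymbolsSketch`, `digits_table`, and `localize` are hypothetical names invented for this example, and a `HashMap` stands in for the data provider. The point is only that the per-locale symbols carry a numbering system name, and the ten digits are looked up separately under that name.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for per-locale symbols data (not the real ICU4X struct).
/// It stores only the numbering system name, not the digits themselves.
struct SymbolsSketch {
    numsys: &'static str,
    decimal_separator: &'static str,
}

/// Hypothetical stand-in for digits data keyed by numbering system name.
fn digits_table() -> HashMap<&'static str, [char; 10]> {
    let mut table = HashMap::new();
    table.insert("latn", ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']);
    table.insert("arab", ['٠', '١', '٢', '٣', '٤', '٥', '٦', '٧', '٨', '٩']);
    table
}

/// Rewrites an ASCII decimal string such as "12.5" using the digits that the
/// symbols' numbering system points to. Returns None if the numbering system
/// is missing from the table or the input has unexpected characters.
fn localize(
    symbols: &SymbolsSketch,
    digits: &HashMap<&'static str, [char; 10]>,
    ascii: &str,
) -> Option<String> {
    // Second lookup: resolve the digits from the name stored in the symbols.
    let table = digits.get(symbols.numsys)?;
    let mut out = String::new();
    for c in ascii.chars() {
        if let Some(d) = c.to_digit(10) {
            out.push(table[d as usize]);
        } else if c == '.' {
            out.push_str(symbols.decimal_separator);
        } else {
            return None;
        }
    }
    Some(out)
}
```

With this shape, many locales that share a numbering system can point at a single copy of its digits instead of each repeating them.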

Manishearth commented 1 week ago

@sffc Looking at the baked data, we only use the 21 numbering systems listed below. Should we set up datagen to only generate numbering systems found in DecimalSymbolsV1, or all possible systems? I suspect the latter is more robust and follows our data principles.

numsys: tinystr::tinystr!(8usize, "adlm")
numsys: tinystr::tinystr!(8usize, "arab")
numsys: tinystr::tinystr!(8usize, "arabext")
numsys: tinystr::tinystr!(8usize, "beng")
numsys: tinystr::tinystr!(8usize, "deva")
numsys: tinystr::tinystr!(8usize, "gujr")
numsys: tinystr::tinystr!(8usize, "guru")
numsys: tinystr::tinystr!(8usize, "hanidec")
numsys: tinystr::tinystr!(8usize, "java")
numsys: tinystr::tinystr!(8usize, "khmr")
numsys: tinystr::tinystr!(8usize, "knda")
numsys: tinystr::tinystr!(8usize, "laoo")
numsys: tinystr::tinystr!(8usize, "latn")
numsys: tinystr::tinystr!(8usize, "mlym")
numsys: tinystr::tinystr!(8usize, "mymr")
numsys: tinystr::tinystr!(8usize, "nkoo")
numsys: tinystr::tinystr!(8usize, "olck")
numsys: tinystr::tinystr!(8usize, "orya")
numsys: tinystr::tinystr!(8usize, "tamldec")
numsys: tinystr::tinystr!(8usize, "telu")
numsys: tinystr::tinystr!(8usize, "thai")

sffc commented 1 week ago

> Should we set up datagen to only generate numbering systems found in DecimalSymbolsV1, or all possible systems? I suspect the latter is more robust and follows our data principles.

Since these are data marker attributes, it is consistent with data principles to make this configurable in datagen. Datagen should be capable of both, and it will be once we add support for filtering by attributes (has that landed yet?).

I think the default can be "auto", that is, select all DecimalDigitsV1 entries that are reachable from DecimalSymbolsV2.
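
A rough sketch of what the two modes could mean, in plain Rust. This is not the datagen API: `DigitsSelection` and `select_numbering_systems` are hypothetical names, and the inputs are plain string slices rather than real data payloads. "Auto" keeps only the numbering systems that some symbols entry refers to, while "all" keeps every known system.

```rust
use std::collections::BTreeSet;

/// Hypothetical selection mode for digits data (not an ICU4X datagen type).
enum DigitsSelection {
    /// Generate digits for every known numbering system.
    All,
    /// Generate digits only for systems referenced by the symbols data.
    Auto,
}

/// Returns the numbering systems whose digits would be generated.
/// `all_systems` is every system the source data knows about;
/// `referenced` is the system named by each generated symbols entry
/// (duplicates are expected, since many locales share a system).
fn select_numbering_systems<'a>(
    mode: DigitsSelection,
    all_systems: &[&'a str],
    referenced: &[&'a str],
) -> BTreeSet<&'a str> {
    let available: BTreeSet<&'a str> = all_systems.iter().copied().collect();
    match mode {
        DigitsSelection::All => available,
        DigitsSelection::Auto => referenced
            .iter()
            .copied()
            // Skip names with no digits data, and let the set deduplicate.
            .filter(|name| available.contains(name))
            .collect(),
    }
}
```

For example, with known systems `["arab", "latn", "thai", "tibt"]` and symbols that reference only `latn` and `arab`, the "auto" mode would keep those two and the "all" mode would keep all four.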