Split numbering systems data out from decimal symbols

Manishearth commented 1 week ago

We only have a handful of numbering systems. DecimalSymbolsV2 should name the system, and we can then load its digits from a DecimalDigitsV1 key.
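To make the proposed split concrete, here is a minimal sketch. It assumes the symbols data carries only a numbering system name and the digits live in a separate table keyed by that name. `SymbolsSketch`, `digits_table`, and `render` are hypothetical names for illustration, not actual ICU4X types or APIs.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a `DecimalSymbolsV2` payload: it names the
/// numbering system instead of embedding the ten digits.
/// (Separators, signs, and grouping sizes are omitted for brevity.)
struct SymbolsSketch {
    numsys: &'static str,
}

/// Hypothetical stand-in for the `DecimalDigitsV1` data, keyed by numbering system.
type DigitsTable = HashMap<&'static str, [char; 10]>;

/// Builds the ten digits of a system whose digits are contiguous code points
/// starting at `zero` (an assumption that holds for the two systems below).
fn digits_from_zero(zero: char) -> [char; 10] {
    std::array::from_fn(|i| char::from_u32(zero as u32 + i as u32).unwrap_or(zero))
}

fn digits_table() -> DigitsTable {
    let mut table = HashMap::new();
    table.insert("latn", digits_from_zero('0'));
    table.insert("arab", digits_from_zero('\u{0660}'));
    table
}

/// Looks up the digits by the name stored in the symbols data, then renders `n`.
/// Returns `None` if the named numbering system has no digits entry.
fn render(n: u32, symbols: &SymbolsSketch, table: &DigitsTable) -> Option<String> {
    let digits = table.get(symbols.numsys)?;
    Some(
        n.to_string()
            .bytes()
            .map(|b| digits[(b - b'0') as usize])
            .collect(),
    )
}
```

With this shape, many locales can share one digits entry, and the symbols data shrinks to a short name per locale.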

Manishearth commented 1 week ago

@sffc looking at baked data, we only use 21 numbering systems. Should we set up datagen to only generate numbering systems found in DecimalSymbolsV1, or all possible systems? I suspect the latter is more robust and follows our data principles.

numsys: tinystr::tinystr!(8usize, "adlm")
numsys: tinystr::tinystr!(8usize, "arab")
numsys: tinystr::tinystr!(8usize, "arabext")
numsys: tinystr::tinystr!(8usize, "beng")
numsys: tinystr::tinystr!(8usize, "deva")
numsys: tinystr::tinystr!(8usize, "gujr")
numsys: tinystr::tinystr!(8usize, "guru")
numsys: tinystr::tinystr!(8usize, "hanidec")
numsys: tinystr::tinystr!(8usize, "java")
numsys: tinystr::tinystr!(8usize, "khmr")
numsys: tinystr::tinystr!(8usize, "knda")
numsys: tinystr::tinystr!(8usize, "laoo")
numsys: tinystr::tinystr!(8usize, "latn")
numsys: tinystr::tinystr!(8usize, "mlym")
numsys: tinystr::tinystr!(8usize, "mymr")
numsys: tinystr::tinystr!(8usize, "nkoo")
numsys: tinystr::tinystr!(8usize, "olck")
numsys: tinystr::tinystr!(8usize, "orya")
numsys: tinystr::tinystr!(8usize, "tamldec")
numsys: tinystr::tinystr!(8usize, "telu")
numsys: tinystr::tinystr!(8usize, "thai")

sffc commented 1 week ago

> Should we set up datagen to only generate numbering systems found in DecimalSymbolsV1, or all possible systems? I suspect the latter is more robust and follows our data principles.

Since these are data marker attributes, it is consistent with data principles to make this configurable in datagen. Datagen should be capable of both once we add support for filtering by attributes (has that landed yet?).

I think the default can be "auto", i.e. select all DecimalDigitsV1 entries that are reachable from DecimalSymbolsV2.
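A rough sketch of what the two modes could look like, assuming datagen can enumerate both the known numbering systems and the names referenced by the symbols data. `NumsysSelection` and `select_numbering_systems` are hypothetical names, not existing datagen options.

```rust
use std::collections::BTreeSet;

/// Hypothetical datagen setting; not an existing ICU4X option.
enum NumsysSelection {
    /// Keep only the digit sets reachable from the generated symbols data.
    Auto,
    /// Keep every known numbering system.
    All,
}

/// Returns the numbering systems whose digits should be generated.
/// `all_known` lists every system the source data defines;
/// `referenced_by_symbols` lists the names found in the symbols data
/// (duplicates are fine, since many locales share a system).
fn select_numbering_systems<'a>(
    mode: NumsysSelection,
    all_known: &[&'a str],
    referenced_by_symbols: &[&'a str],
) -> BTreeSet<&'a str> {
    match mode {
        NumsysSelection::All => all_known.iter().copied().collect(),
        NumsysSelection::Auto => {
            let referenced: BTreeSet<&str> = referenced_by_symbols.iter().copied().collect();
            all_known
                .iter()
                .copied()
                .filter(|name| referenced.contains(name))
                .collect()
        }
    }
}
```

In "auto" mode, a system that no selected locale references is simply dropped from the output, while "all" keeps the data robust against symbols added later.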

unicode-org / icu4x

Split numbering systems data out from decimal symbols #5818