Manishearth closed this 6 days ago
The digits data becomes much larger in "all" mode. I've added code for that but haven't hooked it in yet.
decimal/digits@1, <lookup>, 550B, 77 identifiers
decimal/digits@1, <total>, 3080B, 3420B, 77 unique payloads
@sffc In the long run CompactDecimal / etc should also be using this data. In that case, should we just always generate all known decimal systems? How would the unification work across keys?
I don't really understand the question? CompactDecimalFormatter depends on FixedDecimalFormatter and so it should already be using the new data markers.
Ah, we lose out on some of our wins when we handle the fact that the symbols data can differ for a given locale between numbering systems.
decimal/symbols@2, <lookup>, 1316B, 252 identifiers
decimal/symbols@2, <total>, 2740B, 1368B, 49 unique payloads
Fixes https://github.com/unicode-org/icu4x/issues/5818
Before:
After:
Saving ~1.5kB, a good third of the data size. A lot of the wins are just in deduplication.
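To illustrate why the identifier counts (e.g. 252 identifiers for decimal/symbols@2) can sit well above the unique payload counts (49), here is a minimal sketch of payload deduplication, assuming each identifier is a locale plus numbering system and maps to an index into a table that stores each distinct payload once. The `Symbols` struct, its fields, the `dedup` helper and the sample entries are all hypothetical and illustrative (not real CLDR data); this is not how ICU4X datagen actually implements it.

```rust
use std::collections::BTreeMap;

/// Hypothetical symbols payload; field names are illustrative only.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Symbols {
    decimal_separator: String,
    grouping_separator: String,
}

/// Hypothetical helper: every identifier (e.g. "locale/numbering-system")
/// gets an index into `unique`, which holds each distinct payload once.
fn dedup(entries: &[(&str, Symbols)]) -> (BTreeMap<String, usize>, Vec<Symbols>) {
    let mut unique: Vec<Symbols> = Vec::new();
    let mut index_of: BTreeMap<Symbols, usize> = BTreeMap::new();
    let mut lookup: BTreeMap<String, usize> = BTreeMap::new();
    for (id, payload) in entries {
        // Reuse the index of an identical payload, or append a new one.
        let idx = *index_of.entry(payload.clone()).or_insert_with(|| {
            unique.push(payload.clone());
            unique.len() - 1
        });
        lookup.insert(id.to_string(), idx);
    }
    (lookup, unique)
}

fn main() {
    // Illustrative values only.
    let latn = Symbols { decimal_separator: ".".into(), grouping_separator: ",".into() };
    let arab = Symbols { decimal_separator: "\u{066B}".into(), grouping_separator: "\u{066C}".into() };
    let entries = [
        ("en/latn", latn.clone()),
        ("en-GB/latn", latn.clone()),
        ("ar/arab", arab.clone()),
        ("ar/latn", latn.clone()),
    ];
    let (lookup, unique) = dedup(&entries);
    // prints "4 identifiers, 2 unique payloads"
    println!("{} identifiers, {} unique payloads", lookup.len(), unique.len());
}
```

Keying by locale plus numbering system is what multiplies the identifiers; the sharing of identical payloads is what keeps the stored data small.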
I'm also going to try moving the tinystr into the VarZeroVec and seeing what happens.
I may also try to store the digits more compactly as an enum { Sequential(char), Many(ZeroVec<char>) }. A downside of this is that the Sequential case would need UTF-8 validation every time, though we could make it so that that's just the wire format and we expand to a digit array on data load.
Todo: add configurability for this.
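A minimal sketch of the "wire format, expanded on load" idea, assuming `Sequential(zero)` means the ten digits are consecutive code points starting at `zero` (as with U+0030..U+0039 or U+0660..U+0669). The `CompactDigits` name and `expand` method are hypothetical, and `Vec<char>` stands in for `ZeroVec<char>` so the block needs no external crates. Validation (here via `char::from_u32`) runs once when the array is built rather than on every access.

```rust
/// Hypothetical compact wire form for a numbering system's ten digits.
/// `Vec<char>` stands in for `ZeroVec<char>` to keep this self-contained.
#[derive(Debug, Clone, PartialEq)]
enum CompactDigits {
    /// Ten consecutive code points starting at this zero digit.
    Sequential(char),
    /// Arbitrary digits, stored explicitly (must be exactly ten).
    Many(Vec<char>),
}

impl CompactDigits {
    /// Expand to a full digit array once, at data load time.
    /// Returns None if the data does not describe ten valid chars.
    fn expand(&self) -> Option<[char; 10]> {
        let mut out = ['\0'; 10];
        match self {
            CompactDigits::Sequential(zero) => {
                for (i, slot) in out.iter_mut().enumerate() {
                    // char::from_u32 rejects surrogates and values past U+10FFFF.
                    *slot = char::from_u32(*zero as u32 + i as u32)?;
                }
            }
            CompactDigits::Many(digits) => {
                if digits.len() != 10 {
                    return None;
                }
                out.copy_from_slice(digits);
            }
        }
        Some(out)
    }
}

fn main() {
    // Arabic-Indic digits are consecutive, so one char suffices on the wire.
    let arab = CompactDigits::Sequential('\u{0660}').expand().unwrap();
    println!("{}", arab.iter().collect::<String>());
    // Non-consecutive digits fall back to the explicit list.
    let hanidec = CompactDigits::Many("〇一二三四五六七八九".chars().collect());
    println!("{:?}", hanidec.expand());
}
```

The trade-off is that the Sequential case saves nine chars per numbering system in the stored data, at the cost of a small one-time expansion step.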