hsivonen opened 6 months ago
Some thoughts:
enum { DefaultValue, ErrorValue, Index(usize) }
and completely remove the type parameter from CodePointTrie, similar to how ZeroTrie works.

I'll put this in the 2.0 milestone, but it isn't super-high priority and it could slip to 3.0.
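One way to read the enum idea above, sketched under stated assumptions: the trie returns only a lookup result (called `TrieLookup` here, a hypothetical name for the enum), and the caller maps it to a concrete value from its own table, much as a `ZeroTrie` lookup yields a `usize` that the caller interprets. Treating `DefaultValue` as the trie's initial value and `ErrorValue` as its out-of-range value is an assumption, and the `resolve` helper is hypothetical, not ICU4X API.

```rust
/// Hypothetical result of a lookup in a CodePointTrie that has no value type
/// parameter: the trie only says where the value lives.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TrieLookup {
    /// Assumed: the code point maps to the trie's default (initial) value.
    DefaultValue,
    /// Assumed: the input was out of range and maps to the trie's error value.
    ErrorValue,
    /// The value is stored at this index in a caller-owned table.
    Index(usize),
}

/// Hypothetical helper: resolves a lookup against a value table owned by the
/// caller, so the trie itself never needs to know the value type `T`.
fn resolve<'a, T>(lookup: TrieLookup, table: &'a [T], default: &'a T, error: &'a T) -> &'a T {
    match lookup {
        TrieLookup::DefaultValue => default,
        TrieLookup::ErrorValue => error,
        // Treat an index past the end of the table like the error value.
        TrieLookup::Index(i) => table.get(i).unwrap_or(error),
    }
}
```

With this split, the serialized trie stores only indices, and each property or component supplies its own typed table.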
The trie builder always operates on 32-bit values and can then narrow the values in the main backing array to 8 or 16 bits at serialization time.
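A minimal sketch of that narrowing step, assuming the builder hands over its 32-bit values as a `&[u32]` and that the serialized array is little-endian. `ValueWidth`, `narrowest_width`, and `serialize` are hypothetical names, not existing ICU4X APIs.

```rust
/// Hypothetical width of each value in the serialized backing byte array.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ValueWidth {
    Bits8,
    Bits16,
    Bits32,
}

/// Picks the narrowest width that can hold every value the builder produced.
fn narrowest_width(values: &[u32]) -> ValueWidth {
    let max = values.iter().copied().max().unwrap_or(0);
    if max <= u8::MAX as u32 {
        ValueWidth::Bits8
    } else if max <= u16::MAX as u32 {
        ValueWidth::Bits16
    } else {
        ValueWidth::Bits32
    }
}

/// Serializes the 32-bit builder values into a little-endian byte array
/// using the given width (values are truncated, so pick the width first).
fn serialize(values: &[u32], width: ValueWidth) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 4);
    for &v in values {
        match width {
            ValueWidth::Bits8 => out.push(v as u8),
            ValueWidth::Bits16 => out.extend_from_slice(&(v as u16).to_le_bytes()),
            ValueWidth::Bits32 => out.extend_from_slice(&v.to_le_bytes()),
        }
    }
    out
}
```

For example, values that all fit in `u8` serialize to one byte each, so the same builder output can produce arrays of three different sizes.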
We already use a byte array as unaligned backing storage. We should consider slightly extending how index-based reads map onto the backing byte array in order to support more compact value widths:
If the byte array had one extra byte at the end, we could use 32-bit unaligned loads to read 24-bit values (masking off the highest 8 bits) without going out of bounds. See also #4669.
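The padding trick can be sketched as follows, assuming little-endian storage with 3 bytes per value; `pack_u24` and `read_u24` are hypothetical helpers. Building a `[u8; 4]` from four consecutive bytes and calling `u32::from_le_bytes` typically compiles to a single unaligned 32-bit load.

```rust
/// Packs 24-bit values into 3 bytes each, plus one trailing padding byte so
/// that a 4-byte load at the last index stays in bounds.
fn pack_u24(values: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 3 + 1);
    for &v in values {
        debug_assert!(v <= 0x00FF_FFFF, "value does not fit in 24 bits");
        // Keep only the low three little-endian bytes.
        out.extend_from_slice(&v.to_le_bytes()[..3]);
    }
    out.push(0); // padding byte
    out
}

/// Reads the 24-bit value at `index` with one 32-bit load, masking off the
/// highest 8 bits (which belong to the next value or to the padding byte).
fn read_u24(bytes: &[u8], index: usize) -> u32 {
    let start = index * 3;
    let word = [bytes[start], bytes[start + 1], bytes[start + 2], bytes[start + 3]];
    u32::from_le_bytes(word) & 0x00FF_FFFF
}
```

Without the padding byte, reading the last value would need a special case or three separate byte loads.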
For 1-, 2-, and 4-bit values, we could shift and mask the index to read sub-byte parts from an array that is 1/8, 1/4, or 1/2 the byte length it would have with 8 bits as the narrowest value width.
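A sketch of the shift-and-mask read, assuming values are packed lowest-order bits first within each byte (the bit order is an assumption); `read_sub_byte` and `pack_sub_byte` are hypothetical. Because the number of values per byte is a power of two, the byte index is a right shift of the value index and the slot within the byte is a mask of it.

```rust
/// Reads a `bits`-wide value (1, 2, or 4) at `index` from a byte array in
/// which 8 / bits values are packed per byte, lowest-order bits first.
fn read_sub_byte(bytes: &[u8], index: usize, bits: u32) -> u8 {
    debug_assert!(matches!(bits, 1 | 2 | 4));
    // log2 of values per byte: 3 for 1-bit, 2 for 2-bit, 1 for 4-bit.
    let per_byte_log2 = 3 - bits.trailing_zeros();
    // Shift the index to find the byte, mask it to find the slot in that byte.
    let byte = bytes[index >> per_byte_log2];
    let slot = (index & ((1 << per_byte_log2) - 1)) as u32;
    let mask = (1u8 << bits) - 1;
    (byte >> (slot * bits)) & mask
}

/// Builder-side counterpart: packs `bits`-wide values using the same layout.
fn pack_sub_byte(values: &[u8], bits: u32) -> Vec<u8> {
    let per_byte = (8 / bits) as usize;
    let mut out = vec![0u8; (values.len() + per_byte - 1) / per_byte];
    let mask = (1u8 << bits) - 1;
    for (i, &v) in values.iter().enumerate() {
        out[i / per_byte] |= (v & mask) << ((i % per_byte) as u32 * bits);
    }
    out
}
```

Nine 1-bit values, for instance, occupy 2 bytes instead of the 9 they would take at 8 bits per value.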
1 bit is useful for accessing a binary property faster than from a fragmented inversion list. 2 bits are useful for bundling two co-occurring binary properties. 4 bits are useful for enumerated properties with few distinct values, e.g. Joining_Type. 24 bits are useful for scalar values.
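To illustrate the 4-bit case: Joining_Type has six values (U, C, T, D, L, R), so each fits in a nibble. This sketch assumes two nibbles per byte, low nibble first, and a hypothetical numeric code for each value; neither the layout nor the numbering is taken from ICU4X, and `joining_type_at` is a hypothetical helper.

```rust
/// Joining_Type values. The numeric codes used below are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoiningType {
    NonJoining,
    JoinCausing,
    DualJoining,
    LeftJoining,
    RightJoining,
    Transparent,
}

/// Reads the 4-bit code at `index` (two codes per byte, low nibble first)
/// and maps it to a JoiningType. Returns None for codes 6..=15, which this
/// hypothetical numbering leaves unused.
fn joining_type_at(nibbles: &[u8], index: usize) -> Option<JoiningType> {
    let byte = nibbles[index >> 1];
    let code = (byte >> ((index & 1) * 4)) & 0x0F;
    match code {
        0 => Some(JoiningType::NonJoining),
        1 => Some(JoiningType::JoinCausing),
        2 => Some(JoiningType::DualJoining),
        3 => Some(JoiningType::LeftJoining),
        4 => Some(JoiningType::RightJoining),
        5 => Some(JoiningType::Transparent),
        _ => None,
    }
}
```

Half a byte per code point is the cost here, versus a full byte with 8 bits as the narrowest width.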