hsivonen opened 2 years ago
Does the trie value look something like this?

```rust
enum TrieValueForNorm {
    Zero,
    One(u16),
    SingleBmp(BmpChar),
    DoubleBmp(BmpChar, BmpChar),
}
```
Discussion: the type can be

```rust
struct TrieValueForNorm(u32);

impl TrieValueForNorm {
    pub fn get_foo(&self) -> BmpChar { /* can contain unsafe */ }
    pub fn get_bar(&self) -> BmpChar { /* can contain unsafe */ }
}
```

and it has a custom ULE impl mapping to `TrieValueForNormULE`.
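As a minimal sketch of the kind of wrapper involved: the type name `BmpChar` is from this issue, but the `new` constructor and `to_char` method below are illustrative assumptions, not the actual crate API (the real type would also carry a zerovec ULE impl):

```rust
/// Hypothetical wrapper over a BMP scalar value: same bit
/// representation as u16, guaranteed not to be a surrogate.
#[derive(Copy, Clone, PartialEq, Eq)]
#[repr(transparent)]
struct BmpChar(u16);

impl BmpChar {
    /// The check that would run at deserialization time:
    /// reject values in the surrogate range 0xD800..=0xDFFF.
    fn new(v: u16) -> Option<BmpChar> {
        if (0xD800..=0xDFFF).contains(&v) {
            None
        } else {
            Some(BmpChar(v))
        }
    }

    /// Safe accessor: the invariant makes the conversion infallible,
    /// so callers never need an `unsafe` block.
    fn to_char(self) -> char {
        // Never a surrogate by construction, so this cannot fail.
        char::from_u32(u32::from(self.0)).unwrap()
    }
}
```

Because the surrogate check happens once at construction (i.e. deserialization) time, every later `to_char` call is branch-free from the caller's perspective and stays in safe code.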
It's fine to convert the remaining two ZeroVecs to have deserialization-time validation, so long as it is linear-time validation.
Note: This is fine to implement because it does not change the postcard binary format; it only changes the runtime types, which we have established can be changed in minor releases.
(It was previously concluded in a meeting that it's not a post-1.0 breaking change to switch from bogus data causing GIGO to bogus data causing a constructor to err out.)
In the normalization data:

- `ZeroVec<'data, u16>` should use a `BmpChar` type (that has the same bit representation as `u16`) instead of using `u16`. The `BmpChar` type should be validated at deserialization time not to be in the surrogate range. There should be a way to get a `char` out of a `BmpChar` without the caller having to use an `unsafe` block.
- `ZeroVec<'data, U24>` should use `char` instead of `U24`.
- The `u32` trie value should be replaced with some kind of special type that has the same bit representation as `u32` and that has these constraints:
  - If the low half is 1 or 0: no further constraints.
  - If the high half is 0: the low half is either a BMP non-surrogate value or a value between 0xD801 and 0xD8FE, inclusive.
  - Otherwise: both halves are BMP non-surrogate values.