Closed rendello closed 1 month ago
The lengths derived from the current table are as follows:
Syllabic syllabic units: [9, 8, 7, 6, 5, 4, 3]
Latin syllabic units: [5, 4, 3, 2, 1]
I think this warrants using a specific array, or at least range, for each table. I did a similar thing for my previous iteration of the project, the direct conversion tool.
pub const KEY_LENGTHS_TO_SYL: [usize; 5] = [5, 4, 3, 2, 1];
pub const KEY_LENGTHS_TO_LAT: [usize; 4] = [6, 5, 4, 3];
And converter/src/lib.rs:
pub struct Dialect {
maps: &'static [&'static PMap],
key_lengths: &'static [usize],
}
In making the key length arrays static, this could prevent me from setting MAX_BYTE_LENGTH
, which is used in pop_syllabic_unit()
a few times as can be seen in inuklib/src/syllabic_parser.rs, an example being:
let mut lowercase_indices = ArrayVec::<usize, {MAX_BYTE_LENGTH+1}>::new();
for lowercase_index in CharEndIndices::new(lowercase, MAX_BYTE_LENGTH) {
lowercase_indices.push(lowercase_index);
}
It must be constant to set the ArrayVec
length. I've had issues with that length before, which I triaged and fixed in 5c4f3a5.
I believe I see now, though, that there's no need to set this value here. The looping will be done directly from the values stored alongside the map, and the length of the ArrayVec
can be any arbitrary length, as long as it's a fair bit above the maximum key size. Setting it to 100 bytes would cause no harm, for example.
Another reason this is important is, as it is, there appears to be a subtle bug not caught by the property tests. If The lowercased bytes are matched at the largest key, then the original representation is larger, it would overflow the new text_indices
ArrayVec
, which had been clamped at MAX_BYTE_LENGTH+1
for no good reason. Both ArrayVec
s should be set to a higher number, the stack space is not so valuable.
Closed by 2cdbd24.
5 is global, it should be per-table to reduce redundant searching.