rendello commented 2 months ago

5 is global, it should be per-table to reduce redundant searching.

rendello commented 2 months ago

The lengths derived from the current table are as follows:

Syllabic syllabic units: [9, 8, 7, 6, 5, 4, 3]
Latin syllabic units:    [5, 4, 3, 2, 1]

I think this warrants using a specific array, or at least range, for each table. I did a similar thing for my previous iteration of the project, the direct conversion tool.

See converter/src/maps.rs:

pub const KEY_LENGTHS_TO_SYL: [usize; 5] = [5, 4, 3, 2, 1];
pub const KEY_LENGTHS_TO_LAT: [usize; 4] = [6, 5, 4, 3];

And converter/src/lib.rs:

pub struct Dialect {
    maps: &'static [&'static PMap],
    key_lengths: &'static [usize],
}

rendello commented 2 months ago

In making the key length arrays static, this could prevent me from setting MAX_BYTE_LENGTH, which is used in pop_syllabic_unit() a few times as can be seen in inuklib/src/syllabic_parser.rs, an example being:

    let mut lowercase_indices = ArrayVec::<usize, {MAX_BYTE_LENGTH+1}>::new();
    for lowercase_index in CharEndIndices::new(lowercase, MAX_BYTE_LENGTH) {
        lowercase_indices.push(lowercase_index);
    }

It must be constant to set the ArrayVec length. I've had issues with that length before, which I triaged and fixed in 5c4f3a5.

I believe I see now, though, that there's no need to set this value here. The looping will be done directly from the values stored alongside the map, and the length of the ArrayVec can be any arbitrary length, as long as it's a fair bit above the maximum key size. Setting it to 100 bytes would cause no harm, for example.

Another reason this is important is, as it is, there appears to be a subtle bug not caught by the property tests. If The lowercased bytes are matched at the largest key, then the original representation is larger, it would overflow the new text_indices ArrayVec, which had been clamped at MAX_BYTE_LENGTH+1 for no good reason. Both ArrayVecs should be set to a higher number, the stack space is not so valuable.

rendello commented 1 month ago

Closed by 2cdbd24.

rendello / layout

Set `pop_syllabic_unit`'s `MAX` based on lengths in `data.rs` per table #7

5 is global, it should be per-table to reduce redundant searching.