pluots / zspell

A spellchecking library and executable written in Rust

Morph info isn't specific enough #73

Closed: tgross35 closed this issue 1 year ago

tgross35 commented 1 year ago

For example, with the following dictionary:

==== afx ====
FLAG num
SFX 10 Y 3
SFX 10 0 0 . is:tens
SFX 10 0 00 . is:hundreds
SFX 10 0 000 . is:thousands

==== dic ====
10
0 po:number
1/10 po:number
2/10 po:number
3/10 po:number
4/10 po:number
5/10 po:number
6/10 po:number
7/10 po:number
8/10 po:number
9/10 po:number

That produces the following:

[zspell/src/dict.rs:623] meta = Meta {
    stem: "9",
    source: Affix(
        AfxRule {
            kind: Suffix,
            can_combine: true,
            patterns: [
                AfxRulePattern {
                    affix: "0",
                    condition: None,
                    strip: None,
                    morph_info: [
                        InflecSfx(
                            "tens",
                        ),
                    ],
                },
                AfxRulePattern {
                    affix: "00",
                    condition: None,
                    strip: None,
                    morph_info: [
                        InflecSfx(
                            "hundreds",
                        ),
                    ],
                },
                AfxRulePattern {
                    affix: "000",
                    condition: None,
                    strip: None,
                    morph_info: [
                        InflecSfx(
                            "thousands",
                        ),
                    ],
                },
            ],
        },
    ),
}

As a result, .analyze() returns all three inflection suffixes when it should return only the one belonging to the pattern that was actually applied.

Maybe we need some way to store the specific rule? E.g. Arc<(Affix, usize)> to store the overall rule as well as the index of the specific pattern?
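A minimal sketch of the rule-plus-index idea, to make the shape concrete. The structs here are simplified stand-ins, not zspell's real definitions, and AffixSource, pattern_idx, and number_rule are hypothetical names invented for illustration. The point is only that the shared rule stays behind one Arc while the index selects the pattern that matched.

```rust
use std::sync::Arc;

/// Simplified stand-in for one pattern of an affix rule.
#[derive(Debug)]
struct AfxRulePattern {
    affix: String,
    morph_info: Vec<String>,
}

/// Simplified stand-in for a whole affix rule (one SFX/PFX group).
#[derive(Debug)]
struct AfxRule {
    patterns: Vec<AfxRulePattern>,
}

/// Hypothetical source record: the shared rule plus the index of the
/// pattern that actually produced the word.
#[derive(Debug, Clone)]
struct AffixSource {
    rule: Arc<AfxRule>,
    pattern_idx: usize,
}

impl AffixSource {
    /// The single pattern that was applied.
    fn pattern(&self) -> &AfxRulePattern {
        &self.rule.patterns[self.pattern_idx]
    }

    /// Morph info for only the applied pattern, not the whole rule.
    fn morph_info(&self) -> &[String] {
        &self.pattern().morph_info
    }
}

/// Build a rule mirroring the `SFX 10` group from the example dictionary.
fn number_rule() -> Arc<AfxRule> {
    let pat = |affix: &str, info: &str| AfxRulePattern {
        affix: affix.to_owned(),
        morph_info: vec![info.to_owned()],
    };
    Arc::new(AfxRule {
        patterns: vec![pat("0", "tens"), pat("00", "hundreds"), pat("000", "thousands")],
    })
}
```

With this layout, a source built with pattern_idx = 1 would report only "hundreds". The rule is still shared, so each entry costs one Arc clone plus a usize, and there are no back-references that could form a cycle.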

Or maybe we add a Weak<AfxRule> to AfxRulePattern, and store the patterns rather than the rules under Affix.

Either way, we need to be careful about cyclic references.
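A rough sketch of the Weak back-pointer variant, assuming each pattern lives in its own Arc so that it can be stored on its own. The struct layouts, the parent field, and build_rule are hypothetical and simplified, not zspell's actual types. Arc::new_cyclic from the standard library hands out a Weak to the rule while it is being constructed, so the patterns can point back at their parent without creating a strong cycle.

```rust
use std::sync::{Arc, Weak};

/// Simplified stand-in for a whole affix rule; it owns its patterns.
#[derive(Debug)]
struct AfxRule {
    flag: String,
    patterns: Vec<Arc<AfxRulePattern>>,
}

/// Simplified stand-in for one pattern, with a non-owning back-pointer.
#[derive(Debug)]
struct AfxRulePattern {
    affix: String,
    morph_info: Vec<String>,
    /// Weak, so rule -> pattern -> rule never forms a strong cycle.
    parent: Weak<AfxRule>,
}

/// Build a rule whose patterns each hold a Weak reference to it.
/// `patterns` is a list of (affix, morph info) pairs.
fn build_rule(flag: &str, patterns: &[(&str, &str)]) -> Arc<AfxRule> {
    Arc::new_cyclic(|parent: &Weak<AfxRule>| AfxRule {
        flag: flag.to_owned(),
        patterns: patterns
            .iter()
            .map(|(affix, info)| {
                Arc::new(AfxRulePattern {
                    affix: (*affix).to_owned(),
                    morph_info: vec![(*info).to_owned()],
                    parent: Weak::clone(parent),
                })
            })
            .collect(),
    })
}
```

One consequence of this design: a stored pattern does not keep its rule alive, so parent.upgrade() returns None once the last strong reference to the rule is dropped. Something else (presumably the dictionary) would have to own the rules for as long as the patterns are in use.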