seanpue / graphtransliterator

A graph-based transliteration tool
MIT License
8 stars 3 forks

Count substitutions #124

Open argideritzalpea opened 4 years ago

argideritzalpea commented 4 years ago

This is a cool tool, thanks for open sourcing this.

Is there any way to obtain a count of the number of substitutions made for each of the rules?

seanpue commented 4 years ago

Hi @argideritzalpea, GraphTransliterator tokenizes the input string and then matches rules against the resulting list of tokens. A rule can also check what comes before and after the match, either by specific tokens or by token classes. At each index it picks the most specific matching rule, then advances the pointer by the number of tokens in that rule.
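To make the matching loop concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the `Rule` tuple, the `specificity` measure (tokens plus context constraints), and the one-character-per-token input are all assumptions made for illustration, and token classes are left out. It does show where a per-rule counter could be updated each time a rule fires.

```python
from collections import Counter
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical stand-in for a rule; not the library's TransliterationRule."""
    tokens: tuple            # tokens the rule consumes (assumed non-empty)
    production: str          # output emitted when the rule matches
    prev_tokens: tuple = ()  # tokens required immediately before the match
    next_tokens: tuple = ()  # tokens required immediately after the match


def specificity(rule):
    # Assumption: more consumed tokens and more context means more specific.
    return len(rule.tokens) + len(rule.prev_tokens) + len(rule.next_tokens)


def matches(rule, tokens, i):
    """Return True if `rule` applies to `tokens` starting at index `i`."""
    end = i + len(rule.tokens)
    p = len(rule.prev_tokens)
    if tuple(tokens[i:end]) != rule.tokens:
        return False
    if p > i or tuple(tokens[i - p:i]) != rule.prev_tokens:
        return False
    return tuple(tokens[end:end + len(rule.next_tokens)]) == rule.next_tokens


def transliterate(tokens, rules):
    """Apply the most specific rule at each index and count how often each fires."""
    output, counts, i = [], Counter(), 0
    while i < len(tokens):
        candidates = [r for r in rules if matches(r, tokens, i)]
        if not candidates:
            raise ValueError(f"no rule matches at index {i}")
        best = max(candidates, key=specificity)
        output.append(best.production)
        counts[best] += 1          # one substitution made by this rule
        i += len(best.tokens)      # advance by the tokens the rule consumed
    return "".join(output), counts


rules = [
    Rule(("a",), "A"),
    Rule(("b",), "B"),
    Rule(("a", "b"), "X"),
    Rule(("b",), "P", prev_tokens=("b",)),  # only fires after another "b"
]
result, counts = transliterate(list("abb"), rules)
```

Here `counts` maps each rule to the number of times it was applied, which is the kind of tally the question asks about.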

The rules are available through the transliterator's rules attribute, as instances of the TransliterationRule class: https://graphtransliterator.readthedocs.io/en/latest/api.html#rule-classes So you can just check the length of each rule's tokens, if that's what you're after. If not, let me know!
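As a rough sketch of that tally: the helper below takes any sequence of rule-like objects with `tokens` and `production` attributes and reports how many times each one appears and how many tokens it consumed in total. The helper name `summarize_matches` and the stand-in objects are hypothetical. With the real library you would pass in the rules that were actually matched; I believe the transliterator records these after a call (as `last_matched_rules`), but treat that name as an assumption and confirm it against the API docs linked above.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_matches(matched_rules):
    """Count uses and total tokens consumed, keyed by (tokens, production).

    Rules that share tokens and production but differ only in their
    context constraints are merged under one key in this sketch.
    """
    uses = Counter()
    tokens_consumed = Counter()
    for rule in matched_rules:
        key = (tuple(rule.tokens), rule.production)
        uses[key] += 1
        tokens_consumed[key] += len(rule.tokens)
    return uses, tokens_consumed


# Stand-in objects for illustration only; real rules come from the library.
matched = [
    SimpleNamespace(tokens=["a"], production="A"),
    SimpleNamespace(tokens=["a", "b"], production="X"),
    SimpleNamespace(tokens=["a"], production="A"),
]
uses, tokens_consumed = summarize_matches(matched)
```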