Open martino-vic opened 2 years ago
I think in this case, you'd need to switch to using per-language orthography profiles. See https://github.com/lexibank/pylexibank/blob/8ae170cecb67f450b7a8cbaa56ded94281944b0f/src/pylexibank/dataset.py#L138-L148 for details.
Yes, @martino-vic, we have enough example cases for these profiles. As a workaround, you can also list the 13 cases in your code and segment them directly. Just add a dictionary keyed by language and value, and I will check later how the segmentation could be done here.
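A minimal sketch of what such a dictionary could look like, in plain Python. The language IDs, the segmentations, and the names `MANUAL_SEGMENTS` and `segment` are hypothetical placeholders, and how this would be hooked into the lexibank script is not shown here and would still need to be checked.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical overrides: (language ID, written form) -> list of segments.
# Both the language IDs and the segmentations are placeholders, not real data.
MANUAL_SEGMENTS: Dict[Tuple[str, str], List[str]] = {
    ("LanguageA", "alma"): ["a", "l", "m", "a"],
    ("LanguageB", "alma"): ["ɒ", "l", "m", "ɒ"],
}


def segment(
    language: str,
    value: str,
    fallback: Optional[Callable[[str], List[str]]] = None,
) -> Optional[List[str]]:
    """Return the manual segmentation if one is listed, else use the fallback.

    `fallback` stands in for whatever profile-based segmentation is used
    for all other forms; if none is given, None is returned.
    """
    key = (language, value)
    if key in MANUAL_SEGMENTS:
        # Return a copy so callers cannot modify the lookup table.
        return list(MANUAL_SEGMENTS[key])
    return fallback(value) if fallback is not None else None
```

With this, `segment("LanguageA", "alma")` and `segment("LanguageB", "alma")` give different segmentations for the same spelling, while any form that is not listed goes through the regular profile-based path.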
In my orthography.tsv there are now 13 words that are spelled the same way but should be transcribed differently, e.g.
which leads to
WARNING:segments.profile:line 21:duplicate grapheme in profile: alma
when I run the lexibank script.
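To see which entries trigger this warning, one could list the graphemes that occur more than once in the profile. This is a rough sketch, assuming the profile is tab-separated with a `Grapheme` column. The sample rows, the `IPA` column values, and the function name `duplicate_graphemes` are made up for illustration.

```python
import csv
import io
from collections import Counter

# Made-up sample profile; the real orthography.tsv will have different rows.
SAMPLE = "Grapheme\tIPA\nalma\ta l m a\nalma\tɒ l m ɒ\nkez\tk e z\n"


def duplicate_graphemes(handle):
    """Return graphemes that appear in more than one row, in first-seen order."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter(row["Grapheme"] for row in reader)
    return [grapheme for grapheme, n in counts.items() if n > 1]


duplicates = duplicate_graphemes(io.StringIO(SAMPLE))
print(duplicates)
```

For the real file, the same function can be given an open handle, for example `open("orthography.tsv", encoding="utf-8", newline="")`, with the path adjusted to wherever the profile lives in the dataset.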