tbm opened this issue 1 year ago
Some words have multiple word separation options, e.g. https://www.duden.de/rechtschreibung/Zivildienstleistender has two variants: "Zivildienstleistender" and "Zivildienst Leistender". Each has its own word separation.
This isn't handled well yet.
@tbm Thank you for reporting this!
I didn't know that this problem existed. It is indeed not great.
As for how to handle this, my idea is to leave the current field as it is, returning only the first option:
['Zi', 'vil', 'dienst', 'leis', 'ten', 'der']
and to add a new property, .word_separation_variants, which would return all options, each one split. This way, a kind of backward compatibility would be preserved, but the different variants could also be read.
I see that the .name property is handled fine:
In [3]: w.name
Out[3]: 'Zivildienstleistender'
but the other variant, "Zivildienst Leistender", cannot be read. Maybe this could be another property, e.g. .name_variants.
This sounds like it would not be hard to implement.
I'm not very good at designing APIs. However, your suggestion to leave the current field as is with the first option and then add another one with all options sounds good to me.
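A minimal sketch of the API discussed in this thread might look like the following. It assumes an entry has already been parsed into a list of spelling variants plus a matching list of word separations. The `Word` class, its constructor fields, and the name `word_separation` for the current field are hypothetical and used only for illustration. Only `.name`, `.word_separation_variants` and `.name_variants` come from the thread itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical stand-in for a dictionary entry; not the real library class."""

    # Every spelling variant listed for the entry; index 0 is the "main" one.
    _names: List[str]
    # One list of word-separation parts per variant, in the same order as _names.
    _separations: List[List[str]]

    @property
    def name(self) -> str:
        # Unchanged behaviour: only the first variant.
        return self._names[0]

    @property
    def name_variants(self) -> List[str]:
        # New: every spelling variant (returned as a copy).
        return list(self._names)

    @property
    def word_separation(self) -> List[str]:
        # Unchanged behaviour (field name assumed): only the first variant's separation.
        return list(self._separations[0])

    @property
    def word_separation_variants(self) -> List[List[str]]:
        # New: the separation of every variant (copies, so callers cannot mutate state).
        return [list(parts) for parts in self._separations]


if __name__ == "__main__":
    w = Word(
        ["Zivildienstleistender", "Zivildienst Leistender"],
        [
            ["Zi", "vil", "dienst", "leis", "ten", "der"],
            # Illustrative only: how the word boundary in the second variant is
            # marked on the real page is not shown in the thread.
            ["Zi", "vil", "dienst", "Leis", "ten", "der"],
        ],
    )
    print(w.name)  # Zivildienstleistender
    print(w.name_variants)
    print(w.word_separation_variants)
```

Keeping `name` and the existing separation field pinned to the first variant is what preserves backward compatibility. The `*_variants` properties are purely additive, so existing callers see no change.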