Open DoubleDee73 opened 11 months ago
That’s why I switched to dictionary files for UltraStar Creator: https://github.com/UltraStar-Deluxe/UltraStar-Creator/tree/master/syllabification.
As a side note, we’re talking about syllabification (splitting into singable syllables) rather than hyphenation (splitting written words at line breaks).
OK, something is broken. Thanks @DoubleDee73 for the examples.
UltraSinger actually already uses syllables rather than simple hyphenation: hyphenator.Syllables(cleaned_string)
The funny thing is that it returns different results depending on the language, and yet they are all wrong.
assert hyphenation("differently", Hyphenator("de_AT")) == ["dif", "fer", "ent", "ly"]
Expected :['dif', 'fer', 'ent', 'ly']
Actual :['dif', 'ferent', 'ly']
assert hyphenation("differently", Hyphenator("en_US")) == ["dif", "fer", "ent", "ly"]
Expected :['dif', 'fer', 'ent', 'ly']
Actual :['dif', 'fer', 'ently']
I need to check what the PyHyphen integration is actually doing there. It should be using the information from LibreOffice.
@bohning thanks for the list. I will try to use it if I can't fix PyHyphen.
@mindtakerr thanks for the info about the howmanysyllables website. This makes it easy to check and shows how syllables are actually formed.
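To make the dictionary idea mentioned above a bit more concrete, here is a minimal sketch of a lookup with a fallback. It is not UltraSinger or UltraStar Creator code. The function names, the `·` separator, and the one-word-per-line format are assumptions made for illustration; the real dictionary files may be laid out differently.

```python
def load_syllable_dict(lines, sep="·"):
    """Build a lookup table from lines such as 'dif·fer·ent·ly'.

    The separator and the one-entry-per-line layout are assumptions,
    not the documented format of any existing dictionary file.
    """
    table = {}
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        parts = entry.split(sep)
        # Key is the lowercased word without separators.
        table["".join(parts).lower()] = parts
    return table


def syllabify(word, table, fallback=None):
    """Return the syllables of `word`, preferring the dictionary.

    `fallback` is an optional callable (for example a pattern-based
    splitter) that is used only when the word is not in the table.
    """
    parts = table.get(word.lower())
    if parts is None:
        return fallback(word) if fallback else [word]
    # Slice the original word so its capitalization is preserved.
    pieces, start = [], 0
    for part in parts:
        pieces.append(word[start:start + len(part)])
        start += len(part)
    return pieces
```

With a table built from `["dif·fer·ent·ly"]`, `syllabify("differently", table)` gives the four expected syllables, and unknown words go to the fallback or come back unsplit.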
PyHyphen uses C in the background to create syllables. It's not really written in a maintenance-friendly way. I think it makes a few mistakes.
In addition, the hyphen pattern data from LibreOffice is converted from TeX patterns. It also appears to be outdated.
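For context on why the pattern data matters so much, here is a rough sketch of how TeX-style (Liang) patterns are applied. This is not PyHyphen's C code. The patterns in the usage note are made up for illustration and are not taken from any LibreOffice dictionary, and `left_min` / `right_min` are assumed parameter names.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'f1f' into its letters and gap scores."""
    letters = "".join(ch for ch in pattern if not ch.isdigit())
    scores = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            scores[pos] = int(ch)  # score for the gap before letters[pos]
        else:
            pos += 1
    return letters, scores


def split_with_patterns(word, patterns, left_min=2, right_min=2):
    """Split `word` wherever the highest matching score is odd."""
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[i] = gap before padded[i]
    for letters, pattern_scores in map(parse_pattern, patterns):
        start = padded.find(letters)
        while start != -1:
            for offset, value in enumerate(pattern_scores):
                # The highest score at each gap wins.
                scores[start + offset] = max(scores[start + offset], value)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    for k in range(left_min, len(word) - right_min + 1):
        # The gap before word[k] is the gap before padded[k + 1].
        if scores[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return pieces
```

With the toy patterns `["f1f", "r1e", "t1ly"]`, "differently" is split into dif, fer, ent, ly. Adding an even-scored pattern such as `en2t` would suppress a break that `n1t` allows on its own, which is why a missing or outdated pattern can easily merge two syllables.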
Sometimes the output of the automatic hyphenation leaves a bit to be desired.
Examples: