rakuri255 / UltraSinger

AI-based tool that extracts vocals, lyrics, and pitch from music to autogenerate UltraStar Deluxe, MIDI, and notes files. It automates tapping, adding text, and pitching vocals, and creates karaoke files.
MIT License

Questionable results when hyphenating #105

Open DoubleDee73 opened 11 months ago

DoubleDee73 commented 11 months ago

Sometimes the output of the automatic hyphenation leaves a bit to be desired.

Examples:

bohning commented 11 months ago

That’s why I switched to dictionary files for UltraStar Creator: https://github.com/UltraStar-Deluxe/UltraStar-Creator/tree/master/syllabification.

As a side note, we’re talking about syllabification (splitting into singable syllables) rather than hyphenation (splitting of written words).

rakuri255 commented 10 months ago

OK, something is broken. Thanks @DoubleDee73 for the examples.

UltraSinger actually already splits into syllables rather than doing simple hyphenation; it calls hyphenator.Syllables(cleaned_string). The funny thing is that it returns different results depending on the language, and yet they are all wrong.

assert hyphenation("differently", Hyphenator("de_AT")) == ["dif", "fer", "ent", "ly"]
Expected :['dif', 'fer', 'ent', 'ly']
Actual :['dif', 'ferent', 'ly']
assert hyphenation("differently", Hyphenator("en_US")) == ["dif", "fer", "ent", "ly"]
Expected :['dif', 'fer', 'ent', 'ly']
Actual :['dif', 'fer', 'ently']

I need to check what the PyHyphen integration is actually doing there. It should be using the hyphenation data from LibreOffice.
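To track more words than the single case above, a small comparison helper can collect every mismatch instead of stopping at the first failed assert. This is only a sketch: find_mismatches and reported_en_us are hypothetical names, and the stand-in function simply replays the en_US result reported above so the snippet runs without PyHyphen installed.

```python
from typing import Callable, Dict, List, Tuple

Syllabifier = Callable[[str], List[str]]


def find_mismatches(
    syllabify: Syllabifier,
    expected: Dict[str, List[str]],
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return {word: (expected, actual)} for every word that is split wrongly."""
    mismatches = {}
    for word, want in expected.items():
        got = syllabify(word)
        if got != want:
            mismatches[word] = (want, got)
    return mismatches


def reported_en_us(word: str) -> List[str]:
    """Hypothetical stand-in that replays the en_US output quoted above."""
    replay = {"differently": ["dif", "fer", "ently"]}
    return replay.get(word, [word])


EXPECTED = {"differently": ["dif", "fer", "ent", "ly"]}

report = find_mismatches(reported_en_us, EXPECTED)
for word, (want, got) in report.items():
    print(f"{word}: expected {want}, actual {got}")
```

With PyHyphen available, the stand-in could presumably be swapped for the syllables method of a Hyphenator instance, so the same table of expected splits can be checked per language.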

@bohning thanks for the list. I will try to use it if I can't fix PyHyphen.

@mindtakerr thanks for the info about the howmanysyllables website. It makes checking easy and shows how syllables are actually formed.

rakuri255 commented 10 months ago

PyHyphen uses C code in the background to create syllables. It's not really written in a maintenance-friendly way, and I think it makes a few mistakes.

In addition, the hyphenation pattern data from LibreOffice is converted from TeX data. It also appears to be outdated.
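
If the pattern data stays unreliable, one option is to consult a syllable dictionary first and fall back to the pattern-based splitter only for unknown words. The following is a minimal sketch under stated assumptions: the in-memory dictionary format (lowercase word mapped to a list of syllables) is hypothetical and not the file format used by UltraStar Creator, and make_syllabifier and split_like are illustrative names.

```python
from typing import Callable, Dict, List, Optional

Syllabifier = Callable[[str], List[str]]


def split_like(word: str, syllables: List[str]) -> List[str]:
    """Cut `word` at the same offsets as `syllables`, keeping its original casing."""
    parts, pos = [], 0
    for syllable in syllables:
        parts.append(word[pos:pos + len(syllable)])
        pos += len(syllable)
    return parts


def make_syllabifier(
    dictionary: Dict[str, List[str]],
    fallback: Optional[Syllabifier] = None,
) -> Syllabifier:
    """Build a syllabifier that prefers dictionary entries over the fallback."""

    def syllabify(word: str) -> List[str]:
        entry = dictionary.get(word.lower())
        if entry is not None:
            return split_like(word, entry)
        if fallback is not None:
            return fallback(word)
        # No dictionary entry and no fallback: leave the word unsplit.
        return [word]

    return syllabify


# Hypothetical dictionary content, using the split expected in this thread.
SYLLABLES = {"differently": ["dif", "fer", "ent", "ly"]}

syllabify = make_syllabifier(SYLLABLES)
print(syllabify("Differently"))
```

The fallback argument is where a pattern-based splitter such as the existing PyHyphen call could be plugged in, so dictionary entries only override the cases the patterns get wrong.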