rakuri255 / UltraSinger

AI-based tool that extracts lyrics and pitch from the vocals of a song to automatically generate UltraStar Deluxe, MIDI, and notes files. It automates tapping, adding text, and pitching vocals, and creates karaoke files.
MIT License

Skipping Word <digit>, because of missing timings #82

Open rogerGunis opened 1 year ago

rogerGunis commented 1 year ago

Hello,

Thanks for sharing this software. I have a problem with German songs:

Error: Skipping Word 1000, because of missing timings
Error: Skipping Word 300.000, because of missing timings
Error: Skipping Word 4, because of missing timings
Error: Skipping Word 5, because of missing timings
....

My cli: py UltraSinger.py -i "C:......mp3" --whisper large-v2 --force_whisper_cpu --force_separation_cpu

Can anybody help me with that? Are there better models for German songs?

Thank you
Kind Regards
Roger

rakuri255 commented 1 year ago

Hi,

Can you provide a YouTube link for testing? I will try later to make a better error message that actually shows the word 😅 Maybe Whisper found some words in noise that are extremely short.

I have found --whisper large-v2 quite good for German so far. But you can try others on Hugging Face; maybe there are better ones. Please post them here: Discussion

rogerGunis commented 1 year ago

Hi, thank you for the fast response. So far the translation is fine, but numbers don't work. For example:

https://m.youtube.com/watch?v=uC08L4xxjNM or Nena, 99 Red Balloons

Other Hugging Face models seem to be problematic in post-processing. The words are not placed according to the tones, and very long sentences are shown in UltraStar.

Kind Regards and thank you for your work
Roger

rakuri255 commented 1 year ago

Hmm, there is not much I can do. It's an issue in WhisperX. I opened issue #409. Maybe you can add some more info there.

I made a workaround. The word is no longer simply removed; it is now added after the previous one with a length of 0.2 seconds. You have to adjust it manually.
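
For readers who want to see the idea, here is a minimal sketch of the workaround described above. It is not the actual UltraSinger code: the `Word` class and `fill_missing_timings` function are hypothetical names chosen for illustration. Only the 0.2 second fallback length comes from the comment.

```python
from dataclasses import dataclass
from typing import List, Optional

FALLBACK_LENGTH = 0.2  # seconds; value taken from the comment above


@dataclass
class Word:
    """Hypothetical container for one transcribed word."""
    text: str
    start: Optional[float] = None  # None means the aligner gave no timing
    end: Optional[float] = None


def fill_missing_timings(words: List[Word],
                         fallback_length: float = FALLBACK_LENGTH) -> List[Word]:
    """Place words without timings directly after the previous word."""
    fixed = []
    previous_end = 0.0  # a leading untimed word starts at 0.0
    for word in words:
        if word.start is None or word.end is None:
            # Keep the word instead of skipping it, with a short fixed length.
            word = Word(word.text, previous_end, previous_end + fallback_length)
        fixed.append(word)
        previous_end = word.end
    return fixed


words = [Word("Hast", 1.0, 1.4), Word("1000"), Word("Fragen", 2.0, 2.5)]
result = fill_missing_timings(words)
print([(w.text, w.start, w.end) for w in result])
```

The sketch does not check whether the placeholder overlaps the next word, which is one reason the result still needs manual adjustment.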

rakuri255 commented 2 months ago

PR #135 handles the numbers by translating them to English words. The problem is that it will probably only work for English words.
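
As an illustration of that approach (not the code from PR #135, whose implementation is not shown here), the following sketch replaces digit tokens with English words before alignment. The names `number_to_words` and `replace_numbers` are hypothetical, and the converter only covers 0 to 999,999.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999,999 in English."""
    if n < 0 or n >= 1_000_000:
        raise ValueError("only 0..999999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return number_to_words(thousands) + " thousand" + tail


def replace_numbers(text: str) -> str:
    """Replace each run of digits with English words; leave huge numbers as-is."""
    def convert(match: re.Match) -> str:
        value = int(match.group())
        return number_to_words(value) if value < 1_000_000 else match.group()
    return re.sub(r"\d+", convert, text)


print(replace_numbers("99 Luftballons"))
```

The German title in the usage line shows the limitation mentioned above: the number becomes an English word even though the rest of the lyric is German.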