nomadkaraoke / python-lyrics-transcriber

Automatically create synchronised lyrics files in ASS and MidiCo LRC formats with word-level timestamps, using Whisper and lyrics from Genius and Spotify, using LLMs / GPT-4 to correct transcribed lyrics
MIT License

Multilanguage support #4

Open Turtle6665 opened 12 months ago

Turtle6665 commented 12 months ago

Hello,

It would be very nice to be able to use this package with non-English songs. From what I can see, you are using whisper-timestamped, which has multilingual support, but for now English is hard-coded in the package: https://github.com/karaokenerds/python-lyrics-transcriber/blob/60d691e5c3c405a4029c998ab81431589e2ebd70/lyrics_transcriber/transcriber.py#L775

Maybe the LyricsTranscriber init could take a language argument that is stored on the instance and used in the transcribe() method? The default value could be None rather than "en", since None is the default in whisper-timestamped, which then detects the language automatically, as explained here.
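To make the suggestion concrete, here is a minimal sketch of the idea. It is not the package's actual code: the class name `LyricsTranscriberSketch`, the `transcribe_fn` parameter and `fake_backend` are hypothetical stand-ins, used so the example runs without any speech model installed. The only point it illustrates is storing `language` in the init and forwarding it at transcription time, with None meaning "let the backend detect the language".

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical backend signature: (audio_path, language) -> result dict.
TranscribeFn = Callable[[str, Optional[str]], Dict[str, Any]]


class LyricsTranscriberSketch:
    """Illustrative stand-in, not the real LyricsTranscriber class."""

    def __init__(
        self,
        audio_filepath: str,
        transcribe_fn: TranscribeFn,
        language: Optional[str] = None,  # None = leave detection to the backend
    ) -> None:
        self.audio_filepath = audio_filepath
        self.transcribe_fn = transcribe_fn
        self.language = language

    def transcribe(self) -> Dict[str, Any]:
        # Forward the stored language instead of a hard-coded "en".
        return self.transcribe_fn(self.audio_filepath, self.language)


def fake_backend(audio_path: str, language: Optional[str]) -> Dict[str, Any]:
    # Stand-in for the speech-to-text call (assumption: a real backend
    # would auto-detect the language when it receives None).
    return {"audio": audio_path, "language": language or "auto-detect"}


if __name__ == "__main__":
    print(LyricsTranscriberSketch("song.wav", fake_backend).transcribe())
    print(LyricsTranscriberSketch("chanson.wav", fake_backend, language="fr").transcribe())
```

In the real package, the stored value would presumably be passed as the `language` keyword of the transcription call in place of the hard-coded "en".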

At first glance, it looks like there is no other place in the code that is language-related, so it might be an easy fix. Anyway, I will try to test it this weekend.

UPDATE: it is not as easy a fix as I thought, the model has to change as well...
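One possible reading of the model change, sketched under assumptions: Whisper publishes English-only checkpoints with a ".en" suffix (tiny.en, base.en, small.en, medium.en) alongside multilingual ones without it, so if the package loads an ".en" checkpoint (an assumption here, not verified against the code), that name would need to depend on the language too. The helper `pick_model_name` below is hypothetical and only illustrates the selection logic.

```python
from typing import Optional

# Whisper sizes that also ship as English-only ".en" checkpoints.
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def pick_model_name(size: str, language: Optional[str]) -> str:
    """Return a Whisper checkpoint name for the given size and language.

    Hypothetical helper: uses the ".en" variant only when the language is
    explicitly English and such a variant exists; otherwise falls back to
    the multilingual checkpoint, which is needed for language detection.
    """
    if language == "en" and size in ENGLISH_ONLY_SIZES:
        return f"{size}.en"
    return size


if __name__ == "__main__":
    for lang in ("en", "fr", None):
        print(lang, "->", pick_model_name("medium", lang))
```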

Have a nice day