Due to the lack of computing resources, we have only tried fine-tuning the Whisper Large-v2 decoder (with the encoder frozen). Apologies for not keeping exact experimental data, but my impression is that the lyrics transcription results are better than with Whisper Medium, while the lyrics alignment accuracy is worse. Although untested, I think using a PEFT method such as LoRA to fine-tune a larger Whisper model with limited computing resources for the lyrics transcription / alignment task is a viable option, FYI.
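For anyone reading along, here is a minimal sketch of the LoRA idea mentioned above: the pretrained weight is frozen and only a low-rank update is trained, which is why it fits in limited compute. This is a toy PyTorch illustration (the rank `r=8` and `alpha=16` are made-up hyperparameters, not values used in this repo); in practice you would apply something like this to the decoder attention projections, e.g. via the `peft` library, rather than writing it by hand.

```python
# Toy LoRA sketch: freeze a pretrained linear layer and train only a
# low-rank update W_out = W x + (alpha/r) * B A x. Hyperparameters are
# illustrative, not taken from this repo's training setup.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad = False
        self.scale = alpha / r
        # A starts small and random, B starts at zero, so at step 0 the
        # adapted layer computes exactly the same output as the base layer.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / total: {total}")  # only the low-rank factors train
```

With rank 8 on a 512x512 layer, only 8192 of ~270k parameters are trainable, which is the whole appeal when GPU memory is the bottleneck.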
Thank you for your suggestion! I noticed that the released checkpoint is based on the medium version of Whisper. Would you consider open-sourcing a fine-tuned version based on a larger Whisper model in the future?
Since I'm about to graduate from grad school, I can't guarantee that I'll still have enough computing resources to conduct more experiments in the future, so there are no plans to release other fine-tuned Whisper checkpoints, sorry.
Thank you for your work on the fine-tuned Whisper checkpoints! I completely understand your situation, and I appreciate the effort you've put in. Best of luck with your graduation and future endeavors!
Just as the title says.