Closed: LYPinASR closed this issue 1 year ago.
Hi there. Questions like this are better suited to the forums or to a discussion on the model page, as we keep issues for bugs and feature requests only.
If you use the pipeline, you should add an option such as generate_kwargs = {"task":"transcribe", "language":"<|fr|>"}.
ref1: https://colab.research.google.com/drive/1rS1L4YSJqKUH_3YxIQHBI982zso23wor#scrollTo=dPD20IkEDsbG
ref2: https://github.com/huggingface/transformers/issues/22331
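A minimal sketch of the generate_kwargs approach applied to Korean rather than French. Assumptions: the openai/whisper-large checkpoint, chunk_length_s=30 for long-form audio, and the helper names whisper_generate_kwargs and transcribe, which are hypothetical. The "<|ko|>" token format follows the "<|fr|>" example in the comment; newer transformers releases also document plain codes such as "ko" for the language argument.

```python
def whisper_generate_kwargs(language="<|ko|>", task="transcribe"):
    """Build the dict the pipeline forwards to Whisper's generate().

    Hypothetical helper: task="transcribe" keeps the output in the source
    language, whereas "translate" would produce English.
    """
    return {"task": task, "language": language}


def transcribe(audio, model_id="openai/whisper-large"):
    """Transcribe long-form audio with the language and task pinned.

    `audio` is assumed to be a file path or another input the pipeline
    accepts. transformers is imported lazily so the helper above can be
    used without it installed.
    """
    from transformers import pipeline

    pipe = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # chunked decoding for long-form audio
    )
    return pipe(audio, generate_kwargs=whisper_generate_kwargs())["text"]
```

Passing the settings per call leaves the model config untouched, so the same pipeline can be reused for other languages.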
However, I think the default task should be "transcribe", not "translate". I maintain that this is a bug.
I have solved the problem.
Step 1: Upgrade transformers. Not fixed.
Step 2: Add an option such as generate_kwargs = {"task":"transcribe", "language":"<|fr|>"}. Not fixed.
Step 3: Add the line pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="ko", task="transcribe"). Fixed.
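The Step 3 fix can be wrapped in a small helper. This is a sketch: apply_forced_decoder_ids and main are hypothetical names, and openai/whisper-large and sample_ko.wav are placeholder inputs. It sets the same attribute the poster set; newer transformers releases read decoding settings from model.generation_config and may ignore or warn about forced_decoder_ids on the model config, so the generate_kwargs route may be the safer choice there.

```python
def apply_forced_decoder_ids(pipe, language="ko", task="transcribe"):
    """Pin Whisper's decoder prompt to one language and task (hypothetical helper).

    The prompt ids come from the tokenizer; storing them on the model config
    makes later pipeline calls start decoding with that language and task.
    """
    prompt_ids = pipe.tokenizer.get_decoder_prompt_ids(language=language, task=task)
    pipe.model.config.forced_decoder_ids = prompt_ids
    return prompt_ids


def main(audio_path="sample_ko.wav"):  # hypothetical Korean audio file
    # Imported lazily so the helper above can be used without transformers.
    from transformers import pipeline

    pipe = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-large",
        chunk_length_s=30,
    )
    apply_forced_decoder_ids(pipe, language="ko", task="transcribe")
    return pipe(audio_path)["text"]
```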
However, I still don't understand why the original model outputs English while the fine-tuned model outputs Korean.
Maybe you can check your fine-tuned model's config.json or generation_config.json and double-check the default task type. I think it's null or "transcribe".
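One way to do that check is to decode forced_decoder_ids with the id maps stored in the same file. A sketch, assuming the config carries lang_to_id and task_to_id maps alongside forced_decoder_ids (if a map is missing, the function reports None for that field); describe_forced_prompt and describe_checkpoint are hypothetical names, and the ids in any example config are illustrative only.

```python
import json
from pathlib import Path


def describe_forced_prompt(config: dict) -> dict:
    """Report which language token and task a Whisper config forces.

    A None value means the config leaves that position unset (null) or
    the id could not be matched, in which case the model is not forced.
    """
    id_to_lang = {v: k for k, v in config.get("lang_to_id", {}).items()}
    id_to_task = {v: k for k, v in config.get("task_to_id", {}).items()}
    result = {"language": None, "task": None}
    # forced_decoder_ids is a list of [position, token_id] pairs.
    for _position, token_id in config.get("forced_decoder_ids") or []:
        if token_id in id_to_lang:
            result["language"] = id_to_lang[token_id]
        elif token_id in id_to_task:
            result["task"] = id_to_task[token_id]
    return result


def describe_checkpoint(model_dir: str, filename: str = "generation_config.json") -> dict:
    """Read a local checkpoint's config file (pass filename="config.json" to check that one)."""
    path = Path(model_dir) / filename
    return describe_forced_prompt(json.loads(path.read_text(encoding="utf-8")))
```

Comparing the result for the original and the fine-tuned checkpoint shows whether the two differ in the forced language or task.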
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Feature request
When I follow the long-form transcription example for whisper-large with Korean audio, the output is in English. But after fine-tuning the whisper-large model on some Korean data, the checkpoint outputs Korean. I also tested other model sizes, and all of them output English. I am confused by this. What should I do to get Korean output from the original model? Thank you!
Motivation
Test Whisper on Korean.
Your contribution
Test Whisper on Korean.