Closed melihogutcen closed 1 year ago
@ArthurZucker
You can add generate_kwargs = {"language":"<|tr|>","task": "transcribe"} to your pipeline initialization and it should work.
Updated the notebook with the following new line :
pipe(speech_file, generate_kwargs = {"task":"transcribe", "language":"<|fr|>"} )
Voila! I am able to set the language by using generate_kwargs = {"language":"<|tr|>","task": "transcribe"} in pipeline initialization. Thanks.
Hello, I have the same problem, but generate_kwargs = {"language":"<|tr|>","task": "transcribe"} does not work for me.
ValueError: The following `model_kwargs` are not used by the model: ['task', 'language'] (note: typos in the generate arguments will also show up in this list)
Here is the code:
from transformers import WhisperProcessor,WhisperForConditionalGeneration
import whisper
from transformers import pipeline
model = WhisperForConditionalGeneration.from_pretrained("./whisper_tiny_pytorch_model.bin",config="./config.json").to("cuda:0")
processor = WhisperProcessor.from_pretrained("./")
audio = whisper.load_audio("./a.flac")
i = processor(audio,return_tensors="pt").input_features.to("cuda:0")
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    device="cuda:0",
)
r = pipe(av, generate_kwargs = {"task":"transcribe", "language":"japanese"})
Could you help me?
Env: pytorch==2.1.0.dev20230302+cu117, transformers==4.26.1. The Whisper model was downloaded from Hugging Face.
Hey @AnestLarry, the language tag that you are using is wrong!
As you can see in the generation_config.json, the lang_to_id defines the mapping from language token to the actual input ids.
What you should be using (and there is an example of this in the notebook here) is the following:
...
pipe(av, generate_kwargs={"language": "<|ja|>"})
Hey @ArthurZucker ,
r = pipe(audio, generate_kwargs = {"language":"<|ja|>"})
ValueError: The following `model_kwargs` are not used by the model: ['language'] (note: typos in the generate arguments will also show up in this list)
I still get the same error. When I pass "<|ja|>" as the language to get_decoder_prompt_ids (the route for calling model.generate directly), the error message tells me to change my argument.
processor.get_decoder_prompt_ids(language="<|ja|>",task="transcribe")
ValueError: Unsupported language: <|ja|>. Language should be one of: ['english', 'chinese', 'german', 'spanish', 'russian', 'korean', 'french', 'japanese', 'portuguese', 'turkish', 'polish', 'catalan', 'dutch', 'arabic', 'swedish', 'italian', 'indonesian', 'hindi', 'finnish', 'vietnamese', 'hebrew', 'ukrainian', 'greek', 'malay', 'czech', 'romanian', 'danish', 'hungarian', 'tamil', 'norwegian', 'thai', 'urdu', 'croatian', 'bulgarian', 'lithuanian', 'latin', 'maori', 'malayalam', 'welsh', 'slovak', 'telugu', 'persian', 'latvian', 'bengali', 'serbian', 'azerbaijani', 'slovenian', 'kannada', 'estonian', 'macedonian', 'breton', 'basque', 'icelandic', 'armenian', 'nepali', 'mongolian', 'bosnian', 'kazakh', 'albanian', 'swahili', 'galician', 'marathi', 'punjabi', 'sinhala', 'khmer', 'shona', 'yoruba', 'somali', 'afrikaans', 'occitan', 'georgian', 'belarusian', 'tajik', 'sindhi', 'gujarati', 'amharic', 'yiddish', 'lao', 'uzbek', 'faroese', 'haitian creole', 'pashto', 'turkmen', 'nynorsk', 'maltese', 'sanskrit', 'luxembourgish', 'myanmar', 'tibetan', 'tagalog', 'malagasy', 'assamese', 'tatar', 'hawaiian', 'lingala', 'hausa', 'bashkir', 'javanese', 'sundanese', 'burmese', 'valencian', 'flemish', 'haitian', 'letzeburgesch', 'pushto', 'panjabi', 'moldavian', 'moldovan', 'sinhalese', 'castilian'].
And I get a valid result with model.generate:
forced_decoder_ids = processor.get_decoder_prompt_ids(language="japanese",task="transcribe")
r = model.generate(i,forced_decoder_ids = forced_decoder_ids)
out: ['<|startoftranscript|><|ja|><|transcribe|><|notimestamps|>夜が開き出し...<|endoftext|>']
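The exchange above uses two spellings of the same language: the full name ("japanese") that get_decoder_prompt_ids accepts, and the token form ("<|ja|>") that generate_kwargs expected at the time. A minimal sketch of normalizing between them could look like this. The helper name and the three-entry table are hypothetical, chosen for illustration; they are not part of transformers.

```python
# Hypothetical helper for illustration; not part of transformers.
# Only three languages mentioned in this thread are listed.
LANGUAGE_CODES = {"japanese": "ja", "french": "fr", "turkish": "tr"}


def to_language_token(language):
    """Accept 'japanese', 'ja' or '<|ja|>' and return '<|ja|>'."""
    value = language.strip().lower()
    # Strip the token delimiters if the caller already passed a token.
    if value.startswith("<|") and value.endswith("|>"):
        value = value[2:-2]
    # Map a full name to its code; a bare code passes through unchanged.
    code = LANGUAGE_CODES.get(value, value)
    if code not in LANGUAGE_CODES.values():
        raise ValueError(f"Unsupported language: {language}")
    return f"<|{code}|>"


print(to_language_token("japanese"))
print(to_language_token("<|ja|>"))
```

Both calls yield the same token, so a wrapper like this could sit in front of whichever API spelling the installed version expects.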
Sorry I guess I should have been clearer:
pipe(av, generate_kwargs={"language": "<|ja|>", "task": "transcribe"})
(I was just sharing how to fix the language)
Moreover, this is not in the latest release: as the notebook mentions, you have to use the main branch.
Thank you for pointing out the version problem, which I had overlooked. After installing the main branch, the code runs successfully (without an error message), but setting the language still does not work.
model = WhisperForConditionalGeneration.from_pretrained("./whisper_tiny_pytorch_model.bin",config="./config.json").to("cuda:0")
processor = WhisperProcessor.from_pretrained("./")
audio = whisper.load_audio("./a.mp3")
i = processor(audio,return_tensors="pt").input_features.to("cuda:0")
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    device="cuda:0",
)
r = pipe(audio, generate_kwargs = {"language":"<|ja|>","task":"transcribe"})
{'text': " I'm not going bit ...}
I set the language to ja and got an English result (the audio is a Japanese song).
Is the code wrong?
Try using the notebook I provided, your custom model might not be working and I can't debug it for you 😅
Could you try using the openai/whisper-small model as shown in the notebook? Then you can compare the configuration file and generation config.
Thank you very much. My model was downloaded from Hugging Face and I did not change anything in it. I have now used openai/whisper to complete the task successfully. I also found that the model file name seems to affect the result. 😅
After renaming the model file from whisper_tiny_pytorch_model.bin to pytorch_model.bin, there is no problem now.
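A small sketch of the rename described above, assuming the fix works because from_pretrained looks for the standard file names (pytorch_model.bin, config.json) inside a checkpoint directory. The function name prepare_checkpoint_dir is hypothetical, and the sketch copies rather than renames so the original download stays untouched.

```python
import shutil
from pathlib import Path


def prepare_checkpoint_dir(weights_file, config_file, target_dir):
    """Copy a renamed checkpoint into a directory under the standard names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Standard file names; the custom prefix is dropped.
    shutil.copyfile(weights_file, target / "pytorch_model.bin")
    shutil.copyfile(config_file, target / "config.json")
    return target
```

The resulting directory path, rather than the path of the .bin file itself, would then be what is passed to from_pretrained.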
Great that you no longer have an issue! Thanks for bearing with me 🤗
After installing the newest Transformers, I now get the following error when setting the language in the pipeline:
File "/Users/me/miniconda3/envs/torch-gpu/lib/python3.10/site-packages/transformers/models/whisper/modeling_whisper.py", line 1570, in generate
if generation_config.language in generation_config.lang_to_id.keys():
AttributeError: 'GenerationConfig' object has no attribute 'lang_to_id'
I had this same issue with our finetuned whisper-large-rixvox @peregilk .
I think what happens is that finetuned Whisper models typically are already configured to predict a specific language during finetuning. When the people who train these models save a checkpoint, there is no "GenerationConfig" generated, as the model is still hardcoded to predict a specific language.
E.g. see generation_config.json from OpenAI/whisper-large-v2 and compare against a finetuned version of whisper where generation_config.json is missing.
If the person who trains a finetuned whisper follows Huggingface's finetuning instructions, there will be no GenerationConfig for the model.
Perhaps there should be a better error message for this @ArthurZucker .
The solution is simply to not specify generate_kwargs at all for any finetuned model where generation_config.json is missing. The finetuned model will predict in the language it was finetuned on without the generate_kwargs.
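The workaround above can be sketched as a guard that only builds the language kwargs when a local checkpoint directory actually ships a generation_config.json with a lang_to_id entry for the requested token. This is a sketch under assumptions: language_kwargs is a hypothetical helper, and it only inspects a local directory, not a model on the Hub.

```python
import json
from pathlib import Path


def language_kwargs(model_dir, language, task="transcribe"):
    """Return generate_kwargs, or {} if the checkpoint cannot honor them."""
    path = Path(model_dir) / "generation_config.json"
    if not path.is_file():
        # Finetuned checkpoint without a generation config: pass nothing.
        return {}
    config = json.loads(path.read_text(encoding="utf-8"))
    if language not in config.get("lang_to_id", {}):
        # No lang_to_id mapping for this token: pass nothing.
        return {}
    return {"language": language, "task": task}
```

The returned dict could then be passed as generate_kwargs to the pipeline call; an empty dict leaves the finetuned model on its default language.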
Thanks for reporting @peregilk and @Lauler! This is probably quite a good fix right @ArthurZucker? We don't use any of the generation_config logic unless generation_config.json is present on the Hub?
I believe the current workaround is to update the generation config according to this comment: https://github.com/huggingface/transformers/issues/21878#issuecomment-1451902363
This should fix both issues described above. It's cumbersome though and ideally we'd have a way of handling it in transformers!
Detecting language using up to the first 30 seconds. Use --language to specify the language
Detected language: Javanese
Hello, I'm using Whisper to translate. How do I change the detected language? What is the code? Thanks in advance.
@ArthurZucker @sanchit-gandhi thanks, this worked, but I would expect that model.config.suppress_tokens = [50290] would work as well when I do not want to use the pipeline (50290 corresponds to the index of "<|ur|>"; I wanted to suppress Urdu). However, I still get the transcription in Urdu. What worked for me in this case was model.config.forced_decoder_ids = processor.tokenizer.get_decoder_prompt_ids(language="english", task="transcribe"). Just curious what is going on behind the scenes. Thanks
Hey @kamalojasv181 - could you try updating the generation_config, since it receives priority over the config:
model.generation_config.suppress_tokens.append(50290)
=> this should set the probability of the <|ur|> token to zero during generation.
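To illustrate what "set the probability to zero" means here, the following toy sketch masks one entry of a score vector before normalizing. It is a simplified stand-in, not the transformers implementation: the four scores are made up, and index 2 merely plays the role of the <|ur|> token.

```python
import math


def suppress(logits, suppressed_ids):
    """Replace the scores of suppressed token ids with -inf."""
    blocked = set(suppressed_ids)
    return [-math.inf if i in blocked else x for i, x in enumerate(logits)]


def softmax(logits):
    """Turn scores into probabilities; exp(-inf) evaluates to 0.0."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for four candidate tokens; index 2 stands in for <|ur|>.
logits = [1.0, 0.5, 3.0, 0.2]
probs = softmax(suppress(logits, [2]))
print(probs)
```

Note that suppression only removes one candidate and leaves the model free to choose among the rest, whereas forced decoder ids pin the language token to a single value.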
The recommended API is now to pass language=..., task=... directly to generate. This takes precedence over all generation config / config attributes, and is far easier to set: https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperForConditionalGeneration.generate.language
E.g. see how we set the language="french" and task="transcribe" for this French speech transcription example:
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import Audio, load_dataset
# load model and processor
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
# load streaming dataset and read first audio sample
ds = load_dataset("common_voice", "fr", split="test", streaming=True)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
input_speech = next(iter(ds))["audio"]
# pre-process audio sample to log-mel spectrogram
input_features = processor(input_speech["array"], sampling_rate=input_speech["sampling_rate"], return_tensors="pt").input_features
# generate token ids
predicted_ids = model.generate(input_features, language="french", task="transcribe")
# decode token ids to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
This does the same thing as the forced decoder ids under the hood, setting the task/language token for Whisper: https://huggingface.co/openai/whisper-large-v2#usage
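As a rough picture of the forced decoder ids mentioned above: they are (position, token_id) pairs that fix the language, task and timestamp tokens following <|startoftranscript|>. The sketch below builds such pairs from a toy vocabulary. The ids are invented for illustration and do not match any real Whisper checkpoint, and the function is a hypothetical stand-in for what get_decoder_prompt_ids returns.

```python
# Toy vocabulary: these ids are made up and do not match a real checkpoint.
TOY_VOCAB = {
    "<|startoftranscript|>": 10,
    "<|fr|>": 11,
    "<|transcribe|>": 12,
    "<|notimestamps|>": 13,
}


def forced_decoder_ids(language_token, task, token_to_id):
    """Return (position, token_id) pairs for the language/task prefix."""
    prefix = [language_token, f"<|{task}|>", "<|notimestamps|>"]
    # Position 0 holds <|startoftranscript|>, so forcing starts at 1.
    return [(pos, token_to_id[tok]) for pos, tok in enumerate(prefix, start=1)]


print(forced_decoder_ids("<|fr|>", "transcribe", TOY_VOCAB))
```

Passing language and task to generate spares the caller from assembling this list by hand.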
Thanks
This helped me just now.
Problem
Hello,
I followed this notebook for Whisper pipelines. https://colab.research.google.com/drive/1rS1L4YSJqKUH_3YxIQHBI982zso23wor?usp=sharing#scrollTo=Ca4YYdtATxzo
Here, I want to use speech transcription with the openai/whisper-large-v2 model using the pipeline. By using WhisperProcessor, we can set the language, but this has a disadvantage for audio files longer than 30 seconds. I used the below code and I can set the language here.
Long audio files can be processed in the pipeline by setting chunk_length as below. But in the pipeline, I couldn't set the language. Therefore, I got English results for my Turkish speech data.
Is there a way to set the language?
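For context on the 30-second limit mentioned above, here is a simplified sketch of how a long recording could be cut into 30-second windows. It is an assumption-laden illustration: chunk_bounds is a hypothetical helper, the 16 kHz rate is assumed, and the windows here do not overlap, whereas the real pipeline also overlaps neighboring chunks.

```python
def chunk_bounds(num_samples, sampling_rate=16_000, chunk_length_s=30):
    """Return (start, end) sample indices of consecutive fixed-size chunks."""
    chunk = int(chunk_length_s * sampling_rate)
    return [
        (start, min(start + chunk, num_samples))
        for start in range(0, num_samples, chunk)
    ]


# A 70-second recording at 16 kHz splits into two full chunks and a remainder.
print(chunk_bounds(70 * 16_000))
```

Each chunk is transcribed on its own, which is why the language setting has to reach the generate call for every chunk.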
System Info
docker image:
Transformers Version:
transformers==v4.27dev
Who can help?
@sanchit-gandhi @Narsil