jasonppy / PromptingWhisper

Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation

BUG ? #5

Closed MM-0712 closed 9 months ago

MM-0712 commented 9 months ago

Hi, thanks for your contributions. I ran your code in CS ASR mode on the ASCEND dataset and found two bugs:

1. In whisper/decoding.py#L289, should list be List? https://github.com/jasonppy/PromptingWhisper/blob/2e1c5d3fde4eda380d38afd71b0a5e09e01c114d/whisper/decoding.py#L289
2. Your gettokens function also seems to have an error. After fixing bug 1, I get another error at _, probs = whisper.detect_language(model, input_mel): ValueError(f"This model doesn't have language tokens so it can't perform lang id"). When I print the key info, tokenizer.language_token not in tokenizer.sot_sequence is True.

@jasonppy Would you help fix this? Thanks in advance.

Best wishes, Ma

jasonppy commented 9 months ago

Could you paste the entire error log?

MM-0712 commented 9 months ago

Thanks for your response @jasonppy, the entire error log is below:

Traceback (most recent call last):
  File "../csasrst.py", line 144, in <module>
    _, probs = whisper.detect_language(model, input_mel)
  File "/app/espnet/tools/miniconda/envs/pw/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/data/code/PromptingWhisper/whisper/decoding.py", line 39, in detect_language
    raise ValueError(f"This model doesn't have language tokens so it can't perform lang id")
ValueError: This model doesn't have language tokens so it can't perform lang id

MM-0712 commented 9 months ago

@jasonppy I ran on ASCEND and followed your script/ascend.sh with these parameters:

dataset="ascend" model="tiny" dataset_dir="my dataset path" split="dev" core_metric="mer" single_lang_threshold=0.9 concat_lang_token=1 code_switching="zh-en"

MM-0712 commented 9 months ago

@jasonppy I have replaced the tiny model with base, but I still get the same error. Debugging your code, it seems the get_token-related function has a bug. Specifically, in this code: https://github.com/jasonppy/PromptingWhisper/blob/2e1c5d3fde4eda380d38afd71b0a5e09e01c114d/whisper/decoding.py#L34C8-L34C8

tokenizer.language_token not in tokenizer.sot_sequence evaluates to True, so I guess the get_token-related function has a bug.
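
For context, here is a rough sketch of the guard as it appears in upstream openai-whisper's detect_language (paraphrased from the upstream code, so this fork's decoding.py may differ; the helper name below is mine):

from whisper.tokenizer import get_tokenizer

# Paraphrase of the check at the top of detect_language() in upstream
# openai-whisper's decoding.py (not copied from this repo). The ValueError in
# my log fires when the tokenizer's language token is missing from the
# start-of-transcript sequence, or when tokenizer.language is None.
def check_language_id_support(model):
    tokenizer = get_tokenizer(model.is_multilingual)
    if tokenizer.language is None or tokenizer.language_token not in tokenizer.sot_sequence:
        raise ValueError("This model doesn't have language tokens so it can't perform lang id")
    return tokenizer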

MM-0712 commented 9 months ago

@jasonppy Based on your excellent work, I have another idea, so I want to reproduce the CS ASR results shown in your paper. Could you help me fix this?

Best wishes, Ma

jasonppy commented 9 months ago


Thanks for the error log.

I just re-ran the code with your params and it finished successfully. I notice that the line numbers in your error log don't match the codebase:

_, probs = whisper.detect_language(model, input_mel) should be on line 145 of csasr_st.py (it's line 144 in your log), and raise ValueError(f"This model doesn't have language tokens so it can't perform lang id") should be on line 35 of decoding.py (it's line 39 in your log).

Is it possible that you made some unwanted changes to the code? Could you try running the original codebase and see if the same error occurs?

If it's hard for you to run the original code, I can share some advice on how to debug your code:

From the log, executing line 144 in ../csasr_st.py brought you to line 39 in decoding.py, which implies that the tokenizer may not be set up properly. You could check whether tokenizer.language, tokenizer.language_token, and tokenizer.sot_sequence look normal, and then maybe take a look at ./whisper/tokenizer.py line 296.
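
As a starting point, something along these lines should show whether those values are sane (just a sketch; it assumes the upstream get_tokenizer signature and the tiny checkpoint from ascend.sh, so adjust it to match how csasr_st.py actually builds the tokenizer):

import whisper
from whisper.tokenizer import get_tokenizer

# Load the same checkpoint as in script/ascend.sh and build a zh transcription
# tokenizer the way upstream whisper does (this fork may pass extra arguments).
model = whisper.load_model("tiny")
tokenizer = get_tokenizer(model.is_multilingual, language="zh", task="transcribe")

print("language        :", tokenizer.language)
print("language_token  :", tokenizer.language_token)
print("sot_sequence    :", tokenizer.sot_sequence)
# detect_language raises the "doesn't have language tokens" ValueError exactly
# when the following prints False (or when language is None):
print("token in sot_seq:", tokenizer.language_token in tokenizer.sot_sequence)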

MM-0712 commented 9 months ago

@jasonppy Thanks for your detailed response. I only modified the code style. I re-pulled your code, but I still get the two bugs:

1. In whisper/decoding.py#L289, block_ngrams: list[int]=[] should be block_ngrams: List[int]=[]: https://github.com/jasonppy/PromptingWhisper/blob/2e1c5d3fde4eda380d38afd71b0a5e09e01c114d/whisper/decoding.py#L289
2. After fixing bug 1, _, probs = whisper.detect_language(model, input_mel) still raises ValueError(f"This model doesn't have language tokens so it can't perform lang id").

I suspect the code in this repo may not be the updated version that you are able to run.

Still, thank you very much for your response. I will study the original Whisper code and re-implement your idea. I hope I can reproduce your paper's results.

Best wishes

jasonppy commented 9 months ago

Regarding bug 1, where you need to change list[int] to List[int]: subscripting the built-in list (i.e. list[int]) works on Python 3.9 and above; on older versions you need to first from typing import List and then use List[int]. Please note that in README.md we recommend python=3.9.16, as that's what we used in our experiments. From your error log, it seems you are using Python 3.8.
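
To illustrate the difference (the function below is just an example, not the actual decoding.py signature):

from typing import List

# Python 3.9+ accepts built-in generics in annotations (PEP 585):
#     def decode(block_ngrams: list[int] = []): ...
# On Python 3.8 that line fails at import time with
# "TypeError: 'type' object is not subscriptable", so use typing.List instead:
def decode(block_ngrams: List[int] = []) -> None:
    # the mutable [] default mirrors the original signature; it is harmless
    # here because the list is only read, never modified
    print(block_ngrams)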

Regarding bug 2, I checked again and I'm sure that this GitHub repo is identical to my local repo. The bug is strange; as I mentioned in my previous reply, you could try printing out those items to see if any of them has strange values and go from there. I hope this issue can be resolved soon.

Let me know if you need more help!