jasonppy / PromptingWhisper

Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation

Bug? #6

Closed JoanZhou closed 7 months ago

JoanZhou commented 8 months ago

I ran `bash visspeech.sh` and got the error below. Has anyone run into this before?

    currently testing medium.en pk 0 ok 50
    Namespace(seed=1, num_workers=8, data_split='dev', batch_size=32, sample_rate=16000, audio_max_length=480000, text_max_length=120, padding_idx=-100, model='medium.en', whisper_root='/saltpool0/scratch/pyp/whisper/pretrained_models', dataset='visspeech', dataset_dir='path/to/visspeech/data', core_metric='wer', task='transcribe', topk=600, beam_size=5, block_ngrams=[], language='en', code_switching='0', single_lang_threshold=0.8, concat_lang_token=0, logit_mask='0', vocab_cap=0.7, socratic='1', num_img=3, place_topk=0, obj_topk=50, object_txt_fn='path/to/place_and_object/dictionary_and_semantic_hierarchy.txt', place_txt_fn='path/to/place_and_object/categories_places365.txt', object_pkl_fn='path/to/place_and_object/tencent_336.pkl', place_pkl_fn='/data/scratch/pyp/exp_pyp/whisper/place_and_object/places365_336.pkl')
    clip_Model parameters (total): 427944193
    clip_Model parameters (image encoder): 304293888
    clip_Model parameters (text encoder): 122999808
    Input image resolution: 336
    Context length: 77
    Vocab size: 49408
    embed places365 text
    Traceback (most recent call last):
      File "/data2/zhouan/amlt/PromptingWhisper/scripts/../avsr.py", line 131, in <module>
        for place in place_categories[:, 0]:
    IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

jasonppy commented 8 months ago

Thanks for the issue!

This was a processing bug; it has been fixed in commit d562449.
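
For anyone who hits the same traceback before pulling the fix, here is a minimal sketch of how this kind of `IndexError` arises in NumPy and one defensive way to guard against it. It is not the change made in the commit above. The helper name `first_column` and the sample category strings are hypothetical, and the sketch assumes `place_categories` is a NumPy array of strings that ends up 1-dimensional when a 2-dimensional (rows by columns) layout was expected.

```python
import numpy as np


def first_column(categories):
    """Return the first column of an array that may be 1-D or 2-D.

    Hypothetical helper for illustration only; not part of PromptingWhisper.
    """
    arr = np.asarray(categories)
    if arr.ndim == 1:
        # Assumption: each element is one row holding a single field,
        # so view the array as N rows by 1 column.
        arr = arr.reshape(-1, 1)
    return arr[:, 0]


# Hypothetical sample data standing in for the loaded category list.
one_dim = np.array(["/a/airfield", "/a/alley"])  # shape (2,)
two_dim = np.array([["/a/airfield", "0"], ["/a/alley", "1"]])  # shape (2, 2)

# Indexing a 1-D array with two indices reproduces the reported error.
try:
    one_dim[:, 0]
except IndexError as err:
    message = str(err)

# The guarded helper accepts both shapes.
names_from_1d = list(first_column(one_dim))
names_from_2d = list(first_column(two_dim))
```

Whether reshaping to a single column is the right interpretation depends on what the input file actually contains. If the array is 1-D because the file has only one row of several fields, `arr.reshape(1, -1)` would be the appropriate view instead.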