sungnyun / avsr-temporal-dynamics

(SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
Apache License 2.0

About conf files and an error #1

Closed chaufanglin closed 8 hours ago

chaufanglin commented 10 hours ago

Hi Sungnyun,

Thanks for sharing the code. I noticed that the conf files are not provided. Do you use the same conf file as in AV-HuBERT? I tried using the AV-HuBERT conf file with the model name changed to xmodal_av_hubert_seq2seq, and it raised this error:

File "avsr-temporal-dynamics/avhubert/hubert.py", line 769, in extract_finetune_features
    assert 'clean_audio' in source
AssertionError

Many thanks, Zhaofeng

sungnyun commented 9 hours ago

Hi Zhaofeng,

Sorry for the inconvenience. I just uploaded the missing config file here. Please use this one.

Thanks, Sungnyun Kim

chaufanglin commented 9 hours ago

Thank you for your quick response!

The model name in the uploaded config file didn't work at first, but changing it to xmodal_av_hubert_seq2seq resolved the issue. I appreciate your help!

Many thanks, Zhaofeng

sungnyun commented 8 hours ago

My bad, you are right. I appreciate it!