huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
MIT License
3.52k stars 280 forks

Errors running `run_long_form_eval.py` #87

Closed guynich closed 6 months ago

guynich commented 7 months ago

I am following the long-form section of training/README.md: https://github.com/huggingface/distil-whisper/blob/main/training/README.md#long-form

I have two issues running the bash script example for the TED-LIUM validation set.

1) With this line of the bash script: --dataset_config_name "all" \

I get this error: ValueError: BuilderConfig 'all' not found. Available: ['default']

2) To work around 1), I changed the bash script line to --dataset_config_name "default" \

Then I see this error:

File "/home/ubuntu/distil-whisper-large-v2-hi/run_long_form_eval.py", line 578, in eval_step
    eval_labels.append(sample["reference"][0])
KeyError: 'reference'

The model card for TED-LIUM does not mention a reference key.
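For anyone hitting the same two errors, here is a minimal sketch of how both could be checked before launching the evaluation. It is not taken from run_long_form_eval.py: the helper names pick_config_name and find_text_column are hypothetical, and the candidate column names are guesses for illustration rather than the dataset's documented schema.

```python
from typing import Iterable, Mapping, Optional, Sequence


def pick_config_name(requested: str, available: Sequence[str]) -> str:
    """Return `requested` if it exists, else fall back to a sole available config.

    Hypothetical helper: mirrors the ValueError above when no safe fallback exists.
    """
    if requested in available:
        return requested
    if len(available) == 1:
        # Only one config exists (e.g. ['default']), so use it.
        return available[0]
    raise ValueError(
        f"BuilderConfig {requested!r} not found. Available: {list(available)}"
    )


def find_text_column(
    sample: Mapping[str, object],
    # Assumed candidate names, not confirmed for any particular dataset.
    candidates: Iterable[str] = ("reference", "text", "transcription", "sentence"),
) -> Optional[str]:
    """Return the first candidate key present in `sample`, or None."""
    for name in candidates:
        if name in sample:
            return name
    return None


# The list of configs can be fetched with the `datasets` library (needs network):
#   from datasets import get_dataset_config_names
#   available = get_dataset_config_names("<dataset name>")  # placeholder name
print(pick_config_name("all", ["default"]))
print(find_text_column({"audio": None, "text": "hello"}))
```

With the values from this issue, the first call falls back to "default", and printing the keys of one sample (or calling find_text_column on it) shows which key holds the transcript instead of failing with KeyError: 'reference'.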

sanchit-gandhi commented 6 months ago

The config name is fixed in #103, and the reference error is fixed in #101!