huggingface / audio-transformers-course

The Hugging Face Course on Transformers for Audio
Apache License 2.0

fix a typo #65

Open abdelkareemkobo opened 1 year ago

abdelkareemkobo commented 1 year ago

In Unit 4, "Pretrained models for audio classification", the course says: *We'll load an official Audio Spectrogram Transformer checkpoint fine-tuned on the Speech Commands dataset, under the namespace `"MIT/ast-finetuned-speech-commands-v2"`:*

```python
from transformers import pipeline

classifier = pipeline(
    "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
)
classifier(sample["audio"])
```

Fix it to be `classifier(sample["audio"]["array"])`. I don't know how to make a pull request yet! :)
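For context, here is a minimal sketch of the two call forms side by side. It assumes recent `transformers` and `datasets` releases and that streaming access to the dataset works; on older `transformers` versions the dict form may raise a `ValueError`, which is presumably what prompted the suggested fix.

```python
from datasets import load_dataset
from transformers import pipeline

classifier = pipeline(
    "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
)

# Stream one validation sample from Speech Commands (v0.02).
dataset = load_dataset("speech_commands", "v0.02", split="validation", streaming=True)
sample = next(iter(dataset))

# Form 1: pass the whole audio dict. Recent pipeline versions read the
# "array" and "sampling_rate" keys and resample if the rates differ.
print(classifier(sample["audio"]))

# Form 2: pass only the raw waveform. The pipeline then assumes the audio
# is already at the model's expected rate (16 kHz for this AST checkpoint).
print(classifier(sample["audio"]["array"]))
```

Since Speech Commands is recorded at 16 kHz, both forms should agree here on a current `transformers` install.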

osamja commented 6 months ago

@abdelkareemkobo Hmm, I think we may need to be careful here. I just submitted a PR for a related fix. I initially thought that the following two statements were equivalent, but they are not:

[Screenshot, 2024-01-03: the two call forms and their outputs on MINDS-14]

The above is for the MINDS-14 dataset, not Speech Commands, but in this case `classifier(sample["audio"])` produces the correct intent classification, whereas adding the `["array"]` index does not. I'm currently not able to load the Speech Commands dataset (it hangs when trying to stream it in). What is the difference in output between these two ways of classifying on the Speech Commands dataset?
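One plausible explanation, sketched below under assumptions worth checking: the dict form carries the `"sampling_rate"` key, so the pipeline can resample on the fly (recent `transformers` versions do this, with `torchaudio` installed), while the bare array is assumed to already be at the model's rate. MINDS-14 ships at 8 kHz but the checkpoints expect 16 kHz, so dropping the dict silently skips resampling and the prediction can change. The model identifier below is, if I recall correctly, the MINDS-14 intent checkpoint used earlier in the course; treat it as an assumption.

```python
from datasets import load_dataset
from transformers import pipeline

# Load one MINDS-14 sample without resampling it first.
minds = load_dataset("PolyAI/minds14", name="en-AU", split="train")
sample = minds[0]
print(sample["audio"]["sampling_rate"])  # 8000: MINDS-14 ships at 8 kHz

classifier = pipeline(
    "audio-classification",
    model="anton-l/xtreme_s_xlsr_300m_minds14",  # assumed course checkpoint
)
print(classifier.feature_extractor.sampling_rate)  # 16000: the model's rate

# Dict form: the pipeline sees the 8 kHz rate and can resample to 16 kHz.
print(classifier(sample["audio"]))

# Bare array: the pipeline assumes 16 kHz, so the 8 kHz waveform is
# misinterpreted and the predicted intent may differ.
print(classifier(sample["audio"]["array"]))
```

If that is what is happening, neither fix is universally right: the safe pattern is to pass the whole dict (on a recent `transformers`), or to resample first with `minds.cast_column("audio", Audio(sampling_rate=16_000))` before indexing into `["array"]`.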