microsoft / Pengi

An Audio Language model for Audio Tasks
https://arxiv.org/abs/2305.11834
MIT License
281 stars, 15 forks

Information about Evaluation #14

Closed: asif-hanif closed 1 month ago

asif-hanif commented 4 months ago

Hi, thanks for the great work. I have a question regarding the evaluation on the US8K dataset. This dataset has 10 folds, and its website recommends using 10-fold cross-validation to obtain average test results. Could you please confirm whether you used 10-fold cross-validation? I have evaluated the Pengi model on each fold separately, and the average accuracy across these folds does not match the number reported in the paper (i.e., Accuracy = 0.7185 on US8K in Table 3). I get an average accuracy of around 0.55.
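For reference, the fold-averaged accuracy described above can be sketched as follows. This is a minimal illustration, not the evaluation code used in the paper: the `records` format (fold, true label, predicted label) and both function names are assumptions made for this example, and the predictions are toy values rather than real model output.

```python
from collections import defaultdict


def per_fold_accuracy(records):
    """Return {fold: accuracy} for an iterable of (fold, true_label, predicted_label)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for fold, y_true, y_pred in records:
        total[fold] += 1
        correct[fold] += int(y_true == y_pred)
    return {fold: correct[fold] / total[fold] for fold in sorted(total)}


def mean_fold_accuracy(records):
    """Unweighted mean of the per-fold accuracies (each fold counts equally)."""
    per_fold = per_fold_accuracy(records)
    return sum(per_fold.values()) / len(per_fold)


# Toy predictions (hypothetical, for illustration only).
records = [
    (1, "siren", "siren"),
    (1, "dog_bark", "siren"),
    (2, "drilling", "drilling"),
    (2, "jackhammer", "jackhammer"),
]

fold_scores = per_fold_accuracy(records)
average = mean_fold_accuracy(records)
print(fold_scores, average)
```

Note that the unweighted mean over folds can differ slightly from the accuracy pooled over all clips when folds have different sizes, so it helps to state which of the two is being compared against the paper.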

soham97 commented 2 months ago

Hi @asif-hanif, make sure you are resampling the dataset to 44.1 kHz. The model performance drops when a different sampling rate is used.
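A minimal sketch of the resampling step suggested above, assuming each clip is already loaded as a 1-D NumPy array together with its original sampling rate. The helper names `resample_factors` and `resample_to_44k` are hypothetical and not part of the Pengi repository; `scipy.signal.resample_poly` is used here as one possible resampler, and the model's own loading code may handle this differently.

```python
from math import gcd

TARGET_SR = 44100  # 44.1 kHz, the rate mentioned in the reply above


def resample_factors(orig_sr, target_sr=TARGET_SR):
    """Return the integer (up, down) factors that map orig_sr to target_sr."""
    common = gcd(orig_sr, target_sr)
    return target_sr // common, orig_sr // common


def resample_to_44k(waveform, orig_sr):
    """Resample a 1-D waveform to 44.1 kHz with polyphase filtering.

    Assumes SciPy is installed; it is imported here so the factor helper
    above stays usable without it.
    """
    from scipy.signal import resample_poly

    if orig_sr == TARGET_SR:
        return waveform  # already at the target rate
    up, down = resample_factors(orig_sr)
    return resample_poly(waveform, up, down)


# Example factors for two common source rates.
print(resample_factors(22050))  # (2, 1)
print(resample_factors(48000))  # (147, 160)
```

Since clips in a dataset may not all share one sampling rate, it is worth checking the rate of each file rather than assuming a single value for the whole dataset.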