Incorrect prediction for NSynth dataset

guozixunnicolas commented 3 years ago

Hi there,

Thanks for the re-implementation! It's really well-formatted.

I have encountered an issue with the predicted pitch value. I used one sample from the NSynth dataset as the input file (bass_synthetic_009-025-127.wav). The file is here: https://drive.google.com/file/d/1_Ltj9Pbezx_5Ve-MLVrkF924vAfJ6j2C/view?usp=sharing

The label of the file shows it has MIDI pitch 25, which works out to about 34.65 Hz.
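For reference, the MIDI-to-frequency conversion can be sketched as below. This assumes standard equal temperament with A4 (MIDI note 69) tuned to 440 Hz; the function name `midi_to_hz` is just an illustrative helper, not part of any library mentioned in this thread.

```python
def midi_to_hz(note, a4_hz=440.0):
    """Convert a MIDI note number to frequency in Hz.

    Assumes 12-tone equal temperament with MIDI note 69 (A4) at a4_hz.
    """
    # Each semitone is a factor of 2^(1/12); note 69 is the reference pitch.
    return a4_hz * 2.0 ** ((note - 69) / 12.0)


print(round(midi_to_hz(25), 2))  # 34.65
```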

However, when I run the algorithm, it returns a value which seems incorrect.

I ran the original CREPE TensorFlow version and it returns around 34 or 35 Hz.

May I know what causes the error? Or is it possible that the data the model was trained on didn't include music data?

Best,

Nic

maxrmorrison commented 3 years ago

You have to set the arguments to be the same as the original CREPE implementation. The following produces the desired result:

python -m torchcrepe --audio_files bass_synthetic_009-025-127.wav --output_files bass.pt --decoder argmax --fmin 0 --gpu 0
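To illustrate why the minimum-frequency setting can matter for a note this low, here is a toy sketch in plain Python. It is not torchcrepe's code. It assumes a CREPE-style layout of 360 pitch bins spaced 20 cents apart, an argmax decoder that ignores bins below the minimum frequency, and a hypothetical cutoff of 50 Hz for comparison. The posterior values are made up.

```python
import math

# Assumed CREPE-style bin layout (illustrative constants).
N_BINS = 360
CENTS_PER_BIN = 20.0
CENTS_OFFSET = 1997.3794084376191  # cents (relative to 10 Hz) of bin 0


def bin_to_hz(b):
    """Center frequency of pitch bin b under the assumed layout."""
    cents = CENTS_PER_BIN * b + CENTS_OFFSET
    return 10.0 * 2.0 ** (cents / 1200.0)


def hz_to_bin(hz):
    """Nearest pitch bin for a frequency under the assumed layout."""
    cents = 1200.0 * math.log2(hz / 10.0)
    return round((cents - CENTS_OFFSET) / CENTS_PER_BIN)


def argmax_decode(probs, fmin=0.0):
    """Pick the most probable bin, skipping bins below fmin (an assumption)."""
    best_bin, best_p = None, -1.0
    for b, p in enumerate(probs):
        if bin_to_hz(b) < fmin:
            continue  # bin is outside the allowed range
        if p > best_p:
            best_bin, best_p = b, p
    return bin_to_hz(best_bin)


# Toy posterior: a strong peak at the true pitch (~34.65 Hz) and a weaker
# peak one octave (60 bins = 1200 cents) above it.
true_bin = hz_to_bin(34.65)
probs = [0.0] * N_BINS
probs[true_bin] = 0.9
probs[true_bin + 60] = 0.4

print(argmax_decode(probs, fmin=0.0))   # close to the true pitch
print(argmax_decode(probs, fmin=50.0))  # the octave above is picked instead
```

With the cutoff at 0 the decoder can select the bin near 34.65 Hz. With a cutoff above the true pitch, that bin is excluded and the result jumps to the next strongest candidate.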

Best, Max

turian commented 10 months ago

@maxrmorrison The readme says 'weighted argmax (as in the original implementation)', not argmax. Can you clarify? Thank you.

maxrmorrison commented 10 months ago

See the README section on decoding preceding that as well as Sections II-A and IV of this paper for clarification

maxrmorrison / torchcrepe