Running crepe on a 16kHz mono wav file with 102400 samples and a step size of 10ms produces 641 pitch estimates instead of 640 (both via command line and Python interface). We'd expect a hop size of 16000 / 1000 * 10 == 160 samples for a step size of 10ms. An audio clip with 102400 samples should have 102400 / 160 == 640 estimates.
You can create a synthetic audio clip to reproduce:
```python
import crepe
import numpy as np

x = np.random.normal(size=[102400])
x = np.clip(x, -1.0, 1.0)
time, frequency, confidence, activation = crepe.predict(x, 16000, viterbi=True)
# observe len(frequency) == 641
```
Hmm, looks like this is an artifact of the convention used in many libraries when center=True (e.g. librosa, PyTorch): the signal is padded so that each frame is centered on a multiple of the hop, which yields floor(n_samples / hop) + 1 frames, i.e. 102400 // 160 + 1 == 641. Can't fault crepe for following convention.
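For reference, a quick sanity check of the arithmetic, assuming the usual floor(n / hop) + 1 frame count for center-padded framing (this is just plain integer math, not a call into crepe):

```python
# Frame-count arithmetic for 102400 samples at 16 kHz with a 10 ms step.
n_samples = 102400
sr = 16000
step_ms = 10

hop = sr * step_ms // 1000        # 160 samples per 10 ms step

frames_no_center = n_samples // hop      # 640 -- the count the report expected
frames_center = n_samples // hop + 1     # 641 -- the count crepe returns with
                                         # center-padded framing

print(hop, frames_no_center, frames_center)  # → 160 640 641
```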