Running crepe on a 16kHz mono wav file with 102400 samples and a step size of 10ms produces 641 pitch estimates instead of 640 (both via command line and Python interface). We'd expect a hop size of 16000 / 1000 * 10 == 160 samples for a step size of 10ms. An audio clip with 102400 samples should have 102400 / 160 == 640 estimates.
You can create a synthetic audio clip to reproduce:
```python
import crepe
import numpy as np

x = np.random.normal(size=[102400])
x = np.clip(x, -1.0, 1.0)
time, frequency, confidence, activation = crepe.predict(x, 16000, viterbi=True)
# observe len(frequency) == 641
```
Hmm, looks like this is an artifact of the convention used in many libraries when center=True (e.g. librosa, PyTorch): the signal is padded so that each frame is centered on a multiple of the hop, which yields floor(n_samples / hop) + 1 frames, i.e. 102400 // 160 + 1 == 641. Can't fault crepe for following convention.
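For reference, a quick sanity check of the arithmetic, assuming the usual floor(n / hop) + 1 frame count for center-padded framing (this is just plain integer math, not a call into crepe):

```python
# Frame-count arithmetic for 102400 samples at 16 kHz with a 10 ms step.
n_samples = 102400
sr = 16000
step_ms = 10

hop = sr * step_ms // 1000        # 160 samples per 10 ms step

frames_no_center = n_samples // hop      # 640 -- the count the report expected
frames_center = n_samples // hop + 1     # 641 -- the count crepe returns with
                                         # center-padded framing

print(hop, frames_no_center, frames_center)  # → 160 640 641
```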