marl / crepe

CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)
https://marl.github.io/crepe/
MIT License

Incorrect output shape on some file sizes #68

Closed sharvil closed 4 years ago

sharvil commented 4 years ago

Running crepe on a 16 kHz mono WAV file with 102400 samples and a step size of 10 ms produces 641 pitch estimates instead of 640 (both via the command line and the Python interface). For a 10 ms step size we'd expect a hop size of 16000 / 1000 * 10 == 160 samples, so an audio clip with 102400 samples should yield 102400 / 160 == 640 estimates.

You can create a synthetic audio clip to reproduce:

import crepe
import numpy as np

x = np.random.normal(size=[102400])
x = np.clip(x, -1.0, 1.0)

# step_size is in milliseconds; 10 is also the default
time, frequency, confidence, activation = crepe.predict(x, 16000, viterbi=True, step_size=10)
# observe len(frequency) == 641
sharvil commented 4 years ago

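Hmm, looks like this is an artifact of the convention used in many libraries when center=True (e.g. librosa, PyTorch): the signal is padded by half a frame on each side so that every frame is centered on its timestamp, which gives 1 + floor(102400 / 160) == 641 frames, with the first frame centered at t = 0. Can't fault crepe for following convention.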
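
For anyone landing here later, the frame count can be worked out without running the model. This is only a rough sketch of the arithmetic, not crepe's actual code: `num_frames` is a hypothetical helper, and it assumes a 1024-sample analysis frame at 16 kHz and padding of half a frame on each side when centering is enabled.

```python
def num_frames(n_samples, sample_rate=16000, step_size_ms=10,
               frame_length=1024, center=True):
    """Hypothetical helper: count analysis frames for a clip.

    Assumes a 1024-sample frame and, when center=True, padding of
    frame_length // 2 samples on each side (the librosa-style convention).
    """
    hop = int(sample_rate * step_size_ms / 1000)  # 160 samples for 10 ms at 16 kHz
    if center:
        n_samples += 2 * (frame_length // 2)      # half a frame of padding per side
    # One frame at offset 0, plus one for every full hop that still fits.
    return 1 + (n_samples - frame_length) // hop


print(num_frames(102400, center=True))   # 641
print(num_frames(102400, center=False))  # 634
```

Note that under these assumptions, turning centering off does not give 640 either: without padding, only frames that fit entirely inside the clip are counted. If exactly 102400 / 160 == 640 estimates are needed, one option is to keep center=True and drop the final frame.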