maxrmorrison / torchcrepe

Pytorch implementation of the CREPE pitch tracker
MIT License

Retrieve timestamps #14

Closed: turian closed this issue 3 years ago

turian commented 3 years ago

How do I retrieve the timestamps of the embedding? Are they centered?

Can I assume it starts at hop_size / 2? If the audio is not divisible by hop_size, where precisely does it end?

Edit: based on the number of frames, it doesn't appear to do centering. Can you confirm this is correct?

https://github.com/neuralaudio/hear-baseline/blob/main/hearbaseline/torchcrepe.py#L126

maxrmorrison commented 3 years ago

By default, the audio is padded with window_size // 2 zeros on both sides. So a signal x will produce 1 + int(len(x) // hop_size) frames. The first frame is centered on sample index 0. You can turn off the padding and use your own if you need different behavior.
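
Under the default padding described above, frame i should be centered on sample index i * hop_size, so the timestamps follow directly from the hop size and sample rate. The snippet below is a minimal sketch of that arithmetic, not code from torchcrepe: `frame_times` is a hypothetical helper, and it assumes that hop_size and sample_rate refer to the same audio (the one whose length is num_samples).

```python
def frame_times(num_samples, hop_size, sample_rate):
    """Return the assumed center time (in seconds) of each frame.

    Hypothetical helper, not part of torchcrepe. Assumes the default
    padding of window_size // 2 zeros on both sides, so that frame i
    is centered on sample index i * hop_size.
    """
    # Frame count as stated in the reply above.
    num_frames = 1 + num_samples // hop_size
    return [i * hop_size / sample_rate for i in range(num_frames)]


# One second of 16 kHz audio with a 10 ms hop (160 samples).
times = frame_times(num_samples=16000, hop_size=160, sample_rate=16000)
print(len(times), times[0], times[-1])  # 101 0.0 1.0
```

If the audio length is not divisible by hop_size, the last frame is centered on the largest multiple of hop_size that does not exceed the length. For example, 16050 samples with a hop of 160 still gives 101 frames, the last one centered on sample 16000.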