jongwook / onsets-and-frames

A PyTorch implementation of Onsets and Frames (Hawthorne 2018)
MIT License

Discarding the last sample point from the audio? #12

Closed: KinWaiCheuk closed this issue 4 years ago

KinWaiCheuk commented 4 years ago

In your transcriber.py, line 102, you compute the mel spectrogram with

mel = melspectrogram(audio_label.reshape(-1, audio_label.shape[-1])[:, :-1]).transpose(-1, -2)

The audio_label.reshape(-1, audio_label.shape[-1])[:, :-1] part discards the last audio sample point. What is the reason for adding [:, :-1] to discard it?

What happens if we keep the complete audio, without discarding the last sample point?

jongwook commented 4 years ago

That was just a hack to avoid an off-by-one error. If you don't discard the last sample, the number of frames in the Mel spectrograms will sometimes be different from the number of frames in the label.
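
To illustrate the off-by-one described above, here is a minimal sketch in plain Python. It assumes the spectrogram comes from a centered STFT (as with torch.stft's default center=True, which yields 1 + n_samples // hop_length frames), that labels get one frame per started hop, and a hop length of 512 samples. The helper names and both assumptions about the label layout and hop length are illustrative, not taken from this thread.

```python
# Hypothetical helpers; the frame-count formulas below are assumptions
# made for illustration, not code quoted from the repository.

HOP_LENGTH = 512  # assumed hop length in samples


def centered_stft_frames(n_samples: int, hop_length: int = HOP_LENGTH) -> int:
    """Frame count of a centered STFT: 1 + n_samples // hop_length."""
    return 1 + n_samples // hop_length


def label_frames(n_samples: int, hop_length: int = HOP_LENGTH) -> int:
    """Assumed label frame count: one frame per started hop (ceiling division)."""
    return (n_samples - 1) // hop_length + 1


def compare(n_samples: int) -> tuple:
    """Return (frames with full audio, frames with last sample dropped, label frames)."""
    full = centered_stft_frames(n_samples)
    trimmed = centered_stft_frames(n_samples - 1)  # effect of [:, :-1]
    return full, trimmed, label_frames(n_samples)


if __name__ == "__main__":
    # An exact multiple of the hop length, and two lengths that are not.
    for n in (10 * HOP_LENGTH, 10 * HOP_LENGTH + 1, 10 * HOP_LENGTH + 300):
        full, trimmed, labels = compare(n)
        print(f"n_samples={n}: full={full}, trimmed={trimmed}, labels={labels}")
```

Under these assumptions, the full audio produces one extra frame only when the number of samples is an exact multiple of the hop length, which would explain why the mismatch appears only sometimes. Dropping the last sample makes the two counts agree for every length.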