The audio_label.reshape(-1, audio_label.shape[-1])[:, :-1] part discards the last audio sample.
May I know the reason for using [:, :-1] to discard the last audio sample?
What happens if we keep the complete audio (without discarding the last sample)?
That was just a hack to avoid an off-by-one error. If you don't discard the last sample, the number of frames in the Mel spectrograms will sometimes be different from the number of frames in the label.
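To make the off-by-one concrete, here is a minimal sketch of the frame arithmetic. It assumes the spectrogram comes from a centre-padded STFT, which yields 1 + n_samples // hop_length frames, and that the label has n_samples // hop_length frames. The helper names and the hop length of 512 are illustrative and not taken from transcriber.py.

```python
def stft_frames(n_samples: int, hop_length: int) -> int:
    # Frame count of a centre-padded STFT (assumed convention).
    return 1 + n_samples // hop_length


def label_frames(n_samples: int, hop_length: int) -> int:
    # Assumed label layout: one label frame per full hop of audio.
    return n_samples // hop_length


hop_length = 512                    # illustrative value
n_samples = 640 * hop_length        # a clip that is an exact multiple of the hop

full = stft_frames(n_samples, hop_length)         # keeps every sample
trimmed = stft_frames(n_samples - 1, hop_length)  # mimics [:, :-1]
labels = label_frames(n_samples, hop_length)

print(full, trimmed, labels)
```

Under these assumptions the untrimmed audio gives one frame more than the label, and dropping a single sample removes that extra frame. When the length is not an exact multiple of the hop, dropping one sample leaves the frame count unchanged.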
In your transcriber.py, line 102, you obtain the melspectrogram by using
The audio_label.reshape(-1, audio_label.shape[-1])[:, :-1] part discards the last audio sample. May I know the reason for using [:, :-1] to discard the last audio sample? What happens if we keep the complete audio (without discarding the last sample)?