Frequencies cut at half the Nyquist frequency in the MelBand model

dorpxam commented 10 months ago

Hi, before starting training on a large corpus, I tested the latest release 3.0 and compared the new MelBand model with the BandSplit model. Both models are initialized with the same parameters, and all other parameters are left unchanged.

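import torch
from bs_roformer import MelBandRoformer

device, stereo, sr = 'cuda', True, 44100  # stereo input at 44100 Hz on the GPU
model = MelBandRoformer(dim = 384,
                        depth = 9,
                        time_transformer_depth = 1,
                        freq_transformer_depth = 1,
                        stereo = stereo,
                        sample_rate = sr).to(device)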

To test under realistic conditions, I load an 8-second stereo mixture at 44100 Hz together with a matching target (a drums stem). The batch size is 1, so the tensors have shape [1, 2, 352800] (8 s × 44100 Hz = 352800 samples per channel).
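The [1, 2, 352800] shape follows directly from batch size, channel count and duration × sample rate. The short sketch below reproduces it with NumPy. It assumes zero-filled placeholder arrays stand in for the loaded mixture and drums stem (no audio file is read here), so the variable names are illustrative only.

```python
import numpy as np

# Values from the description: 8 s of stereo audio at 44100 Hz, batch size 1.
sr = 44100
duration_s = 8
channels = 2
batch_size = 1

num_samples = sr * duration_s  # 8 * 44100 = 352800 samples per channel

# Hypothetical placeholders standing in for the loaded mixture and drums stem.
mixture = np.zeros((batch_size, channels, num_samples), dtype=np.float32)
target = np.zeros_like(mixture)  # the target must have the same shape as the mixture

print(mixture.shape)  # (1, 2, 352800)
```

In a real script the arrays would come from an audio loader and be converted to torch tensors on the GPU before the forward pass.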

After a single training forward pass, the loss values look reasonable for both models:

BandSplit : tensor(2.3477, device='cuda:0', grad_fn=<AddBackward0>)
  MelBand : tensor(2.2762, device='cuda:0', grad_fn=<AddBackward0>)

Unfortunately, when I save the audio output of both models, the MelBand model behaves strangely. The spectrograms show it better than words:

Mixture [mel scale view]

Target [mel scale view]

BandSplit output [mel scale view]

MelBand output [mel scale view]

MelBand output [linear scale view]

The spectrograms show that the MelBand model output cuts all frequencies above 11025 Hz, which is half of the 22050 Hz Nyquist frequency for 44100 Hz audio.
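A band cut like this can also be checked numerically instead of by eye. The sketch below is a minimal, assumed approach (not part of BS-RoFormer): it measures the fraction of linear FFT energy above a cutoff with NumPy. The helper name `energy_above` is hypothetical, and the synthetic test tones (1000 Hz and 15000 Hz) stand in for real model outputs.

```python
import numpy as np

def energy_above(audio: np.ndarray, sr: int, cutoff_hz: float) -> float:
    """Fraction of spectral energy above cutoff_hz.

    audio: array of shape (channels, samples) or (samples,).
    """
    audio = np.atleast_2d(audio)
    spectrum = np.fft.rfft(audio, axis=-1)               # one-sided FFT per channel
    freqs = np.fft.rfftfreq(audio.shape[-1], d=1.0 / sr)  # bin centre frequencies in Hz
    power = np.abs(spectrum) ** 2
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[..., freqs > cutoff_hz].sum() / total)

sr = 44100
nyquist = sr / 2      # 22050 Hz
cutoff = nyquist / 2  # 11025 Hz, where the MelBand output was cut

t = np.arange(sr) / sr  # 1 second of samples
# Synthetic stereo signals: one with content above the cutoff, one without.
full_band = np.stack([np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 15000 * t)] * 2)
band_limited = np.stack([np.sin(2 * np.pi * 1000 * t)] * 2)

print(energy_above(full_band, sr, cutoff))     # approximately 0.5
print(energy_above(band_limited, sr, cutoff))  # approximately 0.0
```

Applied to saved outputs, a target with real content above 11025 Hz paired with a model output whose fraction is near zero would point to the same truncation the spectrograms show.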

I don't know whether this is expected behavior or a bug, but I'd rather report it here.

Thanks so much for BS-RoFormer!

lucidrains commented 10 months ago

hey thanks for reporting this

could you possibly try without stereo and see if the issue is there?

lucidrains commented 10 months ago

@dorpxam it was indeed a bug in the stereo handling! It should be fixed in 0.3.1.

thank you for catching this!

dorpxam commented 10 months ago

@lucidrains You're right: there's no problem in mono, and the spectrogram shows the full bandwidth. Very impressive responsiveness, what skill!

dorpxam commented 10 months ago

@lucidrains FYI, I just tested the fix in stereo and it works well, with a slightly better initial loss value ;)

tensor(2.1933, device='cuda:0', grad_fn=<AddBackward0>)

lyndonlauder commented 10 months ago

@dorpxam Could you share your training code with me? I have been trying to train this model for two weeks without any luck.

dorpxam commented 10 months ago

@lyndonlauder I'm working on the training code right now. I will share the code ASAP ;)

dorpxam commented 10 months ago

@lyndonlauder I have opened a discussion about training.
