lucidrains / BS-RoFormer

Implementation of Band Split Roformer, SOTA Attention network for music source separation out of ByteDance AI Labs
MIT License

Cut frequencies at half of the Nyquist in the MelBand model #19

Closed dorpxam closed 10 months ago

dorpxam commented 10 months ago

Hi, before starting training on a large corpus, I have just tested the latest release 3.0 and the new MelBand model. Both models are initialized with the same parameters, and no other parameters are changed.

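from bs_roformer import MelBandRoformer

stereo, sr, device = True, 44100, 'cuda'  # stereo input at 44100 Hz, on the GPU

model = MelBandRoformer(dim = 384,
                        depth = 9,
                        time_transformer_depth = 1,
                        freq_transformer_depth = 1,
                        stereo = stereo,
                        sample_rate = sr).to(device)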

To test under realistic conditions, I load an 8-second stereo audio mixture at 44100 Hz together with a matching target (a drums stem). The batch size is 1, so the tensors have shape [1, 2, 352800].
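
As a quick sanity check on that shape, the sample count is simply the duration multiplied by the sample rate. The helper name `mixture_shape` below is hypothetical and only illustrates the arithmetic; it is not part of BS-RoFormer.

```python
def mixture_shape(seconds, sample_rate, channels=2, batch_size=1):
    """Return the [batch, channels, samples] shape of a raw audio batch.

    Hypothetical helper for illustration only.
    """
    num_samples = seconds * sample_rate  # 8 s * 44100 Hz = 352800 samples
    return [batch_size, channels, num_samples]


shape = mixture_shape(seconds=8, sample_rate=44100)
print(shape)
```

A mono test of the same clip would use `channels=1`, leaving the sample count unchanged.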

After a single training forward step, the loss values look consistent:

BandSplit : tensor(2.3477, device='cuda:0', grad_fn=<AddBackward0>)
  MelBand : tensor(2.2762, device='cuda:0', grad_fn=<AddBackward0>)

Unfortunately, when I save the audio output of both models, I get strange behavior from the MelBand model. The spectrograms show it better than words:

Mixture [mel scale view]

[image: 1_mixture]

Target [mel scale view]

[image: 2_target]

BandSplit output [mel scale view]

[image: 3_recon_audio_lin]

MelBand output [mel scale view]

[image: 4_recon_audio_mel]

MelBand output [linear scale view]

[image: 4_recon_audio_mel_linear]

The spectrograms show that the MelBand model output cuts all frequencies above 11025 Hz, which is half of the 22050 Hz Nyquist frequency for 44100 Hz audio.
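
One way to confirm such a cutoff without looking at a spectrogram is to measure how much spectral energy lies above half the Nyquist frequency. The following is a minimal sketch using a naive DFT on short synthetic signals, not the procedure used in this issue; the function names `nyquist`, `energy_above`, and `tone` are hypothetical, and the test tones stand in for real model output.

```python
import cmath
import math


def nyquist(sample_rate):
    """Highest representable frequency for a given sample rate."""
    return sample_rate / 2


def energy_above(signal, sample_rate, cutoff_hz):
    """Fraction of (unnormalised, one-sided) DFT energy strictly above cutoff_hz."""
    n = len(signal)
    total = 0.0
    above = 0.0
    for k in range(n // 2 + 1):
        # Naive DFT coefficient for bin k; fine for short illustrative signals.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate / n > cutoff_hz:
            above += energy
    return above / total if total > 0 else 0.0


def tone(bin_index, n):
    """Sine wave that falls exactly on one DFT bin (assumed test signal)."""
    return [math.sin(2 * math.pi * bin_index * t / n) for t in range(n)]


sr, n = 44100, 256
cutoff = nyquist(sr) / 2  # half of Nyquist: 11025.0 Hz

# Bin 30 is about 5.2 kHz, bin 90 is about 15.5 kHz at this sample rate.
full_band = [a + b for a, b in zip(tone(30, n), tone(90, n))]
low_only = tone(30, n)

print(energy_above(full_band, sr, cutoff))  # roughly half the energy is above the cutoff
print(energy_above(low_only, sr, cutoff))   # essentially no energy above the cutoff
```

On real audio, a ratio near zero for the model output while the target has clearly non-zero energy above 11025 Hz would point to the kind of band-limiting described here. For full-length clips an FFT library would be a more practical choice than this naive loop.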

I don't know whether this is expected or a bug, but I wanted to share the information here.

Thanks so much for BS-RoFormer!!!

lucidrains commented 10 months ago

hey thanks for reporting this

could you possibly try without stereo and see if the issue is there?

lucidrains commented 10 months ago

@dorpxam it was indeed a bug with the stereo! should be fixed with 0.3.1

thank you for catching this!

dorpxam commented 10 months ago

@lucidrains You're right, there is no problem in mono: the spectrogram shows the full bandwidth. Very impressive responsiveness, what skill!

dorpxam commented 10 months ago

@lucidrains For your information, I just tested in stereo and it works well, with a slightly better initial loss value ;)

tensor(2.1933, device='cuda:0', grad_fn=<AddBackward0>)

lyndonlauder commented 10 months ago

@dorpxam May I ask you to share your training code with me? For two weeks I have been trying to train this model, but I have had no luck.

dorpxam commented 10 months ago

@lyndonlauder I'm working on the training code right now. I will share the code ASAP ;)

dorpxam commented 10 months ago

@lyndonlauder I have opened a discussion about training.