ZFTurbo opened this issue 7 months ago
I tried your training pipeline, and by the second epoch, the SDR for vocals had already reached 6.1, so could there be some configuration error?
Could you post your config, maybe? What did you use for training and for validation?
Using 4 GPUs, training and testing on MUSDB18.
audio:
  chunk_size: 441000
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.001

model:
  sources: ['drums', 'bass', 'other', 'vocals']
  audio_channels: 2
  dims: [4, 32, 64, 128]  # [4, 64, 128, 256] in SCNet-large
  nfft: 4096
  hop_size: 1024
  win_size: 4096
  normalized: True
  band_configs: {
    'low': { 'SR': .175, 'stride': 1, 'kernel': 3 },
    'mid': { 'SR': .392, 'stride': 4, 'kernel': 4 },
    'high': { 'SR': .433, 'stride': 16, 'kernel': 16 }
  }
  # SR=[.225, .372, .403] in SCNet-large
  conv_depths: [3, 2, 1]
  compress: 4
  conv_kernel: 3
  num_dplayer: 6
  expand: 1

training:
  batch_size: 8
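For reference, a quick sketch of what those numbers imply for the input the model sees (illustrative only, not the actual band-split code; the exact rounding of band boundaries may differ): chunk_size 441000 at 44100 Hz is a 10-second chunk, nfft: 4096 gives 2049 frequency bins, and the three SR ratios in band_configs sum to 1.0 and carve those bins into low/mid/high groups.

import torch

sample_rate, chunk_size = 44100, 441000   # 10-second training chunks
n_fft, hop = 4096, 1024

x = torch.randn(2, chunk_size)            # one stereo chunk
spec = torch.stft(x, n_fft=n_fft, hop_length=hop,
                  window=torch.hann_window(n_fft),
                  normalized=True, return_complex=True)
print(spec.shape)                         # torch.Size([2, 2049, 431]) -> channels x bins x frames

# band_configs splits the 2049 bins by the SR ratios (0.175 + 0.392 + 0.433 = 1.0)
bins = n_fft // 2 + 1
for name, sr in [('low', 0.175), ('mid', 0.392), ('high', 0.433)]:
    print(name, int(sr * bins))           # roughly 358 / 803 / 887 bins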
Earlier I trained only a ['vocals', 'other'] model with my vocals dataset. Now I switched to the MUSDB18 dataset with 4 stems, and it's also not great - SDR for vocals is less than 2 after 2 epochs. What training settings did you use in the config? I train with a smaller batch size on 3 x 48 GB cards, but I don't think that can have such a big impact.
My current config:
audio:
  chunk_size: 441000
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.000

model:
  sources: ['drums', 'bass', 'other', 'vocals']
  audio_channels: 2
  dims: [4, 32, 64, 128]  # [4, 64, 128, 256] in SCNet-large
  nfft: 4096
  hop_size: 1024
  win_size: 4096
  normalized: True
  band_configs: {
    'low': { 'SR': .175, 'stride': 1, 'kernel': 3 },
    'mid': { 'SR': .392, 'stride': 4, 'kernel': 4 },
    'high': { 'SR': .433, 'stride': 16, 'kernel': 16 }
  }
  # SR=[.225, .372, .403] in SCNet-large
  conv_depths: [3, 2, 1]
  compress: 4
  conv_kernel: 3
  # Dual-path RNN
  num_dplayer: 6
  expand: 1
  # mamba
  use_mamba: False
  mamba_config: {
    'd_stat': 16,
    'd_conv': 4,
    'd_expand': 2
  }

training:
  batch_size: 4
  gradient_accumulation_steps: 1
  grad_clip: 0
  instruments:
    - drums
    - bass
    - other
    - vocals
  lr: 4.0e-05
  patience: 2
  reduce_factor: 0.95
  target_instrument: null
  num_epochs: 1000
  num_steps: 1000
  q: 0.95
  coarse_loss_clip: true
  ema_momentum: 0.999
  optimizer: adam
  other_fix: false  # it's needed for checking on multisong dataset if other is actually instrumental
  use_amp: true  # enable or disable usage of mixed precision (float16) - usually it must be true
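As a side note on the training block above: patience: 2 with reduce_factor: 0.95 presumably behaves like a ReduceLROnPlateau-style schedule keyed on the validation metric. A minimal sketch of that assumption (toy model standing in for SCNet):

import torch

model = torch.nn.Linear(8, 8)             # stand-in for SCNet
optimizer = torch.optim.Adam(model.parameters(), lr=4.0e-05)
# mode='max' because the tracked metric (validation SDR) should increase
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.95, patience=2)

for sdr in [2.6, 2.6, 2.6, 2.6, 3.0]:     # pretend per-epoch validation SDR
    scheduler.step(sdr)
    print(optimizer.param_groups[0]['lr'])  # multiplied by 0.95 once SDR stalls past patience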
This is my config, with a different lr and batch_size (posted below). I fully adopted your configuration (3 GPUs, batch=4, lr=4.0e-5), and after the second epoch the SDR for vocals was 2.6303. Additionally, with a batch_size of 4 it only used about 10 GB of GPU memory, so maybe you could try a larger batch size?
training:
  batch_size: 8
  gradient_accumulation_steps: 1
  grad_clip: 0
  instruments:
    - drums
    - bass
    - other
    - vocals
  lr: 5.0e-04
  patience: 2
  reduce_factor: 0.95
  target_instrument: null
  num_epochs: 1000
  num_steps: 1000
  q: 0.95
  coarse_loss_clip: true
  ema_momentum: 0.999
  optimizer: adam
  other_fix: false  # it's needed for checking on multisong dataset if other is actually instrumental
  use_amp: true  # enable or disable usage of mixed precision (float16) - usually it must be true
With batch_size: 6 I get out of memory. With batch_size: 5 there is a very big slowdown (I see it copying data all the time). So 4 is the optimal maximum.
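If memory is the only blocker, one option is raising gradient_accumulation_steps instead of batch_size, which keeps memory at the batch-4 level while emulating a larger effective batch. A generic sketch of the idea (the standard pattern, not the pipeline's exact training loop):

import torch

accum_steps = 2                           # effective batch = 4 * 2 = 8
model = torch.nn.Linear(8, 8)             # stand-in for SCNet
optimizer = torch.optim.Adam(model.parameters(), lr=4.0e-05)

for step in range(8):
    x, y = torch.randn(4, 8), torch.randn(4, 8)      # dummy batch of size 4
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()                  # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        optimizer.step()                             # update once per accum_steps micro-batches
        optimizer.zero_grad()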
I still can't make it work normally. Is it possible for you to train a MUSDB18 checkpoint on your side and put it somewhere, so we can then start from it as a pre-train?
Hi there,
Thanks for your great work! I'm also experiencing a similar issue when training the model with @ZFTurbo 's pipeline. The forward pass is quite slow, and I can't seem to get the model over SDR 5. Meanwhile, the training speed is different with the unofficial implementation, yet the SDR is similar even after 900 epochs (both trained on uncompressed MUSDB18).
Has there been any progress on that front? I appreciate your consideration in advance :)
The authors of both repos seem to have gone radio silent. I hope they return and give insight on this, because on paper SCNet should be SOTA, but right now there are no audio examples or pretrained checkpoints to confirm it, let alone by training manually.
Sorry for the confusion, I've been really busy lately. I will provide the code and the pre-trained model on MUSDB within a week.
@starrytong your model in the small variant gives very good results. Can you make a MUSDB18 pretrain for the large SCNet model?
I might retrain soon, if I have GPUs.
@ZFTurbo The MUSDB18 pretrain for the large SCNet model is available now.
The links are closed. Can you please give access?
I have updated the permissions, you should be able to access it now.
I added your model to my training pipeline, but the learning process is very slow, and after more than a day of training the model reached only: Instr SDR vocals: 5.8361. That's too low compared to other, even weaker, models.
Maybe there is some normalization or other trick we need to apply at the inference step?
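In case it is the kind of trick being asked about: some separation models standardize the mixture by its own mean/std before the forward pass and undo it on the estimates. Whether the released SCNet checkpoint expects this is not confirmed; the sketch below only illustrates the idea, with model assumed to map a (batch, channels, time) mixture to (batch, sources, channels, time) estimates.

import torch

def separate_normalized(model, mix, eps=1e-8):
    # Per-mixture standardization around inference (illustrative only).
    # mix: (batch, channels, time) waveform tensor.
    mean = mix.mean(dim=(-2, -1), keepdim=True)
    std = mix.std(dim=(-2, -1), keepdim=True)
    est = model((mix - mean) / (std + eps))      # (batch, sources, channels, time)
    return est * (std + eps).unsqueeze(1) + mean.unsqueeze(1)   # undo scaling per source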