starrytong / SCNet

Problem with training #1

Open ZFTurbo opened 2 months ago

ZFTurbo commented 2 months ago

I added your model to my training pipeline, but the learning process is very slow, and after roughly a day of training the model reached only Instr SDR vocals: 5.8361. That's too low compared to other, even weaker, models.

Maybe there is some normalization or other trick we need to apply at the inference step?
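
As an illustration of the kind of trick being asked about, here is a minimal sketch of Demucs-style per-mixture standardization at inference. This is an assumption, not something confirmed for SCNet, and the model is taken to return a (batch, sources, channels, time) tensor:

import torch

def separate_normalized(model, mix):
    # mix: (batch, channels, time) mixture waveform
    mean = mix.mean(dim=(1, 2), keepdim=True)
    std = mix.std(dim=(1, 2), keepdim=True)
    est = model((mix - mean) / (std + 1e-8))  # (batch, sources, channels, time)
    # undo the normalization on every estimated source
    return est * (std + 1e-8).unsqueeze(1) + mean.unsqueeze(1)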

starrytong commented 2 months ago

I tried your training pipeline, and by the second epoch, the SDR for vocals had already reached 6.1, so could there be some configuration error?

ZFTurbo commented 2 months ago

Could you post your config, maybe? What did you use for training and for validation?

starrytong commented 2 months ago

Using 4 GPUs, training and testing on MUSDB18:

audio:
  chunk_size: 441000
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.001

model:
  sources: ['drums', 'bass', 'other', 'vocals']
  audio_channels: 2
  dims: [4, 32, 64, 128] # [4, 64, 128, 256] in SCNet-large
  nfft: 4096
  hop_size: 1024
  win_size: 4096
  normalized: True
  band_configs: {
    'low': { 'SR': .175, 'stride': 1, 'kernel': 3 },
    'mid': { 'SR': .392, 'stride': 4, 'kernel': 4 },
    'high': { 'SR': .433, 'stride': 16, 'kernel': 16 }
  }
  # SR=[.225, .372, .403] in SCNet-large
  conv_depths: [3, 2, 1]
  compress: 4
  conv_kernel: 3
  num_dplayer: 6
  expand: 1

training:
  batch_size: 8
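
As a side note, the SR values in band_configs read as split ratios along the frequency axis (they sum to 1.0). A back-of-the-envelope sketch, not the official code, assuming the bands are taken in low/mid/high order as described in the SCNet paper:

nfft = 4096
bins = nfft // 2 + 1  # 2049 STFT frequency bins

band_configs = {
    'low':  {'SR': 0.175, 'stride': 1,  'kernel': 3},
    'mid':  {'SR': 0.392, 'stride': 4,  'kernel': 4},
    'high': {'SR': 0.433, 'stride': 16, 'kernel': 16},
}

start = 0
for name, cfg in band_configs.items():
    width = int(bins * cfg['SR'])  # rounding leaves a bin or two over;
                                   # the real code's handling may differ
    print(f"{name}: {width} bins starting at {start}, stride {cfg['stride']}")
    start += width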

ZFTurbo commented 2 months ago

Earlier I trained only a ['vocals', 'other'] model with my vocals dataset. Now I've switched to the MUSDB18 dataset with 4 stems, and it's also not great: SDR for vocals is less than 2 after 2 epochs. Which training settings did you use in the config? I train with a smaller batch size on 3 x 48 GB cards, but I don't think that can have such a big impact.

ZFTurbo commented 2 months ago

My current config:

audio:
  chunk_size: 441000
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.000

model:
  sources: ['drums', 'bass', 'other', 'vocals']
  audio_channels: 2
  dims: [4, 32, 64, 128] # [4, 64, 128, 256] in SCNet-large
  nfft: 4096
  hop_size: 1024
  win_size: 4096
  normalized: True
  band_configs: {
    'low': { 'SR': .175, 'stride': 1, 'kernel': 3 },
    'mid': { 'SR': .392, 'stride': 4, 'kernel': 4 },
    'high': { 'SR': .433, 'stride': 16, 'kernel': 16 }
  }
  # SR=[.225, .372, .403] in SCNet-large
  conv_depths: [3, 2, 1]
  compress: 4
  conv_kernel: 3
  # Dual-path RNN
  num_dplayer: 6
  expand: 1
  # mamba
  use_mamba: False
  mamba_config: {
    'd_stat': 16,
    'd_conv': 4,
    'd_expand': 2
  }

training:
  batch_size: 4
  gradient_accumulation_steps: 1
  grad_clip: 0
  instruments:
    - drums
    - bass
    - other
    - vocals
  lr: 4.0e-05
  patience: 2
  reduce_factor: 0.95
  target_instrument: null
  num_epochs: 1000
  num_steps: 1000
  q: 0.95
  coarse_loss_clip: true
  ema_momentum: 0.999
  optimizer: adam
  other_fix: false # it's needed for checking on multisong dataset if other is actually instrumental
  use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
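
For a sense of scale, the STFT settings above imply the following spectrogram shape per training chunk. A sketch assuming a torch.stft call along these lines (the pipeline's actual call may differ):

import torch

chunk = torch.randn(2, 441000)  # one stereo chunk_size of audio at 44.1 kHz
spec = torch.stft(chunk, n_fft=4096, hop_length=1024, win_length=4096,
                  window=torch.hann_window(4096),
                  normalized=True, return_complex=True)
print(spec.shape)  # torch.Size([2, 2049, 431]): channels x freq bins x frames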

starrytong commented 2 months ago

This is my config, with a different lr and batch_size. I also fully adopted your configuration (3 GPUs, batch=4, lr=4.0e-5), and after the second epoch the SDR for vocals was 2.6303. Additionally, with a batch_size of 4 it only used about 10 GB of GPU memory, so maybe you could try a larger batch size?

  batch_size: 8
  gradient_accumulation_steps: 1
  grad_clip: 0
  instruments:
    - drums
    - bass
    - other
    - vocals
  lr: 5.0e-04
  patience: 2
  reduce_factor: 0.95
  target_instrument: null
  num_epochs: 1000
  num_steps: 1000
  q: 0.95
  coarse_loss_clip: true
  ema_momentum: 0.999
  optimizer: adam
  other_fix: false # it's needed for checking on multisong dataset if other is actually instrumental
  use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true

ZFTurbo commented 2 months ago

With batch_size: 6 I get out-of-memory errors. With batch_size: 5 there is a very big slowdown (I see it copying data all the time). So 4 is the maximum that works well.

I still can't make it work properly. Would it be possible for you to train a MUSDB18 checkpoint on your side and put it somewhere, so that we can start from it as a pre-trained model afterwards?
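
An editorial aside, not from the thread: the config already exposes gradient_accumulation_steps, so the effective batch of 8 could be approximated at batch_size: 4 without extra memory, assuming the pipeline multiplies the two values per optimizer step:

training:
  batch_size: 4                    # the memory ceiling reported above
  gradient_accumulation_steps: 2   # effective batch 4 x 2 = 8, matching the
                                   # batch_size: 8 that worked for starrytong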

amirpashamobinitehrani commented 1 week ago

Hi there,

Thanks for your great work! I'm also experiencing a similar issue when training the model with @ZFTurbo's pipeline. The forward pass is quite slow, and I can't seem to get the model over SDR 5. Meanwhile, the training speed is different with the unofficial implementation, yet the SDR is similar even after 900 epochs (both trained on uncompressed MUSDB18).

Has there been any progress on that front? I appreciate your consideration in advance :)

lucellent commented 1 week ago

The authors of both repos seem to have gone radio silent. I hope they come back and give some insight on this, because on paper SCNet should be SOTA, but right now there are no audio examples or pretrained checkpoints to confirm it, let alone a way to train it ourselves.

starrytong commented 1 week ago

Sorry for the confusion, I've been really busy lately. I will provide the code and the pre-trained model on MUSDB within a week.
