mravanelli / SincNet

SincNet is a neural architecture for efficiently processing raw audio samples.

Re-training a trained SincNet on a new dataset #38

Closed hdubey closed 5 years ago

hdubey commented 5 years ago

Hi Mirco, I have a task where I need to take a pre-trained SincNet and re-train it on our own data. Both datasets are prepared according to your protocols, and a SincNet trained on the first dataset is already available. Now I want to remove the output layer of the trained network and add a new output layer corresponding to the speakers in Dataset 2. Which scripts do I need to modify to do this? Also, do you think this strategy is better than combining both datasets and re-training on the composite set? In the real world we get more data every few months, and re-training from scratch could be time-consuming, so I want to test initializing from a previously trained network, hoping it converges faster. Thanks!

mravanelli commented 5 years ago

Hi, you can take a look at this discussion, where we covered it: https://github.com/mravanelli/SincNet/issues/37

In general, you should start from the pre-trained model and fine-tune either the full network or just a part of it.
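For the "just a part of it" case, here is a minimal sketch of what I mean. It assumes the module names (CNN_net, DNN1_net, DNN2_net) and the RMSprop settings used in speaker_id.py; the learning rate is only an illustrative guess:

```python
import torch.optim as optim

def build_finetune_optimizer(cnn_net, dnn1_net, dnn2_net, lr=1e-4):
    """Freeze the sinc-conv front-end and fine-tune only the MLPs.

    Module names mirror speaker_id.py; lr should be tuned and is
    typically much lower than the from-scratch value.
    """
    for p in cnn_net.parameters():
        p.requires_grad = False  # front-end stays fixed
    return optim.RMSprop(
        list(dnn1_net.parameters()) + list(dnn2_net.parameters()),
        lr=lr, alpha=0.95, eps=1e-8)
```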

Mirco

hdubey commented 5 years ago

Thanks Mirco, that is helpful. Right now, when training on the same dataset, I see that if I train for 100 epochs and then restart from the latest checkpoint for another 300 epochs, I get Result1; when I instead run all 400 epochs in one go, I get Result2. Result1 and Result2 are different, so initializing from an already-trained network and training on the same data seems to produce different results. What could be the cause? It seems some of the state is not saved, so resuming from a past checkpoint behaves differently.

mravanelli commented 5 years ago

Hi, slightly different results are normal in this case: for every minibatch we sample a different random chunk of speech, as you can see from the function "create_batches_rnd" in data_io.py. The random generator state is not part of the checkpoint, so a resumed run will not see the same sequence of chunks as an uninterrupted one.
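In simplified form, the idea is the following (an illustration, not the actual create_batches_rnd code):

```python
import numpy as np

def random_chunk(signal, wlen):
    # Cut a window of wlen samples at a random offset from the utterance,
    # as create_batches_rnd does (simplified: no amplitude randomization).
    start = np.random.randint(signal.shape[0] - wlen)
    return signal[start:start + wlen]

# To make resuming exactly reproducible, one would also have to save and
# restore the RNG state alongside the weights, e.g. with
# np.random.get_state() / np.random.set_state().
```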


hdubey commented 5 years ago

Hi @mravanelli, thanks for confirming that. I have another question, from my attempt to carry out the STEP 3 you suggested in issue #37.

I tried following the suggestions in STEP 3 above. I am now using the label dict, train list, and test list for the new data in the .cfg file, and pt_file points to the best model trained on the older data. When I set two classification layers, class_lay=1024,728, in the cfg file, python speaker_id.py throws this error:

" DNN2_net.load_state_dict(checkpoint_load['DNN2_model_par']) File "/scratch2/hxd150830/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict self.class.name, "\n\t".join(error_msgs))) RuntimeError: Error(s) in loading state_dict for MLP: Missing key(s) in state_dict: "wx.1.weight", "wx.1.bias", "bn.1.weight", "bn.1.bias", "bn.1.running_mean", "bn.1.running_var", "ln.1.gamma", "ln.1.beta", "ln0.gamma", "ln0.beta". size mismatch for wx.0.weight: copying a param of torch.Size([2048, 2048]) from checkpoint, where the shape is torch.Size([462, 2048]) in current model. size mismatch for wx.0.bias: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for bn.0.weight: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for bn.0.bias: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for bn.0.running_mean: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for bn.0.running_var: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for ln.0.gamma: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for ln.0.beta: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. "

It seems the size mismatch is causing the issue. What is the best way to handle it in this code? Thanks!

mravanelli commented 5 years ago

I'm not sure, but I think the problem is that the pre-trained model has 462 speakers, while the new one has 728. You should thus avoid loading the last part of the model by commenting out the following line of speaker_id.py: "#DNN2_net.load_state_dict(checkpoint_load['DNN2_model_par'])". Please let us know if this solves the issue...
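For reference, the loading block would then look roughly like this. This is a sketch, not a verbatim copy of speaker_id.py; I am assuming the checkpoint keys follow the same *_model_par naming as the 'DNN2_model_par' key visible in your traceback:

```python
import torch

def load_pretrained_partial(pt_file, CNN_net, DNN1_net):
    # Load only the speaker-independent parts of the old checkpoint.
    # DNN2_net (the classifier) keeps its fresh random initialization,
    # since its output dimension changed with the new speaker set.
    checkpoint_load = torch.load(pt_file, map_location='cpu')
    CNN_net.load_state_dict(checkpoint_load['CNN_model_par'])
    DNN1_net.load_state_dict(checkpoint_load['DNN1_model_par'])
    # DNN2_net.load_state_dict(checkpoint_load['DNN2_model_par'])  # skipped
```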

Mirco


hdubey commented 5 years ago

Hi @mravanelli, thanks for the suggestion. I just got it started; since this is single-GPU training with the smaller batch sizes I am trying, it will take a few days to finish. I will update you in 4-7 days.

hdubey commented 5 years ago

Hi Mirco, thanks for this awesome tool. I am wondering how to get the d-vectors from the Librispeech model, given that there are two layers in the classification part. Thanks!

mravanelli commented 5 years ago

One should modify compute_d_vector.py a bit. The other possibility is to move the extra layer from the classifier to the DNN part and use exactly the same code for d-vector computation. This is the most natural solution; let me change the config file...
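Concretely, the change would look something like this. This is a sketch of the relevant cfg fields only (field names as in the TIMIT config files); the per-layer lists such as fc_drop, fc_act, and the norm flags would need a matching extra entry as well:

```
# before: the extra 1024 layer lives in the classifier
[dnn]
fc_lay=2048,2048,2048
[class]
class_lay=1024,728

# after: the 1024 layer becomes the last fc layer, so compute_d_vector.py
# can read the d-vector from the dnn output unchanged
[dnn]
fc_lay=2048,2048,2048,1024
[class]
class_lay=728
```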

mravanelli commented 5 years ago

You may also take a look at this implementation of SincNet: https://github.com/santi-pdp/pase

MuruganR96 commented 5 years ago

@mravanelli @hdubey sir, I re-trained on my speakers' audio starting from the old TIMIT weights, and the accuracy decreased. :)

@hdubey sir, how were your results? Please share your suggestions. :)

Thank you so much

mravanelli commented 5 years ago

Hi, have you reduced the learning rate for the fine-tuning phase? If not, the risk is that you "destroy" what the network learned before.
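For instance, in the [optimization] section of the cfg file (lr=0.001 is the from-scratch value in the TIMIT config; the fine-tuning value below is only an illustrative starting point to tune):

```
[optimization]
# from-scratch training used lr=0.001; for fine-tuning, start much lower
lr=0.0001
```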
