mravanelli / SincNet

SincNet is a neural architecture for efficiently processing raw audio samples.
MIT License

Adding new speakers/using transfer learning #37

Closed bogdan-nechita closed 5 years ago

bogdan-nechita commented 5 years ago

Hello,

First of all, thanks for all the great work :)

I managed to reproduce the results from the paper using the TIMIT dataset and I am now thinking about the following scenario:

I have a dataset of 500 speakers; I trained the model on it, the accuracy is good enough, and the model can reliably identify one of those 500 speakers from an audio sample. Now I need to add one or more new speakers, let's say 5; the desired outcome is a model that can identify one of the now 505 speakers. This could be a case that repeats in the future, as I get more audio data.

I currently have these approaches in mind:

  1. Train the model from scratch every time I need to add new speakers. The disadvantage to this is that I don't leverage any accumulated knowledge from previous trainings.

  2. Use transfer learning somehow - load the weights from the "500 speakers" trained model and replace the softmax layer with one that has 505 classes, then train a few more epochs.

  3. Same as 2, except we also freeze all the layers except softmax.

How would you approach this? If 2 and 3 are viable options, how would you implement that? Would changing "class_lay" in the config to 505 and training with the new dataset be enough for 2? How would you approach freezing the non-softmax layers?
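
To make options 2 and 3 concrete, here is a rough, generic PyTorch sketch of what I have in mind (not SincNet-specific; all names below are illustrative stand-ins):

    import torch
    import torch.nn as nn

    # Illustrative stand-ins: in SincNet the "backbone" role is played by the
    # sinc-convolution front-end plus the first MLP, and the "classifier" is
    # the final MLP ending in the softmax layer.
    backbone = nn.Sequential(nn.Linear(2048, 2048), nn.LeakyReLU())
    classifier = nn.Linear(2048, 500)      # trained on the original 500 speakers

    # Option 2: replace the output layer with a 505-class one and fine-tune everything.
    classifier = nn.Linear(2048, 505)

    # Option 3: additionally freeze the backbone so only the new layer is trained.
    for p in backbone.parameters():
        p.requires_grad = False

    optimizer = torch.optim.RMSprop(classifier.parameters(), lr=1e-3)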

Thanks again, Bogdan

mravanelli commented 5 years ago

Hi Bogdan, thank you for your interest in my work. I would start from point 3 and see if the recognition accuracy is still acceptable. Regarding the implementation, you have to change "class_lay" to 505. Then, you have to freeze all the other networks (i.e., you don't compute any updates for their parameters). This can simply be done by commenting out the following lines in the speaker_id.py script:

optimizer_CNN.step()
optimizer_DNN1.step()
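
As a minimal sketch (assuming the CNN_net / DNN1_net / DNN2_net modules and the per-network optimizers defined in speaker_id.py), the freezing can also be made explicit by disabling gradients for the frozen sub-networks, so that only the output MLP with the new softmax layer is updated:

    # Freeze the SincNet front-end and the first MLP; only DNN2_net is trained.
    for net in (CNN_net, DNN1_net):
        for p in net.parameters():
            p.requires_grad = False

    # In the training loop, keep only the update of the output network:
    # optimizer_CNN.step()   # frozen
    # optimizer_DNN1.step()  # frozen
    optimizer_DNN2.step()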

bogdan-nechita commented 5 years ago

Thank you for the quick and helpful answer.

Best regards, Bogdan

hdubey commented 5 years ago

Hi @mravanelli, I tried the suggestions for step 3 above. I am now using the label dict, train list, and test file for the new data in the .cfg file; pt_file is the best model trained on the older data. When I add two layers with class_lay=1024,728 in the cfg file, python speaker_id.py throws this error:

" DNN2_net.load_state_dict(checkpoint_load['DNN2_model_par']) File "/scratch2/hxd150830/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict self.class.name, "\n\t".join(error_msgs))) RuntimeError: Error(s) in loading state_dict for MLP: Missing key(s) in state_dict: "wx.1.weight", "wx.1.bias", "bn.1.weight", "bn.1.bias", "bn.1.running_mean", "bn.1.running_var", "ln.1.gamma", "ln.1.beta", "ln0.gamma", "ln0.beta". size mismatch for wx.0.weight: copying a param of torch.Size([2048, 2048]) from checkpoint, where the shape is torch.Size([462, 2048]) in current model. size mismatch for wx.0.bias: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for bn.0.weight: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for bn.0.bias: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for bn.0.running_mean: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for bn.0.running_var: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for ln.0.gamma: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for ln.0.beta: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. "

It seems the size mismatch is causing the issue. What is the best way to handle it in this code? Thanks!

hdubey commented 5 years ago

@bogdan-nechita, did you get it working? Could you explain how to fix the above error? It would be even more helpful if you could share your script too. Thanks!

bogdan-nechita commented 5 years ago

Hello @hdubey,

The above error occurs because you are loading the DNN2 layer that was trained with 462 classes, as @mravanelli mentioned here: https://github.com/mravanelli/SincNet/issues/38#issuecomment-485873804

I got the training to work by commenting out that line of code.
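
For reference, this is roughly what the checkpoint-loading block in speaker_id.py looks like after that change (the surrounding lines are paraphrased from the script, assuming the 'CNN_model_par'/'DNN1_model_par' keys follow the same naming as 'DNN2_model_par'); only DNN2_net, which now has the new number of output classes, is left randomly initialized:

    checkpoint_load = torch.load(pt_file)
    CNN_net.load_state_dict(checkpoint_load['CNN_model_par'])
    DNN1_net.load_state_dict(checkpoint_load['DNN1_model_par'])
    # DNN2_net.load_state_dict(checkpoint_load['DNN2_model_par'])  # skipped: its shape no longer matches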

Regards, Bogdan

hdubey commented 5 years ago

Thanks, Bogdan, and good luck with your experiments.

akashicMarga commented 5 years ago

Hi Mirco, I was able to run the fine-tuning process after adding a new speaker, but the fine-tuned network wants to see the whole dataset at fine-tuning time. Why? Since I have frozen the layers for the other speakers, the fine-tuning process should only need to see the new speaker's data and learn to identify it.

akashicMarga commented 5 years ago

To run the fine-tuning script I made the following changes:

  1. In the config file, I increased the number of classes by the number of new users. I added the new users' data locations to the original train.scp and test.scp and added their labels to the previous label dictionary (a sketch of this step follows the list).
  2. In speaker_id.py I have commented out:

    DNN2_net.load_state_dict(checkpoint_load['DNN2_model_par'])

    optimizer_CNN.step()

    optimizer_DNN1.step()

    With these changes the model output is fine, but I want the model to learn on the new data only, without seeing the previous data, since the layers trained on the previous speakers are already frozen.
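
For illustration, here is a sketch of how the label dictionary can be extended for step 1 (paths and speaker names are hypothetical; the dictionary maps wav paths to integer speaker ids, as in the TIMIT recipe):

    import numpy as np

    # Load the existing wav -> speaker-id dictionary (path is illustrative).
    lab_dict = np.load('data_lists/TIMIT_labels.npy', allow_pickle=True).item()

    # Give the new speakers the next free class ids.
    next_id = max(lab_dict.values()) + 1
    new_utts = {                      # hypothetical new-speaker utterances
        'new_spk_001/utt1.wav': 0,
        'new_spk_001/utt2.wav': 0,
        'new_spk_002/utt1.wav': 1,
    }
    for wav, rel_id in new_utts.items():
        lab_dict[wav] = next_id + rel_id

    # Save the extended dictionary and point the lab_dict entry of the .cfg file to it.
    np.save('data_lists/TIMIT_labels_extended.npy', lab_dict)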