Hi Bogdan, thank you for your interest in my work. I would start from point 3 and see if the recognition accuracy is still acceptable. Regarding the implementation, you have to change "class_lay" to 505. Then, you have to freeze all the other networks (i.e., you don't compute any updates for their parameters). This can simply be done by commenting out the relevant parameter-update line in the speaker_id.py script.
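In case it helps later readers, here is a minimal sketch of one way to do that freezing, assuming the three sub-networks are named CNN_net, DNN1_net and DNN2_net (only DNN2_net is confirmed later in this thread; the other names and the RMSprop settings are assumptions to be adapted to your copy of speaker_id.py):

```python
import torch.optim as optim

def freeze_for_transfer(CNN_net, DNN1_net, DNN2_net, lr=0.001):
    """Freeze the pretrained CNN and first MLP; train only the new classifier.

    Sketch only: the net names and optimizer settings are assumptions; adapt
    them to the actual variables in your copy of speaker_id.py.
    """
    for net in (CNN_net, DNN1_net):
        for p in net.parameters():
            p.requires_grad = False  # no gradients -> no updates for these layers

    # Optimize only the (new, e.g. 505-class) classification MLP.
    return optim.RMSprop(DNN2_net.parameters(), lr=lr, alpha=0.95, eps=1e-8)
```

If the script keeps a separate optimizer per network, commenting out the update (.step()) calls of the frozen ones achieves the same effect.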
Thank you for the quick and helpful answer.
Best regards, Bogdan
Hi @mravanelli, I tried following the suggestions for point 3 above. I am now using the label dictionary, train list, and test list for the new data in the .cfg file, and pt_file points to the best model trained on the older data. When I added two layers, class_lay=1024,728, in the cfg file, python speaker_id.py throws this error:
" DNN2_net.load_state_dict(checkpoint_load['DNN2_model_par']) File "/scratch2/hxd150830/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict self.class.name, "\n\t".join(error_msgs))) RuntimeError: Error(s) in loading state_dict for MLP: Missing key(s) in state_dict: "wx.1.weight", "wx.1.bias", "bn.1.weight", "bn.1.bias", "bn.1.running_mean", "bn.1.running_var", "ln.1.gamma", "ln.1.beta", "ln0.gamma", "ln0.beta". size mismatch for wx.0.weight: copying a param of torch.Size([2048, 2048]) from checkpoint, where the shape is torch.Size([462, 2048]) in current model. size mismatch for wx.0.bias: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for bn.0.weight: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for bn.0.bias: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for bn.0.running_mean: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for bn.0.running_var: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for ln.0.gamma: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. size mismatch for ln.0.beta: copying a param of torch.Size([2048]) from checkpoint, where the shape is torch.Size([462]) in current model. "
It seems the size mismatch is causing the issue. What is the best way to handle it in this code? Thanks!
@bogdan-nechita, did you get it working? Could you explain how to fix the above error? It would be even more helpful if you could share your script too. Thanks!
Hello @hdubey,
The above error is because you load the 462-class trained DNN2 layer, as @mravanelli mentioned here: https://github.com/mravanelli/SincNet/issues/38#issuecomment-485873804
I got the training to work by commenting out that line of code.
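For anyone hitting the same error, this is roughly what that looks like in the checkpoint-loading part of speaker_id.py (a sketch: the 'CNN_model_par' and 'DNN1_model_par' key names are assumptions inferred from the 'DNN2_model_par' key in the error above):

```python
import torch

# Load the pretrained weights only for the layers being reused; skip the old
# classification head (DNN2), whose output size no longer matches class_lay.
checkpoint_load = torch.load(pt_file)                          # pt_file from the .cfg
CNN_net.load_state_dict(checkpoint_load['CNN_model_par'])      # assumed key name
DNN1_net.load_state_dict(checkpoint_load['DNN1_model_par'])    # assumed key name
# DNN2_net.load_state_dict(checkpoint_load['DNN2_model_par'])  # commented out: new class count
```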
Regards, Bogdan
Thanks, Bogdan, and good luck with your experiments.
Hi Mirco, I was able to run the fine-tuning process after adding a new speaker, but the network wants to see the whole dataset during fine-tuning. Why? Since I have frozen the layers for the other speakers, the fine-tuning process should only need to see the new speaker's data and learn to identify it.
To run the fine-tuning script I made the following changes.
With this change the model output is fine, but I want the model to learn on the new data only, without seeing the previous data, since the previously trained layers are already frozen.
Hello,
First of all, thanks for all the great work :)
I managed to reproduce the results from the paper using the TIMIT dataset and I am now thinking about the following scenario:
I have a dataset of 500 speakers, I trained the model on it, I get a good enough accuracy and the model can reliably identify one of those 500 speakers from an audio sample. Now I need to add one or more new speakers, let's say 5; the desired outcome is a model that can identify one of the now 505 speakers. This could be a case that repeats in the future, as I get more audio data.
I currently have these approaches in mind:
1. Train the model from scratch every time I need to add new speakers. The disadvantage to this is that I don't leverage any accumulated knowledge from previous trainings.
2. Use transfer learning somehow - load the weights from the "500 speakers" trained model and replace the softmax layer with one that has 505 classes, then train a few more epochs.
3. Same as 2, except we also freeze all the layers except softmax.
How would you approach this? If 2 and 3 are viable options, how would you implement that? Would changing "class_lay" in the config to 505 and training with the new dataset be enough for 2? How would you approach freezing the non-softmax layers?
Thanks again, Bogdan
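To make the difference between options 2 and 3 above concrete, here is a generic PyTorch sketch (a toy model, not SincNet's actual classes): reuse the trained backbone, give it a new 505-class head, and choose which parameters the optimizer sees:

```python
import torch
import torch.nn as nn

feature_dim, new_classes = 2048, 505

# Toy stand-ins for the pretrained feature extractor and its new classifier head.
backbone = nn.Sequential(nn.Linear(2048, feature_dim), nn.LeakyReLU())
new_head = nn.Linear(feature_dim, new_classes)  # replaces the old 500-class softmax layer

# Option 2: fine-tune everything (backbone + new head).
params_option2 = list(backbone.parameters()) + list(new_head.parameters())

# Option 3: freeze the backbone and train only the new head.
for p in backbone.parameters():
    p.requires_grad = False
params_option3 = list(new_head.parameters())

optimizer = torch.optim.RMSprop(params_option3, lr=0.001)  # swap in params_option2 for option 2
```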