mravanelli / SincNet

SincNet is a neural architecture for efficiently processing raw audio samples.
MIT License

VoxCeleb1 and LibriSpeech #25

Closed hdubey closed 5 years ago

hdubey commented 5 years ago

Hi Mirco, thanks for the great work! I was wondering if you plan to share the data preparation recipes for VoxCeleb1 and LibriSpeech, which would allow us to reproduce the other experiments from your paper.

mravanelli commented 5 years ago

Hi, the LibriSpeech data used in my paper are available upon request (I can share them with you if you want). We plan to move the speaker-ID experiments into the pytorch-kaldi project in the near future. The full pipelines, from data creation to performance evaluation, will be available there for several datasets.


hdubey commented 5 years ago

That would be helpful for me. Could you please share it at hxd150830@utdallas.edu?

Thanks!

mravanelli commented 5 years ago

OK, this will likely happen in the next couple of months... there is a lot of work to export everything to pytorch-kaldi.


hdubey commented 5 years ago

Hi, thanks for replying. I just noticed that your Librispeech.cfg uses an 8 kHz sampling rate while TIMIT is at 16 kHz. LibriSpeech is available at 16 kHz, so why did you downsample it? For a fair comparison, both datasets should be at the same Fs, as per general audio-processing conventions. Is there a good reason for doing so? Did you get better results at 8 kHz for LibriSpeech?

mravanelli commented 5 years ago

The difference in performance between 8 kHz and 16 kHz is not that large for a speaker-ID task. For LibriSpeech, I ultimately saw slightly better performance at 8 kHz, and the network is also much faster.

Mirco
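For reference, downsampling 16 kHz LibriSpeech audio to 8 kHz before training is typically done with an anti-aliasing polyphase resampler. A minimal sketch with SciPy (the synthetic test tone here is purely illustrative and not part of the repo):

```python
# Sketch: downsampling a 16 kHz waveform to 8 kHz, as in the
# LibriSpeech experiments. resample_poly applies an anti-aliasing
# filter before decimation.
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 16000, 8000
t = np.arange(fs_in) / fs_in          # 1 second of audio
x = np.sin(2 * np.pi * 440.0 * t)     # 440 Hz test tone (stand-in for real speech)
y = resample_poly(x, fs_out, fs_in)   # 16 kHz -> 8 kHz (ratio reduced internally)

print(len(x), len(y))                 # 16000 8000
```

Halving the sampling rate halves the input length per window, which is one reason the 8 kHz network trains noticeably faster.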


hdubey commented 5 years ago

Thanks. It would be good to have these numbers in your README, as most experiments take a few to several days. I see most .cfg files have a drop-out factor of 0, meaning no drop-out. Is that the best configuration or just the default? What drop-out did you use for the best results in the arXiv paper? Thanks for your time, Mirco.

mravanelli commented 5 years ago

In the repository, we release the best system that we have been able to find so far. It is slightly better than the one reported in the paper, because we kept working on this task after the deadline and found a slightly better set of hyperparameters. So consider the cfg in the repo the best config file found so far.

Mirco


hdubey commented 5 years ago

Sounds good. Did you experiment with the window length and skip rate? Do you think 200 ms windows are better than 1-second windows? Is it true that a 10 ms skip rate was better than a 50 ms skip on 1-second windows? I am looking to speed up the training with the least degradation. Since drop-out did not help, other than lowering the sampling rate, what could help accelerate the training process?

hdubey commented 5 years ago

I see that when a job crashes due to memory or other issues, it does not resume from the previous epoch. What change in the code is needed to resume training from the previous epoch without any degradation compared to uninterrupted training? Thanks for all the support; we all like SincNet and are trying to push its capability in all audio applications.
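The thread does not answer this directly. A common pattern (a sketch of the general technique, not SincNet's actual code) is to save the epoch counter together with the model/optimizer state after every epoch and reload them at startup; in a real PyTorch loop the saved state would be `model.state_dict()` and `optimizer.state_dict()` via `torch.save`/`torch.load`, but the same idea with only the standard library looks like this:

```python
# Sketch of epoch-level checkpointing so training can resume after a
# crash. Plain values stand in for model/optimizer state so the
# pattern is self-contained.
import os
import pickle

CKPT = "checkpoint.pkl"

def save_checkpoint(epoch, state, path=CKPT):
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:          # write-then-rename is crash-safe:
        pickle.dump({"epoch": epoch, "state": state}, f)
    os.replace(tmp, path)               # a partial write never clobbers the old file

def load_checkpoint(path=CKPT):
    if not os.path.exists(path):
        return 0, None                  # no checkpoint: start from scratch
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["epoch"] + 1, ckpt["state"]  # resume at the NEXT epoch

start_epoch, state = load_checkpoint()
for epoch in range(start_epoch, 3):
    state = {"loss": 1.0 / (epoch + 1)}  # stand-in for one epoch of training
    save_checkpoint(epoch, state)
```

To avoid any degradation versus an uninterrupted run, the optimizer state (momentum buffers, learning-rate schedule position) and the random-number-generator seeds would also need to be saved, not just the model weights.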

mravanelli commented 5 years ago

Maybe you can try a 15 or 25 ms skip and see if the degradation is acceptable. You can also try windows of 100/150 ms, which work quite well.
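The window/skip trade-off above amounts to chunking each waveform into overlapping frames: a larger skip yields fewer chunks per utterance, hence faster epochs. A sketch of how the chunk count scales with the skip (the function name and values are illustrative, not taken from the repo):

```python
# Sketch: splitting a raw waveform into overlapping analysis windows.
# Doubling the skip roughly halves the number of training chunks.
import numpy as np

def frame_signal(x, fs, win_ms, skip_ms):
    win = int(fs * win_ms / 1000)           # window length in samples
    skip = int(fs * skip_ms / 1000)         # hop between windows in samples
    n = 1 + (len(x) - win) // skip          # number of full windows
    return np.stack([x[i * skip : i * skip + win] for i in range(n)])

fs = 8000
x = np.random.randn(fs * 3)                 # 3 s of fake audio
for skip_ms in (10, 25, 50):
    frames = frame_signal(x, fs, 200, skip_ms)
    print(skip_ms, frames.shape)            # chunk count drops as skip grows
```

With 200 ms windows on 3 s of 8 kHz audio, a 10 ms skip yields 281 chunks versus 57 at a 50 ms skip, which is where the training-speed difference comes from.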


akashicMarga commented 5 years ago

Hi, can I get the LibriSpeech data used in the paper and all the data preparation files (train.scp, test.scp, labels.npy)? My email is akashsingh24695@gmail.com.

mravanelli commented 5 years ago

Hi Akash, sure. At the link below you can find the dataset used for our speaker recognition experiments:

https://drive.google.com/open?id=1lDHRUIWzZDg_0fauo3zh4VA8LgqkwsLJ

Best,

Mirco


hdubey commented 5 years ago

@mravanelli are there any file lists for reproducing the VoxCeleb and VoxCeleb2 experiments? Thanks!

mravanelli commented 5 years ago

For VoxCeleb we just used the standard split provided within the VoxCeleb dataset.


Liujingxiu23 commented 4 years ago

@mravanelli Have you tried the performance of SincNet on VoxCeleb? What about the EER?