mravanelli / SincNet

SincNet is a neural architecture for efficiently processing raw audio samples.
MIT License

Cannot reproduce cumulative frequency response of SincNet on Speaker-id #84

Open thibaultallenet-cea opened 4 years ago

thibaultallenet-cea commented 4 years ago

Hello Mirco Ravanelli,

Training SincNet for speaker-id on TIMIT, following your directions and using the config file provided in your GitHub repository, ends up with a different cumulative frequency response from the one in your paper. The plot below shows the cumulative frequency response of the SincNet filters at initialisation and at the last epoch (1500, from the provided config file).

[image: Filters_response_init_last]

As you can see, the cumulative response at the last epoch is very close to the one at initialisation. I also checked the cumulative frequency response of the filters in the pretrained SincNet model you provided (here the last epoch is 360).

[image: Filters_response_pre_trained_model]

Neither of these two models shows the cumulative frequency response presented in your paper, Interpretable Convolutional Filters with SincNet. Moreover, it seems the filters have a lot of trouble exploring and finding a better distribution. What are your thoughts?
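For readers wondering how a cumulative frequency response like this could be computed, here is a minimal sketch. It is not the code used for the plots above. It assumes the learned cutoffs have already been extracted from the model as two arrays in Hz (lower cutoff and bandwidth per filter). The function names `sinc_bandpass` and `cumulative_response`, the defaults (251 taps, 16 kHz, 1024-point FFT), the Hamming window, and the normalization (sum of magnitude responses scaled to a peak of 1) are all assumptions for illustration and may differ from what the paper does.

```python
import numpy as np


def sinc_bandpass(f1_hz, f2_hz, kernel_size=251, sample_rate=16000):
    """Hamming-windowed band-pass impulse response with cutoffs f1_hz < f2_hz."""
    # Symmetric time axis centred on zero (in samples).
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # Cutoffs in cycles per sample.
    f1, f2 = f1_hz / sample_rate, f2_hz / sample_rate
    # np.sinc(x) = sin(pi x) / (pi x), so 2*f*np.sinc(2*f*n) is an ideal
    # low-pass with cutoff f; the band-pass is the difference of two low-passes.
    h = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return h * np.hamming(kernel_size)


def cumulative_response(low_hz, band_hz, kernel_size=251,
                        sample_rate=16000, n_fft=1024):
    """Sum the magnitude responses of all filters and scale the peak to 1."""
    low = np.abs(np.asarray(low_hz, dtype=float))
    high = low + np.abs(np.asarray(band_hz, dtype=float))
    total = np.zeros(n_fft // 2 + 1)
    for f1, f2 in zip(low, high):
        h = sinc_bandpass(f1, f2, kernel_size, sample_rate)
        total += np.abs(np.fft.rfft(h, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    return freqs, total / total.max()
```

Calling `freqs, resp = cumulative_response(low_hz, band_hz)` once with the cutoffs at initialisation and once with the cutoffs from the last epoch, then drawing both curves with `matplotlib.pyplot.plot(freqs, resp)`, gives an overlay comparable to the one described above.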

mravanelli commented 4 years ago

Hi, thank you for raising this issue. I will have to double-check the current model and try to retrieve the original one used in the paper. I will keep you updated.

Best,

Mirco

On Mon, 3 Feb 2020 at 04:21, thibaultallenet-cea wrote:

[quoted copy of the original post; attached plots:
Filters_response_init_last: https://user-images.githubusercontent.com/39118674/73639538-e0ca7b80-466c-11ea-8d03-54c021d2fd2e.png
Filters_response_pre_trained_model: https://user-images.githubusercontent.com/39118674/73639769-4d457a80-466d-11ea-9825-0269e5ddfb65.png]


songfuture commented 3 years ago

Hi, thanks for sharing. I am a new student of speech signal processing using deep learning. How did you draw the plots in your post? I'm looking forward to your reply.

gzhu06 commented 3 years ago

I found a similar problem in my applications: the learned filter parameters (i.e. lowhz, bandhz) barely changed across epochs. The result is basically a mel filter bank.
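One way to put a number on "barely changed" is to compare the learned cutoffs against a mel-spaced starting point. The sketch below is only an illustration under stated assumptions: the helper names `mel_init` and `drift` are hypothetical, and the constants (80 filters, 16 kHz, a 30 Hz lowest edge, 50 Hz margins) are assumed values that should be checked against the actual layer code. The mel conversion uses the common 2595 * log10(1 + f / 700) formula.

```python
import numpy as np


def hz_to_mel(hz):
    """Common mel-scale formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_init(n_filters=80, sample_rate=16000, lowest_hz=30.0,
             min_low_hz=50.0, min_band_hz=50.0):
    """Mel-spaced lower cutoffs and bandwidths (all constants are assumptions)."""
    highest_hz = sample_rate / 2 - (min_low_hz + min_band_hz)
    # n_filters + 1 band edges, equally spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(lowest_hz),
                                  hz_to_mel(highest_hz),
                                  n_filters + 1))
    return edges[:-1], np.diff(edges)


def drift(initial_hz, learned_hz):
    """Mean and maximum absolute change in Hz between two parameter sets."""
    delta = np.abs(np.asarray(learned_hz, dtype=float)
                   - np.asarray(initial_hz, dtype=float))
    return float(delta.mean()), float(delta.max())
```

For example, `drift(init_low, learned_low)` and `drift(init_band, learned_band)` report how far the lower cutoffs and the bandwidths moved during training. Values of a few Hz would be consistent with filters that stay close to their mel-scale starting point.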

ZaUt-bio commented 1 year ago

> Hi, thanks for sharing. I am a new student of speech signal processing using deep learning. How did you draw the plots in your post? I'm looking forward to your reply.

My question as well. Everybody, we would be thankful if you shared your visualization code here with us.