Closed tsly123 closed 5 years ago
Hi, this repository is mainly intended for speech recognition. You are probably talking about the other repository, where we used SincNet for speaker ID (https://github.com/mravanelli/SincNet). To address another task, you have to change the datasets and labels. To assign each sentence to the right label, you have to modify the dictionary "TIMIT_labels.npy", as you pointed out. When you change task, it can be very important to properly tune the hyperparameters of the model (e.g., cw_len, cnn_N_filt, cnn_len_filt, fc_lay, lr) to make them more suitable for the new task. Please let me know if you are able to make it work!
Thank you!
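For readers following along, here is a minimal sketch of how such a label dictionary could be built for an emotion task. It assumes that each wav file sits in a folder named after its emotion and that the dictionary maps the file path (as listed in the training/test lists) to an integer class id; both the folder layout and the function names below are hypothetical, so check them against the data-loading code in the SincNet repo before relying on them.

```python
from pathlib import PurePosixPath


def build_label_dict(wav_paths):
    """Map each wav path to an integer class id taken from its parent folder."""
    # Assumption: the parent directory name is the emotion label,
    # e.g. "train/happy/clip_001.wav" belongs to class "happy".
    classes = sorted({PurePosixPath(p).parent.name for p in wav_paths})
    class_to_id = {name: idx for idx, name in enumerate(classes)}
    labels = {p: class_to_id[PurePosixPath(p).parent.name] for p in wav_paths}
    return labels, class_to_id


def save_label_dict(labels, out_file):
    """Save the dictionary as a .npy file (requires NumPy)."""
    import numpy as np  # imported lazily so build_label_dict has no dependency

    # np.save pickles the dict into a 0-d object array.
    np.save(out_file, labels)
    # To read it back on recent NumPy versions:
    #   np.load(out_file, allow_pickle=True).item()
```

With 8 emotion folders this yields ids 0-7. Calling `save_label_dict(labels, "emotion_labels.npy")` (a hypothetical file name) would write a file that can stand in for "TIMIT_labels.npy" in the config, provided the keys match the paths in your list files.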
Hi, thank you for your reply. The repo instructions are very informative. I will get back to you when I am able to run my fusion models.
Again, thank you for your time. tsly
Hi,
I apologize for this, but after struggling with Kaldi ASR (I'm new to Kaldi), I realized that my EmotiW dataset (https://sites.google.com/site/emotiwchallenge/), which contains only *.avi files, can't be prepared as instructed in the TIMIT tutorial, which needs other mandatory files (as stated in Kaldi for Dummies, http://kaldi-asr.org/doc/kaldi_for_dummies.html), such as text, lexicon, or spk2utt.
Is there another way to do the data preparation and alignment myself, such as preparing pre-extracted features and labels in a form compatible with pytorch-kaldi? I've tried to run the Librispeech s5 recipe and other free datasets with Kaldi to see how the prepared data is structured, but I always got errors. I've also looked at the Kaldi-io-for-python repo and think the features can be converted to ark files with it, but I don't know how to handle the labels and alignments.
Thank you for your time. tsly
Hi, as far as I remember, you only have an emotion recognition task where each sentence should be classified into one of N emotions, right? For that, it is much more convenient to start from the SincNet repository (https://github.com/mravanelli/SincNet). This way you don't have to deal with Kaldi at all. You might just have to convert your signals from avi to wav.
Mirco
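One possible way to do the avi-to-wav conversion is to call ffmpeg from Python. The sketch below assumes ffmpeg is installed and on the PATH, and it assumes 16 kHz mono output (adjust this to whatever sampling rate your config expects); the function names and folder arguments are hypothetical.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(avi_path, wav_path, sample_rate=16000):
    """Build the ffmpeg call that extracts a mono wav track from a video."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(avi_path),     # input video
        "-vn",                   # drop the video stream
        "-ac", "1",              # one audio channel (mono)
        "-ar", str(sample_rate), # output sampling rate in Hz (assumed value)
        str(wav_path),
    ]


def convert_folder(in_dir, out_dir, sample_rate=16000):
    """Convert every .avi under in_dir to a .wav under out_dir, same layout."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    for avi in sorted(in_dir.rglob("*.avi")):
        wav = out_dir / avi.relative_to(in_dir).with_suffix(".wav")
        wav.parent.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if ffmpeg fails on a file
        subprocess.run(ffmpeg_command(avi, wav, sample_rate), check=True)
```

A call such as `convert_folder("emotiw_avi", "emotiw_wav")` (hypothetical folder names) keeps the original subfolder structure, which is convenient if the folder names carry the emotion labels.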
Hi, thank you for your work. I've read the instructions and the SincNet paper. I wonder how I can use pytorch-kaldi and, especially, SincNet for an emotion recognition task, since the repo instructions and the SincNet paper are all about speaker identification, which differs from emotion recognition in terms of labels. For example, is all I need to do to modify the label file TIMIT_labels.npy to hold the labels of my emotion dataset (0-7, for 8 emotions), along with the other instruction steps, of course?
Thank you for your time. tsly