mravanelli / pytorch-kaldi

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

How to apply this repo for another emotion task? #22

Closed tsly123 closed 5 years ago

tsly123 commented 5 years ago

Hi, thank you for your work. I've read the instructions and the SincNet paper. I wonder how I can use pytorch-kaldi and, especially, SincNet for an emotion recognition task, since the repo instructions and the SincNet paper are all about speaker identification, which differs from emotion recognition in terms of labels. For example, is all I need to do to replace the labels in TIMIT_labels.npy with the labels of my emotion dataset (0-7, for 8 emotions), along with following the other instruction steps?

Thank you for your time. tsly

mravanelli commented 5 years ago

Hi, this repository is mainly intended for speech recognition. You are probably talking about the other repository, where we used SincNet for speaker ID (https://github.com/mravanelli/SincNet). To address another task you have to change the datasets and labels. To assign each sentence to the right label, you have to modify the dictionary "TIMIT_labels.npy", as you pointed out. When you change task, it can be very important to properly tune the hyperparameters of the model (e.g., cw_len, cnn_N_filt, cnn_len_filt, fc_lay, lr) to make them more suitable for the new task. Please let me know if you are able to make it work!

Thank you!
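
For readers adapting the labels as described above, here is a minimal sketch of writing and reading such a label dictionary. It assumes the `.npy` file holds a pickled Python dict that maps each utterance path to an integer class index; that layout, the file names, and the helper functions `build_label_dict`, `save_label_dict`, and `load_label_dict` are illustrative assumptions, so check them against the loading code in the repository you actually use.

```python
import numpy as np


def build_label_dict(annotations, num_classes=8):
    """Map each utterance path to an integer class index in [0, num_classes)."""
    lab_dict = {}
    for wav_path, label in annotations:
        if not 0 <= label < num_classes:
            raise ValueError(
                f"label {label} for {wav_path} is outside 0..{num_classes - 1}"
            )
        lab_dict[wav_path] = int(label)
    return lab_dict


def save_label_dict(lab_dict, npy_path):
    # np.save pickles the dict inside a 0-d object array.
    np.save(npy_path, lab_dict, allow_pickle=True)


def load_label_dict(npy_path):
    # Recent NumPy versions need allow_pickle=True here; .item() unwraps the dict.
    return np.load(npy_path, allow_pickle=True).item()


# Hypothetical file names and emotion indices, for illustration only.
annotations = [
    ("train/clip_0001.wav", 0),
    ("train/clip_0002.wav", 3),
    ("test/clip_0101.wav", 7),
]
lab_dict = build_label_dict(annotations)
```

Calling `save_label_dict(lab_dict, "emotion_labels.npy")` would then produce a file that can stand in for the original label file, provided the keys match the paths in your data lists. The number of output classes in the model configuration would also have to match `num_classes`.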

tsly123 commented 5 years ago

Hi, thank you for your reply. The repo instructions are very informative. I will get back to you when I am able to run my fusion models.

Again, thank you for your time. tsly

tsly123 commented 5 years ago

Hi, I apologize for this, but after struggling with Kaldi ASR (I'm new to Kaldi), I realized that my EmotiW dataset (https://sites.google.com/site/emotiwchallenge/), which contains only *.avi files, can't be prepared as instructed in the TIMIT tutorial, which requires other files to be created first (as stated in Kaldi for Dummies, http://kaldi-asr.org/doc/kaldi_for_dummies.html), such as text, lexicon, or spk2utt.

Is there another way to do the data preparation and alignment myself, e.g., by preparing pre-extracted features and labels in a form compatible with pytorch-kaldi? I've tried to run the Librispeech s5 recipe and other free datasets with Kaldi to see how the prepared data is structured, but I always got errors. I've also looked at the kaldi-io-for-python repo and think the features can be converted to an ark file with it, but I don't know how to produce the labels and alignments.

Thank you for your time. tsly

mravanelli commented 5 years ago

Hi, as far as I remember you only have an emotion recognition task where each sentence should be classified into one of N emotions, right? For that, it is much more convenient to start from this repository (https://github.com/mravanelli/SincNet). This way you don't have to deal with Kaldi at all. You might just have to convert your signals from avi to wav.

Mirco
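
The avi-to-wav conversion mentioned above could be sketched as follows. This assumes the `ffmpeg` command-line tool is installed and on the PATH, which the thread does not mention; the 16 kHz mono target, the directory layout, and the helper names `ffmpeg_command` and `convert_folder` are also assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(avi_path, wav_path, sample_rate=16000):
    """Build an ffmpeg call that extracts a mono audio track from a video file."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", str(avi_path),      # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample (16 kHz is an assumed target)
        str(wav_path),
    ]


def convert_folder(avi_dir, wav_dir, sample_rate=16000):
    """Convert every *.avi file in avi_dir to a .wav file in wav_dir."""
    wav_dir = Path(wav_dir)
    wav_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for avi_path in sorted(Path(avi_dir).glob("*.avi")):
        wav_path = wav_dir / (avi_path.stem + ".wav")
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(ffmpeg_command(avi_path, wav_path, sample_rate), check=True)
        written.append(wav_path)
    return written
```

A call such as `convert_folder("EmotiW/train_avi", "EmotiW/train_wav")` (hypothetical paths) would return the list of wav files it wrote, which can then be used to build the data lists and the label dictionary.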

