mlpack / models


input for RNN (LSTM) when using MEL-coefficients for speech recognition #42

Closed InterTriplete2010 closed 3 years ago

InterTriplete2010 commented 3 years ago

Ok, so I have another question related to the input of the neural network. This time it is about RNN (LSTM). I am trying to do some simple speech recognition using MEL-coefficients. I am trying to adapt the stock prediction example (https://github.com/mlpack/examples/blob/master/lstm_stock_prediction/lstm_stock_prediction.cpp) to my needs, but I am having trouble figuring out how to properly set up my 3D matrix.

Say that I have a matrix of size 2000x1000, where each row represents a different audio file (let's assume we have 10 words to recognize => 200 audio files per word) and each column represents a sample of a MEL-coefficient. Let's assume that I have 20 MEL-coefficients, each consisting of 50 samples. So basically the first 50 columns hold the samples of the first MEL-coefficient, the next 50 columns hold the samples of the second MEL-coefficient, etc. The MEL-coefficients are also organized in ascending order.

What should my 3D train and test matrices look like, assuming (for instance) that I am using 10% of the words for validation?

Thank you for your help!!! Alex.

rcurtin commented 3 years ago

Hi @InterTriplete2010, sorry for the slow response on this one! I believe the shape you are looking for is an arma::cube of 20 rows x 2000 columns x 50 slices. Each row represents one dimension of a single time step; you say there are 20 Mel coefficients, so you would have 20 rows. Each column represents a sequence/audio file (just like how each column in all mlpack methods represents an observation), so you would have 2000 columns. Each time step is a slice, and based on what you wrote I think there are 50 time steps in your data. In other words, element (i, j, k) of the cube holds Mel coefficient i of audio file j at time step k.

Anyway, hopefully this clarifies the general idea. You might have to do some restructuring of your data to get it into that format. 👍
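A minimal sketch of that restructuring, using only the C++ standard library. It assumes the 2000x1000 matrix is held in a row-major buffer (one audio file per row, each coefficient's samples stored contiguously), which is an assumption about how the data was loaded and not something stated in the thread. The names `CubeShape`, `CubeIndex`, and `ToCubeLayout` are hypothetical helpers, not mlpack or Armadillo API. The output buffer follows the layout Armadillo uses for cubes: column-major within a slice, with slices stored one after another.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Dimensions of the target cube: rows x columns x slices.
// (Hypothetical helper type, not part of mlpack or Armadillo.)
struct CubeShape
{
  std::size_t rows;   // features per time step (Mel coefficients)
  std::size_t cols;   // sequences (audio files)
  std::size_t slices; // time steps
};

// Offset of element (row, col, slice) in a cube stored column-major
// within each slice, with slices laid out one after another.
inline std::size_t CubeIndex(const CubeShape& shape,
                             const std::size_t row,
                             const std::size_t col,
                             const std::size_t slice)
{
  return row + col * shape.rows + slice * shape.rows * shape.cols;
}

// Rearranges `flat` into cube layout.
//
// Assumed input layout (row-major, one audio file per row):
//   flat[file * (numCoeffs * numSteps) + coeff * numSteps + step]
// Output layout: numCoeffs rows x numFiles columns x numSteps slices.
std::vector<double> ToCubeLayout(const std::vector<double>& flat,
                                 const std::size_t numFiles,
                                 const std::size_t numCoeffs,
                                 const std::size_t numSteps)
{
  if (flat.size() != numFiles * numCoeffs * numSteps)
    throw std::invalid_argument("flat buffer size does not match dimensions");

  const CubeShape shape{numCoeffs, numFiles, numSteps};
  std::vector<double> cube(flat.size());

  for (std::size_t file = 0; file < numFiles; ++file)
  {
    for (std::size_t coeff = 0; coeff < numCoeffs; ++coeff)
    {
      for (std::size_t step = 0; step < numSteps; ++step)
      {
        const std::size_t src =
            file * (numCoeffs * numSteps) + coeff * numSteps + step;
        cube[CubeIndex(shape, coeff, file, step)] = flat[src];
      }
    }
  }

  return cube;
}
```

For the data described in this thread the call would be `ToCubeLayout(flat, 2000, 20, 50)`, giving a buffer of 20 * 2000 * 50 elements. Armadillo has a cube constructor that takes a pointer to existing memory plus the three dimensions, which is one way such a buffer could be turned into an arma::cube; check the Armadillo documentation for its exact form. Holding out 10% of the audio files for validation would then mean splitting along the column dimension, for example 1800 training columns and 200 validation columns.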

InterTriplete2010 commented 3 years ago

Thank you 👍
