mlpack / models

Question about the response array for a LSTM (many-to-one) model #50

Closed: InterTriplete2010 closed this issue 3 years ago

InterTriplete2010 commented 3 years ago

Hello,

So I have been trying to learn how to use LSTMs with mlpack. I am using a simple example that I found online (https://stackabuse.com/solving-sequence-problems-with-lstm-in-keras/) that was implemented in Keras. In this example, the training cube is composed of 1 feature, 15 samples, and 3 time steps, so each sample is composed of 3 numbers: the first sample is [1 2 3], the second one is [4 5 6], and so on. The output array looks like [6, 15, ...]; in other words, it is simply the sum of the 3 time steps.

The input cube seems fine, but I am having trouble understanding what I am doing wrong with the output array. Since RNN only accepts a cube for the responses, I created a cube with the following dimensions: trainY.set_size(1,15,1). The same applies to the testing data, which would be testY.set_size(1,1,3). Unfortunately, the results don't make any sense, and I don't even understand why the prediction on the testing data returns 3 values instead of just one. Why isn't it possible to use a vector for the response data, as is done in Keras? I am obviously doing something wrong, but I cannot figure out what. I am pasting my code here. Thank you so much for your help!!!

//----------------------------------------------------------------------------------------------------//

#include <mlpack/core.hpp>
#include <mlpack/prereqs.hpp>
#include <mlpack/methods/ann/rnn.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <mlpack/core/data/scaler_methods/min_max_scaler.hpp>
#include <mlpack/methods/ann/init_rules/he_init.hpp>
#include <mlpack/methods/ann/loss_functions/mean_squared_error.hpp>
#include <mlpack/core/data/split_data.hpp>
#include <ensmallen.hpp>

// Check that we have the correct version of ensmallen.
#if ((ENS_VERSION_MAJOR < 2) || ((ENS_VERSION_MAJOR == 2) && (ENS_VERSION_MINOR < 13)))
  #error "need ensmallen version 2.13.0 or later"
#endif

using namespace std;
using namespace mlpack;
using namespace mlpack::ann;
using namespace ens;

int main() {

const int rho = 3;
size_t inputSize = 1;   //Number of MEL coefficients
size_t outputSize = 1;  //Size of the output

const double RATIO = 0.1;       //Ratio of the testing data
const double STEP_SIZE = 5e-5;  //Step size of the optimizer

const int H1 = 50;
const size_t BATCH_SIZE = 5;  //Size of the batch
const int EPOCHS = 500;       //Maximum number of epochs for training the data

arma::cube trainX;
trainX.set_size(inputSize, 15, rho);  //Training data
arma::cube trainY;
trainY.zeros(inputSize, 15, 1);  //Labels (zero-initialized: set_size() leaves memory uninitialized, and the loop below accumulates with +=)

int track_val = 1;

for (int kk = 0; kk < 15; kk++)
{
    for (int ll = 0; ll < 3; ll++)
    {
        trainX(0, kk, ll) = track_val;
        trainY(0, kk, 0) += track_val;

        std::cout << trainX(0, kk, ll) << "\t";

        track_val++;
    }

    std::cout << trainY(0, kk, 0) << "\n";
}

RNN<MeanSquaredError<>, HeInitialization> model(rho, true);

model.Add<IdentityLayer<>>();
model.Add<LSTM<>>(inputSize, H1, rho);
// model.Add<Dropout<>>(0.2);
model.Add<LeakyReLU<>>();
model.Add<Linear<>>(H1, outputSize);

ens::Adam optimizer(
    STEP_SIZE,
    BATCH_SIZE,
    0.9,
    0.999,
    1e-8,
    trainX.n_cols * EPOCHS,
    1e-8,
    true);

optimizer.Tolerance() = -1;

model.Train(
    trainX,
    trainY,
    optimizer,
    ens::PrintLoss(),
    ens::ProgressBar(),
    ens::EarlyStopAtMinLoss());

// Testing
arma::cube testY;
testY.set_size(inputSize, outputSize, 3);
testY(0, 0, 0) = 50;
testY(0, 0, 1) = 51;
testY(0, 0, 2) = 52;

arma::cube predOutP;  //3D matrix used to save the results obtained from the testing data
model.Predict(testY, predOutP);

std::cout << predOutP(0, 0, 0) << "\n";
std::cout << predOutP(0, 0, 1) << "\n";
std::cout << predOutP(0, 0, 2) << "\n";

}

//----------------------------------------------------------------------------------------------------//

I am adding the code that I have for keras, in case it can help find what I am doing wrong:

from numpy import array
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers.core import Activation, Dropout, Dense
from keras.layers import Flatten, LSTM
from keras.layers import GlobalMaxPooling1D
from keras.models import Model
from keras.layers.embeddings import Embedding
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.layers import Input
from keras.layers.merge import Concatenate
from keras.layers import Bidirectional

import pandas as pd
import numpy as np
import re

X = np.array([x+1 for x in range(45)])
X = X.reshape(15, 3, 1)

print(X)

Y = list()
for x in X:
    Y.append(x.sum())

Y = np.array(Y)
print(Y)

model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(3, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

history = model.fit(X, Y, epochs=500, validation_split=0.2, verbose=1)

test_input = array([50, 51, 52])
test_input = test_input.reshape((1, 3, 1))

print(test_input.shape)

test_output = model.predict(test_input, verbose=0)
print(test_output)

rcurtin commented 3 years ago

Hey @InterTriplete2010, I believe that our RNN responses require a response for each individual time step. So, in your example above, if the input point is [1 2 3] and your intention is that the RNN returns 6 at the end of that sequence, try a responses Cube with 3 slices, where the responses for that point are [1 3 6]. (In essence, at each time step, the response is the sum of the input seen so far. You could also, if you only wanted the RNN to respond at the end of the sequence, make the response [0 0 6], but my intuition is that providing the partial sum at each time step would work better.)

So, the shape of responses would be 1 row by 15 columns by 3 slices (e.g., 1 dimension, 15 samples, 3 time steps).
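For concreteness, that layout would look something like this (just a quick sketch, with only the first column filled in):

// Responses: 1 row x 15 columns x 3 slices
// (1 dimension, 15 samples, 3 time steps).
arma::cube responses = arma::zeros<arma::cube>(1, 15, 3);

// For the first sequence [1 2 3], the response at each time step is the
// running sum of the inputs seen so far: [1 3 6].
responses(0, 0, 0) = 1;  // after seeing 1
responses(0, 0, 1) = 3;  // after seeing 1 + 2
responses(0, 0, 2) = 6;  // after seeing 1 + 2 + 3
// ... and similarly for the other 14 columns.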

I believe that that will work... want to give it a try and see what happens?

InterTriplete2010 commented 3 years ago

Hi @rcurtin, thank you for your response. Unfortunately, it doesn't seem to be working. I tried both ways ([0 0 6] and [1 3 6]), but no luck. I hope I am not doing something stupid, even though the input and response cubes appear correct to me.

Something I was reading in the RNN documentation (I hope I didn't misinterpret the parameters) is that it should be possible to predict only the last output by setting "single" to true, i.e. RNN<MeanSquaredError<>, HeInitialization> model(rho,true); in the code that I posted (https://mlpack.org/doc/mlpack-3.1.0/doxygen/classmlpack_1_1ann_1_1RNN.html#aa07a59fdcbe988264200c8f593c73bbf). But when I did that, I could not get any meaningful result.

Also, my final goal is to apply LSTM to MEL coefficients. The reason I decided to run this simple example was to figure out if I was doing something wrong with the setup of the MEL coefficients, since I was not able to get any meaningful result. So ideally, I would really need to predict only the last output.

Is there anything else I am missing here or possibly doing wrong? Do you have any example of an RNN applied to this type of problem with mlpack?

Thank you!!! Alex.

rcurtin commented 3 years ago

Ahh, sorry that my example didn't work. Here are a couple of examples of RNNs being used in mlpack:

https://github.com/mlpack/mlpack/blob/master/src/mlpack/tests/rnn_reber_test.cpp
https://github.com/mlpack/examples/blob/master/lstm_stock_prediction/lstm_stock_prediction.cpp
https://github.com/mlpack/examples/blob/master/lstm_electricity_consumption/lstm_electricity_consumption.cpp

I'm not sure if any of those use single mode. Anyway, I am not the biggest expert on the RNN code (I was guessing in the last response), but it seems that you can use single mode by having only a single slice in your responses. So, in your situation, your responses would have 1 row, 15 columns, and 1 slice. (So, if the slices of the first column of the input data were [1, 2, 3], then the only slice of the response data for the first column would be [6].)
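In cube terms, something like this (again only a sketch, with just the first two columns filled in):

// Responses for single mode: 1 row x 15 columns x 1 slice.
arma::cube responses(1, 15, 1);
responses(0, 0, 0) = 6;   // final sum for the first sequence, [1 2 3]
responses(0, 1, 0) = 15;  // final sum for the second sequence, [4 5 6]
// ... and so on for the remaining 13 columns.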

Maybe give that a shot and see if that helps?

InterTriplete2010 commented 3 years ago

Unfortunately, I have already looked at these examples, and none of them use single mode. I have also already tried a response with 1 row, 15 columns, and 1 slice (that was the example I posted), but that didn't return any good result. Additionally, when I test the model, it still returns 3 outputs instead of one. I really hope this can be done with mlpack, because I really don't want to switch to Python.

rcurtin commented 3 years ago

I played with it and wrote this simple example code, which seems to work:

#include <mlpack/core.hpp>
#include <mlpack/methods/ann/rnn.hpp>
#include <mlpack/methods/ann/loss_functions/mean_squared_error.hpp>
#include <mlpack/methods/ann/init_rules/he_init.hpp>

using namespace ens;
using namespace mlpack::ann;

int main()
{
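  // Input: 1 row (dimension) x 3 columns (sequences) x 3 slices (time steps).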
  arma::cube inputData(1, 3, 3);
  inputData.slice(0) = arma::mat("1 2 4");
  inputData.slice(1) = arma::mat("2 2 5");
  inputData.slice(2) = arma::mat("3 3 6");

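  // Responses: a single slice, since the RNN is in single mode; each column
  // holds the final sum of the corresponding input sequence.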
  arma::cube responses(1, 3, 1);
  responses.slice(0) = arma::mat("6 7 15");

  // Build RNN in single mode.
  RNN<MeanSquaredError<>, HeInitialization> model(3, true);
  model.Add<IdentityLayer<>>();
  model.Add<LSTM<>>(1, 5 /* 5 cells */, 3);
  model.Add<LeakyReLU<>>();
  model.Add<Linear<>>(5, 1);

  // Train for 1000 epochs...
  ens::Adam opt(0.1, 1, 0.9, 0.999, 1e-8, 3000, 1e-8, true);

  model.Train(inputData, responses, opt, ens::ProgressBar());

  arma::cube predictions;

  model.Predict(inputData, predictions);

  std::cout << "Predictions:\n" << predictions;
}

In that case, I set things up exactly like your problem (but with only 3 data points, not 15), and it seems to work. I'm sure the network needs additional tuning to actually perform well; the predictions didn't look great, but they at least went in the right direction: the output gets larger for each successive input.

Anyway, maybe you can adapt that example to your case? Or if you are still having trouble, maybe the issue is not the input shape? I'm sure we can get it narrowed down. :+1:

InterTriplete2010 commented 3 years ago

Thank you for sharing this code with me. Yes, it seems to be identical to mine, and in fact the results are similar: the predicted output increases. As you increase the number of neurons in the LSTM layer (I tried up to 1000), the results get better too, but they are still far from the correct prediction, and the validation loss is obviously pretty high.

However, what worries me is that if I use the exact same parameters in Python (I replicated both my code and yours using Keras), I get an extremely accurate prediction, and the validation loss is also in a good range. For instance, if I try to predict [50 51 52], I get ~154 in Python. And I also get a single output, while in mlpack I get 3 outputs per sample.

I honestly think the format of the input and response is correct, at least based on the documentation, but the LSTM doesn't seem to be getting trained. I assume the algorithm is the same as the one used in Python, so it should return similar results. I am sure there must be a parameter in the RNN that I am not setting up correctly, but I cannot figure out which one. I tried several things, but none of them seem to be working.

rcurtin commented 3 years ago

I'm not sure the network structure I used there is the best for this task, but I agree that if the RNN is in single mode, we should get back only one prediction. So I wonder if maybe there is a bug there; unfortunately, the internal implementation of RNN is not something I know closely, so I am not sure what is going on.

zoq commented 3 years ago

Somehow this issue must have fallen through the cracks. I'll have to take a closer look at the code; in the meantime, do you mind sharing the Keras example code you used as a comparison?

InterTriplete2010 commented 3 years ago

Here it is. Thank you both for your help.

from numpy import array
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers.core import Activation, Dropout, Dense
from keras.layers import Flatten, LSTM
from keras.layers import GlobalMaxPooling1D
from keras.models import Model
from keras.layers.embeddings import Embedding
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.layers import Input
from keras.layers.merge import Concatenate
from keras.layers import Bidirectional

import pandas as pd
import numpy as np
import re

X = np.array([x+1 for x in range(45)])
X = X.reshape(15, 3, 1)

print(X)

Y = list()
for x in X:
    Y.append(x.sum())

Y = np.array(Y)
print(Y)

model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(3, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

history = model.fit(X, Y, epochs=500, validation_split=0.2, verbose=1)

test_input = array([50, 51, 52])
test_input = test_input.reshape((1, 3, 1))

print(test_input.shape)

test_output = model.predict(test_input, verbose=0)
print(test_output)

zoq commented 3 years ago

Thanks for the code!

zoq commented 3 years ago

Just a quick update, I started to look into the issue, but it will probably be over the weekend before I can provide a solution here.

InterTriplete2010 commented 3 years ago

Hi @zoq, just wondering if you got a chance to look at the code. No pressure at all; I am just eager to try it when you are done :-) Thank you so much again for your help.

mlpack-bot[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! :+1: