Hi @chase-metzger, sorry for the slow response. I think you are right that it is a data encoding issue. The RNN::Train() function takes an arma::cube where:

- each slice should correspond to a time step
- each column should correspond to a data point
- each row should correspond to a dimension

So, e.g., predictors(i, j, k) is the i'th dimension of the j'th data point at time slice k.

It looks to me like your code has the sequence length as the number of rows, and the input size as the number of slices. So I think that if you can adapt your example so that the cube shape is the following:

cube result(numLetters, 1, strLen);

then the code should work. :+1:
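For reference, here is a minimal sketch of that encoding (just a sketch; it assumes line, strLen, and numLetters are defined as in your snippet, and that each character's ASCII value is used as the row index):

arma::cube predictors(numLetters, 1, strLen, arma::fill::zeros);
for (size_t t = 0; t < strLen; ++t)
{
  // One-hot encode the t'th character of the single sequence (column 0):
  // row = character value, column = data point, slice = time step.
  predictors((arma::uword) line[t], 0, t) = 1.0;
}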
P.S. you could do cube result(numLetters, 1, strLen, fill::zeros) to avoid the zeros() call. :)
@rcurtin Hi thanks for the response! (and sorry for my late one)
Unfortunately, changing all the encoding to be that shape did not work. I'm still getting an out-of-bounds error from operator().
I ran a debugger and, before it crashes, the network is in the process of forwarding to the NegativeLogLikelihood layer, which is the output layer. To be a little more specific, the crash occurs on line 39 of the negative_log_likelihood_impl.hpp file (it really crashes in the Mat_meat file, but I thought this info was relevant), which looks like this:
output -= input(currentTarget, i);
What I'm finding odd is that i in the above line is 0 in the debugger. So it's crashing on the first iteration?
Ah, thanks for the details. I think that the encoding you had before was incorrect, so, maybe there were two problems. :) Anyway, thanks for working with a debugger a little bit to figure out what the issue was.
I think that this is an issue of labels. What are you passing to Train() as your responses? For NegativeLogLikelihood, this should be the desired response at each time step, from 1 to the number of classes (not 0 to the number of classes minus 1). I think in your case the number of classes is numLetters? I'm just guessing about that, though.
Can you show the code you are using to make the responses?
This is how numLetters is defined:
const int numLetters = 256; //The number of characters in an ASCII table
...
And this is the current implementation of the responses function (below).
You are correct about numLetters; I'm basically trying to get the network to output the next letter in a sequence from the learned model. So I was following that tutorial, and it seemed like the number of classes should be the number of letters that it can possibly output?
const auto makeTarget = [numLetters] (const char *line) -> cube {
const auto strLen = strlen(line);
cube result(1, strLen, 1, fill::zeros);
for(int i = 0; i < strLen; ++i)
{
const auto letter = line[i];
const auto letterIndex = static_cast<double>(letter + 1); //I just added the plus one to test whether starting at 1 instead of 0 (where ASCII starts) would make it work.
result.at(0, i, 0) = letterIndex;
}
return result;
};
How exactly are the responses supposed to be shaped?
For the NegativeLogLikelihood response, the responses should have basically the same shape as the predictors---so, since you are predicting individual letters, it should have a shape like this: cube responses(1, 1, strLen) (assuming there is only one data point). So, then, responses(0, 0, i) should be the integer-valued response (from 1 to numLetters) for the i'th element in the string. (Essentially, we are not one-hot encoding the response there; we are just passing in the single integer value that represents the result's class.)
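For example, a tiny sketch of that shape (assuming line, strLen, and numLetters are defined as in your code, and using letter + 1 as the class ID the way your current code does):

arma::cube responses(1, 1, strLen, arma::fill::zeros);
for (size_t t = 0; t < strLen; ++t)
{
  // One integer class ID per time step, in the range [1, numLetters].
  responses(0, 0, t) = (double) line[t] + 1.0;
}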
I think that should work... give it a try and let me know what happens. :)
Ok, so I've tried making the shape of the responses and encoding as described.
It's giving me an error: matrix multiplication: incompatible matrix dimensions: 128x256 and 512x32.
From the debugger, I've figured out that 256 is the number of classes, 128 is the hiddenSize, and I think the 32 is the batch size of the LSTM layer.
What happens if you remove the Dropout layer? I looked through the LSTM code, and based on what you wrote in the initial post and what I see in there, I think the output size from that layer should be 256x32. So if that is actually 512x32, I'm not totally sure where that issue is coming from.
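Just to illustrate the sizes I'd expect, here is a sketch of the layer chain (only a sketch, assuming hiddenSize = 128, numLetters = 256, and rho is your BPTT length):

// Each layer's input size must match the previous layer's output size.
rnn.Add<LSTM<>>(numLetters, hiddenSize, rho); // per-step input: 256, output: 128
rnn.Add<Linear<>>(hiddenSize, numLetters);    // input: 128, output: 256 class scores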
Anyway, I'm happy to try and debug it but I want to make sure I'm using the same code as you... if you can provide the whole code that you're using so I can compile and step through it and see what's wrong, I'm happy to. :+1:
@rcurtin Thanks. Unfortunately, I’m in California during the power shut offs and fires. So I won’t be able to upload a repo until possibly Friday.
I did try removing the Dropout layer before losing power. I think it gave a different result that was closer to what you describe. Also, the 512 was my mistake: I tried changing one variable from 256 to 512 before posting that, just to see what would happen.
I'll get a file uploaded soon. One slight issue is that I'm using Qt for some of the string/file parsing; I load my training data from a CSV file and then convert everything to plain C++ strings. But I can easily remove that and then pass in test data.
Here's a gist that has all the main implementation minus all the file loading.
Currently, it gives the error: matrix multiplication: incompatible matrix dimensions: 128x256 and 1x1. It seems really odd to me that a 1x1 gets passed; I assume that's part of the target I constructed.
https://gist.github.com/chase-metzger/cd3b9cc41796c42dfecc015e8732d0da
--Thanks for all the help
Hey @chase-metzger,
I took a dive into the code and saw some things that could be the issue. I did manage to make it work. I'll go through each bit:
An LSTM with an FFN doesn't make too much sense, so I'd suggest switching back to the RNN.
An IdentityLayer<> is required as the first layer of an RNN if an LSTM is used for the first layer. This is so that backpropagation through time works correctly.
The code that you had would create a training set with just the letter T, not the sample input THIS IS THE INPUT. I rewrote it below. There may be a few other changes too:
std::vector<std::string> trainingData;
trainingData.push_back(std::string("THIS IS THE INPUT"));
const auto makeInput = [](const char *line) -> MatType {
const auto strLen = strlen(line);
// rows: number of dimensions
// cols: number of sequences/points
// slices: number of steps in sequences
MatType result(numLetters, 1, strLen, fill::zeros);
for(int i = 0; i < strLen; ++i)
{
const auto letter = line[i];
result.at(static_cast<uword>(letter), 0, i) = 1.0;
}
return result;
};
const auto makeTarget = [] (const char *line) -> MatType {
const auto strLen = strlen(line);
// responses for NegativeLogLikelihood should be
// non-one-hot-encoded class IDs (from 1 to num_classes)
cube result(1, 1, strLen, fill::zeros);
// the response is the *next* letter in the sequence
for(int i = 0; i < strLen - 1; ++i)
{
const auto letter = line[i + 1];
result.at(0, 0, i) = static_cast<uword>(letter) + 1.0;
}
// the final response is empty, so we arbitrarily set it to class 1
result.at(0, 0, strLen - 1) = 1.0;
return result;
};
std::vector<cube> inputs(trainingData.size());
std::vector<cube> targets(trainingData.size());
for(int i = 0; i < trainingData.size(); ++i)
{
inputs[i] = makeInput(trainingData[i].c_str());
targets[i] = makeTarget(trainingData[i].c_str());
}
The Linear<> layer should have hiddenSize as its input size, since hiddenSize is the output size of the previous LSTM<> layer.
I'd suggest setting rho to maxLineLength, as in RNN<> rnn(maxLineLength); and rnn.Add<LSTM<>>(numLetters, hiddenSize, maxLineLength); (but make sure maxLineLength is the maximum line length... so in this case, 17).
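If your training lines vary in length, one way to compute that value (just a sketch, assuming trainingData is the std::vector<std::string> of lines):

size_t maxLineLength = 0;
for (const std::string& line : trainingData)
{
  // Keep the longest line so BPTT never needs more steps than we have.
  if (line.size() > maxLineLength)
    maxLineLength = line.size();
}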
The default optimizer configuration will be pretty slow for this example. You might try a simpler example, kind of like this:
ens::SGD<> sgd(0.01, 1, 100 /* only 100 maximum iterations, just to see it work */);
rnn.Train(inputs[0], targets[0], sgd);
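If you later want to train on every line instead of just the first one, a simple sketch (assuming each line keeps its own predictor/response cube, as above) would be:

for (size_t i = 0; i < inputs.size(); ++i)
  rnn.Train(inputs[i], targets[i], sgd);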
So, overall, this is the code I have working. I also removed a little bit of code that became unnecessary.
#include <mlpack/core/cv/cv_base.hpp>
#include <mlpack/core/cv/metrics/accuracy.hpp>
#include <mlpack/core/cv/metrics/precision.hpp>
#include <mlpack/core/cv/metrics/mse.hpp>
#include <mlpack/core/cv/k_fold_cv.hpp>
#include <mlpack/methods/ann/rnn.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <mlpack/methods/ann/layer/lstm.hpp>
#include <mlpack/methods/ann/layer/dropout.hpp>
#include <mlpack/methods/random_forest/random_forest.hpp>
#include <mlpack/methods/decision_tree/random_dimension_select.hpp>
#include <mlpack/core/arma_extend/arma_extend.hpp>
#include <vector>
#include <string>
int main(int argc, char *argv[])
{
using namespace mlpack;
using namespace mlpack::data;
using namespace mlpack::tree;
using namespace mlpack::cv;
using namespace mlpack::ann;
using namespace arma;
// We have to make sure backpropagation through time doesn't take more
// time steps than we have.
const int maxLineLength = 17;
const int hiddenSize = 128;
const int numLetters = 256;
using MatType = cube;
std::vector<std::string> trainingData;
trainingData.push_back(std::string("THIS IS THE INPUT"));
//This is the original network that I used (FFN), but then I also tried an RNN
RNN<> rnn(maxLineLength);
rnn.Add<IdentityLayer<>>();
rnn.Add<LSTM<>>(numLetters, hiddenSize, maxLineLength);
rnn.Add<Dropout<>>(0.1);
rnn.Add<Linear<>>(hiddenSize, numLetters);
const auto makeInput = [](const char *line) -> MatType {
const auto strLen = strlen(line);
// rows: number of dimensions
// cols: number of sequences/points
// slices: number of steps in sequences
MatType result(numLetters, 1, strLen, fill::zeros);
for(int i = 0; i < strLen; ++i)
{
const auto letter = line[i];
result.at(static_cast<uword>(letter), 0, i) = 1.0;
}
return result;
};
const auto makeTarget = [] (const char *line) -> MatType {
const auto strLen = strlen(line);
// responses for NegativeLogLikelihood should be
// non-one-hot-encoded class IDs (from 1 to num_classes)
cube result(1, 1, strLen, fill::zeros);
// the response is the *next* letter in the sequence
for(int i = 0; i < strLen - 1; ++i)
{
const auto letter = line[i + 1];
result.at(0, 0, i) = static_cast<uword>(letter) + 1.0;
}
// the final response is empty, so we arbitrarily set it to class 1
result.at(0, 0, strLen - 1) = 1.0;
return result;
};
std::vector<cube> inputs(trainingData.size());
std::vector<cube> targets(trainingData.size());
for(int i = 0; i < trainingData.size(); ++i)
{
inputs[i] = makeInput(trainingData[i].c_str());
targets[i] = makeTarget(trainingData[i].c_str());
}
ens::SGD<> sgd(0.01, 1, 100 /* only 100 maximum iterations, just to see it work */);
rnn.Train(inputs[0], targets[0], sgd);
return 0;
}
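Once it trains, you can sanity-check the output. Here is a rough, untested sketch of decoding the prediction back into characters; it assumes RNN<>::Predict fills a cube shaped like the input (one column of class scores per time step) and that class i + 1 corresponds to character i, matching the letter + 1 target encoding above:

arma::cube predictions;
rnn.Predict(inputs[0], predictions);

std::string decoded;
for (size_t t = 0; t < predictions.n_slices; ++t)
{
  // Take the most likely class at each time step and map it back to a char.
  const arma::uword bestRow = predictions.slice(t).col(0).index_max();
  decoded += (char) bestRow;
}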
There are a few issues that I noticed here that are a little bit problematic; I can see that this was a little bit confusing, and I think we should work to improve that. So I think that I will open some issues and link them to this one, and we can see how things go from there. :)
I opened #2070 and #2071, and also #1267 and the related PR #1366 are relevant here---I think support like that would have helped the network be the right size.
Wow, you have gone above and beyond to help me (relative to many other open-source projects).
Maybe I can make a few suggestions to the docs in the future. One for now: document that IdentityLayer has to be the first layer of an RNN if LSTM is used (it makes sense, though).
Everything you describe makes sense. I was so lost in the matrix ops in the beginning, compared to numpy or pytorch, that I stopped paying attention to whether I was properly following how to encode the data from the tutorial. I completely forgot that the target is the next character in the sequence, not the character currently being predicted.
I'm gonna start reading the code more closely. I'll start from the issues you suggested and work out from there.
Thanks so much. I can't wait to actually make what I set out to do: a GUI for looking at parts of the network (mostly tables and lists for displaying the matrices).
Thanks again
Happy to help!
You're right that the documentation could be improved; I opened #2080 as an attempt, and it should hopefully help. Feel free to comment on that if you have any suggestions.
It's definitely true that the matrix operations are a little bit different with Armadillo and C++ than they would be in other packages. It's meant to be like the MATLAB syntax, but that's definitely a bit different than what it feels like in Python. :)
Excited to hear about how the GUI goes; if you have any more questions, we're happy to try to answer them as best we can.
Hi, I just started using mlpack about two weeks ago. Everything, for the most part, is going smoothly with it. I learned most ML concepts using either TensorFlow or PyTorch before finding mlpack and using its tensor API.
I previously had used this article: https://towardsdatascience.com/writing-like-shakespeare-with-machine-learning-in-pytorch-d77f851d910c
My network/model is constructed as such:
The main part I'm having trouble with, it seems, is reimplementing the tensor operations to properly pass the inputs to the net.Train function. The operations all run and print the expected result, but the Train function for the network throws a logic_error as a result of an out of bounds error. I assume I'm not encoding the input right. An example of how I'm encoding the input (what I thought was the equivalent in mlpack) is below:
Any help with getting me unstuck from this would be greatly appreciated.
Thanks