Closed xinsuinizhuan closed 4 years ago
I found the issue, working on it now.
It seems that Visual Studio has difficulty parsing functions that return functors or lambdas that accept a variable number of parameters.
Will apply a fix soon
It works, thank you! I look forward to lstm support!
Awesome so glad. I am working on LSTM and GRUs now.
mnist_test_recurrent: I ran the newest code and -nan appears, and I don't know why the data is fed in as */4.
So the test feeds in 1/4 of the images at a time. That way the neural network has to remember the past inputs to get the output correct.
I'll try to see if there are any bugs.
You can see the example output here (and the test outputs). https://travis-ci.org/josephjaspers/BlackCat_Tensors
So it is working on Travis and on my configuration. I'll try lowering the learning rate and seeing if that works.
Added zero initialization (maybe an uninitialized value was causing the nans?) and lowered the learning rate (maybe the initial random seed / initial weights diverged?): https://github.com/josephjaspers/BlackCat_Tensors/commit/017e82da2ae6b0077cc0a956139463bd05d0e623
It's very difficult to debug; I have this working on my home configuration and on the Travis build. This last update should cover some of the possible causes.
I am so sorry, it still has nans.
In mnist_test.h
auto network = neuralnetwork(
    feedforward(system_tag, 784/4, 64),
    tanh(system_tag, 64),
    lstm(system_tag, 64, 32),
    feedforward(system_tag, 32, 10),
    softmax(system_tag, 10),
    outputlayer(system_tag, 10)
);
network.set_learning_rate(0.001);
network.set_batch_size(batch_size);
Can you try with a smaller learning rate? Maybe 0.001 or 0.0001?
And maybe decrease the number of epochs to 3 or 5? That way you can test whether it is still training and which values are causing it to result in nans.
Does the regular mnist_test (the feedforward version) still work? I have it working on Windows 10 with MKL, on Travis (Ubuntu 18), and on Ubuntu 16.04 (ATLAS/CUDA), so I am unsure what the issue could be.
Yes, I tested it with set_learning_rate at 0.001 and 0.0001, and set epochs to 3 and 5, but it still produces nans. The mnist_test (the feedforward version) still works, though.
How do I print mat, vec, and mat_view values? I debug it, but cannot view the values in time!
When I debug it, in the first epoch, in network.update_weights() I find the error: at line 243 in lstm.h the value is -6.274e+66. That is far too small (a huge negative value), and I guess it causes the nans.
You can use tensor.print() to output to the console. You can also use tensor.to_string() to get a string version of the output.
tensor.print() could only print the OutputLayer's forward and backward values; it could not print the other layers. I used tensor.print() to print the output, and the first epoch is normal. But when I look at update_weights, in the lstm's update_weights bz is too small, and that causes the nans.
This may be confusing... I will try to make it more clear in the future.
auto& layer = network.template get_layer<0>(); //get the first layer
std::vector<mat>& cache = layer.get_cache(std::true_type());
cache[0].print(); //print the last output (or intermediate value)
for (int i = 0; i < epochs; ++i) {
    std::cout << " current epoch: " << i << std::endl;
    for (int j = 0; j < samples/batch_size; j++) {
        // Feed 1/4th of the image at a time
        network.forward_propagation(chunk(inputs[j], BC::index(0, 0), BC::shape(784/4, batch_size)));
        network.forward_propagation(chunk(inputs[j], BC::index(784* 1/4, 0), BC::shape(784/4, batch_size)));
        network.forward_propagation(chunk(inputs[j], BC::index(784* 2/4, 0), BC::shape(784/4, batch_size)));
        network.forward_propagation(chunk(inputs[j], BC::index(784* 3/4, 0), BC::shape(784/4, batch_size)));
        network.template get_layer<1>().get_cache(std::true_type())[0].print(); // Print the last output of the LSTM layer
        // Apply backprop on the last two images (images 3/4 and 4/4)
        network.back_propagation(outputs[j]);
        network.back_propagation(outputs[j]);
        network.update_weights();
    }
}
What may be easier is just going to LSTM.h
template<class X>
auto forward_propagation(const X& x) {
    if (cs.empty()) {
        cs.push_back(c);
    }
    t_minus_index = 0;

    fs.push_back( f_g(wf * x + bf) );
    is.push_back( i_g(wi * x + bi) );
    zs.push_back( z_g(wz * x + bz) );
    os.push_back( o_g(wo * x + bo) );

    auto& f = fs.back();
    auto& z = zs.back();
    auto& i = is.back();
    auto& o = os.back();

    c %= f;      // %= element-wise assign-multiplication
    c += z % i;  // %  element-wise multiplication
    cs.push_back(c);

    (c_g(c) % o).print(); // Add this to print the output
    return c_g(c) % o;
}
And from here you can print all of the intermediate values.
I will try to add better accessors to make debugging easier. I am currently working on the save/load feature.
OK! Let me debug the nans problem! You can focus on the save/load feature.
I found the problem. When I debugged it, bf had -nan values in forward_propagation in lstm.h. So I initialized them in the lstm constructor: bf.randomize(-1, 1); bz.randomize(-.1, .1); bi.randomize(-.1, .1); bo.randomize(0, .5); Then it works, but the accuracy rate is low, and I don't know whether what I added is right. This is the snapshot: bf has -nans. After I add the init, the accuracy rate is low.
Also, could it support printing the MAPE (and MAE, MSE, RMSE, NRMSE) every epoch?
I should have randomized the biases! Thank you so much that was a bug in my code!
Yes I will add a ticket for those other features as well.
Update: Added the randomization to biases here: 4ea0d47 "LSTM add biases randomization, special thanks to xinsuinizhuan for debugging!"
Yes, the learning rate for the lstm is very low. Increasing the number of epochs or increasing the learning rate would improve performance. I keep it at a low level to prevent it from becoming nans. The low accuracy is expected.
And it was correct that you added them in the init!
Thank you, you are so great! It works. I set epochs to 100 and the performance is better. But I want to get better performance, so I set epochs to 1000, and then -nans appear; I guess it is overfitting. I think training should be limited: set a target accuracy, and stop training when it reaches either the epoch limit or the target accuracy, or add dropout to prevent overfitting.
My goal is to use the lstm to forecast short-term electric power load. I have the electric power load value for every 15 minutes of every day, so I want to use the historical data to forecast: use the history to predict the next day's value for every 15 minutes (cnn+lstm). I have tested two methods. First: the input is one-dimensional (only the date) and the output is 96-dimensional (every 15 minutes of a day); the data is like the upload ACCRTU0001001_15min.zip. Second: the input is 24-dimensional (the previous 24 days of data, 24×96) and the output is one-dimensional (the next day's data, 1×96); the data is like the upload Germany_BZNDE_LU_20181001-20190722.zip. But both methods' accuracy and MAPE are poor. The forecast at https://github.com/YuanSiping/Short-Term-Load-Forecasting-for-Electric-Power-Systems uses the previous 24 minutes to forecast the next minute's value, and its MAPE is great. Could you give me some suggestions? What should I do? How should I use your lstm model to forecast?
I just added 'save' and 'load', so you could possibly save every few epochs and then choose whichever one had the best performance.
I am trying to improve save and load, and then I will work on adding logging for MAPE (and MAE, MSE, RMSE, NRMSE).
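Until that logging lands, these metrics are easy to compute by hand from predicted and actual values. A minimal plain-C++ sketch (the ErrorMetrics struct and compute_metrics are illustrative names, not part of BlackCat_Tensors):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative helper: common regression error metrics.
struct ErrorMetrics { double mae, mse, rmse, mape; };

ErrorMetrics compute_metrics(const std::vector<double>& predicted,
                             const std::vector<double>& actual) {
    double abs_sum = 0, sq_sum = 0, pct_sum = 0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        double err = predicted[i] - actual[i];
        abs_sum += std::abs(err);
        sq_sum  += err * err;
        pct_sum += std::abs(err / actual[i]); // MAPE assumes actual[i] != 0
    }
    double n = static_cast<double>(actual.size());
    return { abs_sum / n,                 // MAE
             sq_sum / n,                  // MSE
             std::sqrt(sq_sum / n),       // RMSE
             100.0 * pct_sum / n };       // MAPE as a percentage
}
```

Calling this once per epoch on the network's outputs would give the per-epoch logging asked about above.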
I would start by normalizing the data (putting it into the range [0, 1]) and trying a simple LSTM/RNN model: LSTM -> FeedForward -> Logistic.
You could attempt to predict how much the voltage changes opposed to its actual value. (And train on the change of the value opposed to the actual value). That way the input data/output data is always relative (and in a smaller range).
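Training on changes rather than raw values just means differencing the series before training and integrating the predictions afterwards. A sketch in plain C++ (to_deltas/from_deltas are illustrative names, not library functions):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert an absolute series into step-to-step changes (first differences).
std::vector<double> to_deltas(const std::vector<double>& series) {
    std::vector<double> deltas;
    for (std::size_t i = 1; i < series.size(); ++i)
        deltas.push_back(series[i] - series[i - 1]);
    return deltas;
}

// Reconstruct the absolute series from a starting value and predicted deltas.
std::vector<double> from_deltas(double start, const std::vector<double>& deltas) {
    std::vector<double> series{start};
    for (double d : deltas)
        series.push_back(series.back() + d);
    return series;
}
```

The network would then be trained on the output of to_deltas, and its predictions passed through from_deltas to recover absolute load values.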
I will have to look at the data more to give a better answer though.
I see the data is in CSV format. You can load csv's with:
#include "BlackCat_IO.h"
int main() {
    BC::Matrix<double> csv =
        BC::io::read_uniform<double>(BC::io::csv_descriptor("data.csv").header(true));
}
Thank you very much! First, I will create a server to train on different companies' electric power loads and to forecast them automatically with each company's trained, saved model. Different companies have different data, so it is difficult to choose the best-performing epoch one by one; we should think about how to prevent overfitting and let training stop automatically when it reaches the epoch limit or the target accuracy. Second, yes, my data is the voltage changes, i.e. the actual electricity consumption. I will read the csv file and normalize the data to [0, 1]. Which is better for normalizing: max-min or tanh?
I believe (x - min) / (max - min) would be better than tanh. (Large values in tanh would all result in the same value.)
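That min-max formula applied to a whole series looks like this in plain C++ (min_max_normalize is an illustrative name, not a BlackCat_Tensors function):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Rescale all values into [0, 1] using (x - min) / (max - min).
void min_max_normalize(std::vector<double>& data) {
    auto [min_it, max_it] = std::minmax_element(data.begin(), data.end());
    double min = *min_it;
    double range = *max_it - *min_it;
    if (range == 0) return; // constant series: nothing to rescale
    for (double& x : data)
        x = (x - min) / range;
}
```

Note that the same min and range from the training data should be reused when normalizing new inputs at prediction time, and to invert the transform on the network's outputs.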
About how to prevent overfitting, you can refer to this: https://github.com/Artelnics/OpenNN.
I set the epochs to 200 and the output appears as: [-nan(ind), -nan(ind), -nan(ind), ...]. It seems not good; how do I prevent the overfitting? Because the best epoch differs for different data, choosing it manually is too difficult. This should be dealt with.
I ran the test with 200 epochs and I didn't have any nans.
However you can simply save the neural network every few epochs so that you can load back the most accurate epoch.
I am still working on adding appropriate error logging.
for (int i = 0; i < epochs; ++i) {
    std::cout << " current epoch: " << i << std::endl;
    for (int j = 0; j < samples/batch_size; j++) {
        // Feed 1/4th of the image at a time
        network.forward_propagation(chunk(inputs[j], BC::index(0, 0), BC::shape(784/4, batch_size)));
        network.forward_propagation(chunk(inputs[j], BC::index(784* 1/4, 0), BC::shape(784/4, batch_size)));
        network.forward_propagation(chunk(inputs[j], BC::index(784* 2/4, 0), BC::shape(784/4, batch_size)));
        network.forward_propagation(chunk(inputs[j], BC::index(784* 3/4, 0), BC::shape(784/4, batch_size)));
        // Apply backprop on the last two images (images 3/4 and 4/4)
        network.back_propagation(outputs[j]);
        network.back_propagation(outputs[j]);
        network.update_weights();
    }

    if (i % 10 == 0) {
        network.save("mnist_recurrent_epoch" + std::to_string(i));
    }
}
To load the saved weights...
network.load("mnist_recurrent_epoch" + std::to_string(best_epoch));
Closing as issue has been resolved.
Windows 10, VS2019, OpenBLAS: compiling mnist_test_recurrent produces some errors.