josephjaspers / blackcat_tensors

Matrix-vector library designed for neural network construction: CUDA (GPU) support, OpenMP (multithreaded CPU) support, partial BLAS support, an expression-template-based implementation whose generated PTX code is identical to hand-written kernels, and support for auto-differentiation.

mnist_test_recurrent compile errors! #23

Closed xinsuinizhuan closed 4 years ago

xinsuinizhuan commented 4 years ago

Windows 10, VS2019, OpenBLAS: compiling mnist_test_recurrent produces some errors (screenshot attached).

josephjaspers commented 4 years ago

I found the issue, working on it now.

It seems that Visual Studio has difficulty parsing functions that return functors or lambdas that accept a variable number of parameters.
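Roughly, the problematic pattern looks something like the following (an illustrative sketch only, not the library's actual code):

    // Illustration only: a function template that returns a generic variadic
    // lambda, the kind of construct older MSVC versions had trouble parsing.
    #include <utility>

    template<class F>
    auto make_caller(F f) {
        return [f](auto&&... args) {
            return f(std::forward<decltype(args)>(args)...);
        };
    }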

Will apply a fix soon

josephjaspers commented 4 years ago

Fixed as of:

https://github.com/josephjaspers/BlackCat_Tensors/commit/e338ecea34603675956035be27d68e006bc9edc8

https://github.com/josephjaspers/BlackCat_Tensors/commit/55c951c40d2507b0fa20b0499b84ff69c3fb7840

xinsuinizhuan commented 4 years ago

It works, thank you! I look forward to LSTM support!

josephjaspers commented 4 years ago

Awesome, so glad! I am working on LSTMs and GRUs now.

xinsuinizhuan commented 4 years ago

mnist_test_recurrent: running the newest code produces -nan values (screenshot attached), and I don't know why the data is fed in as */4 (screenshot attached).

josephjaspers commented 4 years ago

So the test feeds in 1/4 of each image at a time. That way the neural network has to remember the past inputs to get the output correct.

I'll try to see if there are any bugs.

You can see the example output here (and the test outputs). https://travis-ci.org/josephjaspers/BlackCat_Tensors

So it is working on Travis and on my configuration. I'll try lowering the learning rate and seeing if that works.

josephjaspers commented 4 years ago

I...

Added zero initialization (maybe an uninitialized value was causing the NaNs?). Lowered the learning rate (maybe the initial random seed / initial weights diverged?). https://github.com/josephjaspers/BlackCat_Tensors/commit/017e82da2ae6b0077cc0a956139463bd05d0e623

It's very difficult to debug; I have this working on my home configuration and on the Travis build. This last update should cover some of the possible causes.

xinsuinizhuan commented 4 years ago

> I...
>
> Added zero initialization (maybe an uninitialized value was causing the NaNs?). Lowered the learning rate (maybe the initial random seed / initial weights diverged?). 017e82d
>
> It's very difficult to debug; I have this working on my home configuration and on the Travis build. This last update should cover some of the possible causes.

I am so sorry, it still has NaNs (screenshot attached).

josephjaspers commented 4 years ago

In mnist_test.h

auto network = neuralnetwork(
    feedforward(system_tag, 784/4, 64),
    tanh(system_tag, 64),
    lstm(system_tag, 64, 32),
    feedforward(system_tag, 32, 10),
    softmax(system_tag, 10),
    outputlayer(system_tag, 10)
);
network.set_learning_rate(0.001);
network.set_batch_size(batch_size);

Can you try with a smaller learning rate? Maybe 0.001, 0.0001?

And maybe decreasing the number of epochs to 3 or 5? That way you can test if it is still training and what values are causing it to result in NaNs.

Does the regular mnist_test (the feedforward version) still work? I have it working on Windows 10 with MKL, on Travis (Ubuntu 18), and on Ubuntu 16.04 (ATLAS/CUDA), so I am unsure what the issue could be.

xinsuinizhuan commented 4 years ago

Yes, I tested it with set_learning_rate at 0.001 and 0.0001, and set epochs to 3 and 5, but it still produces NaNs. The mnist_test (the feedforward version) still works, though.

xinsuinizhuan commented 4 years ago

How do I print mat, vec, and mat_view values? When I debug, I cannot view the values in time.

xinsuinizhuan commented 4 years ago

> In mnist_test.h
>
>     auto network = neuralnetwork(
>         feedforward(system_tag, 784/4, 64),
>         tanh(system_tag, 64),
>         lstm(system_tag, 64, 32),
>         feedforward(system_tag, 32, 10),
>         softmax(system_tag, 10),
>         outputlayer(system_tag, 10)
>     );
>     network.set_learning_rate(0.001);
>     network.set_batch_size(batch_size);
>
> Can you try with a smaller learning rate? Maybe 0.001, 0.0001?
>
> And maybe decreasing the number of epochs to 3 or 5? That way you can test if it is still training and what values are causing it to result in NaNs.
>
> Does the regular mnist_test (the feedforward version) still work? I have it working on Windows 10 with MKL, on Travis (Ubuntu 18), and on Ubuntu 16.04 (ATLAS/CUDA), so I am unsure what the issue could be.

When I debug it, in the first epoch, in network.update_weights(), I find an error at line 243 in lstm.h: the value is -6.274e+66, which is far too large in magnitude. I guess this causes the NaNs (screenshot attached).

josephjaspers commented 4 years ago

> How do I print mat, vec, and mat_view values? When I debug, I cannot view the values in time.

You can use tensor.print() to output to the console. You can also use tensor.to_string() to get a string version of the output.
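For example, a minimal sketch (the "BlackCat_Tensors.h" header name and the (rows, cols) constructor arguments are assumptions, not taken from this thread):

    #include "BlackCat_Tensors.h" // assumed main library header
    #include <iostream>
    #include <string>

    int main() {
        BC::Matrix<float> mat(4, 4);        // hypothetical small matrix
        mat.randomize(-1, 1);               // fill with uniform random values
        mat.print();                        // write the contents to the console
        std::string text = mat.to_string(); // or capture the same text
        std::cout << text << std::endl;
    }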

xinsuinizhuan commented 4 years ago

> How do I print mat, vec, and mat_view values? When I debug, I cannot view the values in time.
>
> You can use tensor.print() to output to the console. You can also use tensor.to_string() to get a string version of the output.

tensor.print() can only print the OutputLayer's forward and backward values; it cannot print the others. I used tensor.print() to print the output, and the first epoch is normal. But when I look at update_weights, in the LSTM's update_weights bz is too small, and that causes the NaNs.

josephjaspers commented 4 years ago

This may be confusing... I will try to make it clearer in the future.

auto& layer = network.template get_layer<0>(); //get the first layer
std::vector<mat>&  cache = layer.get_cache(std::true_type());
cache[0].print();  //print the last output (or intermediate value)
for (int i = 0; i < epochs; ++i){
    std::cout << " current epoch: " << i << std::endl;
    for (int j = 0; j < samples/batch_size; j++) {

        //Feed 1/4th of the image at a time
        network.forward_propagation(chunk(inputs[j], BC::index(0,        0), BC::shape(784/4, batch_size)));
        network.forward_propagation(chunk(inputs[j], BC::index(784* 1/4, 0), BC::shape(784/4, batch_size)));
        network.forward_propagation(chunk(inputs[j], BC::index(784* 2/4, 0), BC::shape(784/4, batch_size)));
        network.forward_propagation(chunk(inputs[j], BC::index(784* 3/4, 0), BC::shape(784/4, batch_size)));

        network.template get_layer<1>().get_cache(std::true_type())[0].print(); //Print the last output of the LSTM layer 

        //Apply backprop on the last two images (images 3/4 and 4/4)
        network.back_propagation(outputs[j]);
        network.back_propagation(outputs[j]);

        network.update_weights();
    }
}

josephjaspers commented 4 years ago

What may be easier is just going to LSTM.h

    template<class X>
    auto forward_propagation(const X& x) {
        if (cs.empty()) {
            cs.push_back(c);
        }

        t_minus_index = 0;

        fs.push_back( f_g(wf * x + bf) );
        is.push_back( i_g(wi * x + bi) );
        zs.push_back( z_g(wz * x + bz) );
        os.push_back( o_g(wo * x + bo) );

        auto& f = fs.back();
        auto& z = zs.back();
        auto& i = is.back();
        auto& o = os.back();

        c %= f;     //%= element-wise assign-multiplication
        c += z % i; //%  element-wise multiplication
        cs.push_back(c);

        (c_g(c) % o).print(); // Add this to print the output
        return c_g(c) % o;
    }

And from here you can print all of the intermediate values.

I will try to add better accessors to make debugging easier. I am currently working on the save/load feature.

xinsuinizhuan commented 4 years ago

OK! Let me debug the NaNs problem! You can focus on the save/load feature.

xinsuinizhuan commented 4 years ago

I found the problem. When I debug it, bf has -nan values in forward_propagation in lstm.h. So I initialize them in the LSTM constructor: bf.randomize(-1, 1); bz.randomize(-.1, .1); bi.randomize(-.1, .1); bo.randomize(0, .5); Then it works, but the accuracy rate is low, and I don't know whether what I added is right. (Screenshots attached: bf with -nans, the added init, and the low accuracy rate.)
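In other words, the workaround amounts to something like this in the LSTM constructor (a sketch; the real constructor signature in lstm.h is not shown in this thread):

    // Sketch of the workaround: randomize the gate biases at construction.
    // bf, bi, bz, bo are presumably the forget, input, update, and output
    // gate biases used in forward_propagation.
    LSTM(/* ...existing constructor arguments... */) {
        bf.randomize(-1, 1);
        bz.randomize(-.1, .1);
        bi.randomize(-.1, .1);
        bo.randomize(0, .5);
    }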

xinsuinizhuan commented 4 years ago

Also, can it support printing the MAPE (and MAE, MSE, RMSE, NRMSE) every epoch?

josephjaspers commented 4 years ago

I should have randomized the biases! Thank you so much, that was a bug in my code!

Yes I will add a ticket for those other features as well.

Update: Added the randomization to biases here: 4ea0d47 "LSTM add biases randomization, special thanks to xinsuinizhuan for debugging!"

josephjaspers commented 4 years ago

> I found the problem. When I debug it, bf has -nan values in forward_propagation in lstm.h. So I initialize them in the LSTM constructor: bf.randomize(-1, 1); bz.randomize(-.1, .1); bi.randomize(-.1, .1); bo.randomize(0, .5); Then it works, but the accuracy rate is low, and I don't know whether what I added is right.

Yes, the learning rate for the LSTM is very low; increasing the number of epochs or increasing the learning rate would improve performance. I keep it low to prevent it from producing NaNs. The low accuracy is expected.

And it was correct that you added them in the init!

xinsuinizhuan commented 4 years ago

Thank you, you are so great! It works: I set epochs to 100 and the performance is better. But I want better performance, so I set epochs to 1000, and then it produces -nans. I guess it is overfitting. I think training should be limited: set a target accuracy, and stop training when it reaches the epoch count or reaches the accuracy, or add dropout to prevent overfitting.

xinsuinizhuan commented 4 years ago

My goal is to use the LSTM to forecast short-term electric power load. I have the electric power load value for every 15 minutes of every day, and I want to use the historical data to forecast the next day's every-15-minutes values (CNN+LSTM). I have tested two methods. First: the input is one-dimensional (only the date) and the output is 96-dimensional (every 15 minutes of a day); data like the upload: ACCRTU0001001_15min.zip. Second: the input is the previous 24 days of data (24*96) and the output is the next day's data (1*96); data like the upload: Germany_BZNDE_LU_20181001-20190722.zip. But both methods' accuracy rate and MAPE are poor. The forecast at https://github.com/YuanSiping/Short-Term-Load-Forecasting-for-Electric-Power-Systems uses the previous 24 minutes to forecast the next minute's value, and its MAPE is great. Could you give me some suggestions? What should I do? How should I use your LSTM model to forecast?

josephjaspers commented 4 years ago

> Thank you, you are so great! It works: I set epochs to 100 and the performance is better. But I want better performance, so I set epochs to 1000, and then it produces -nans. I guess it is overfitting. I think training should be limited: set a target accuracy, and stop training when it reaches the epoch count or reaches the accuracy, or add dropout to prevent overfitting.

I just added 'save' and 'load', so you could possibly save every few epochs and then choose whichever one had the best performance.

I am trying to improve save and load, and then I will be working on adding logging for MAPE (and MAE, MSE, RMSE, NRMSE).
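For reference, MAPE over one output vector can be computed like this (a plain C++ sketch of the metric itself, not a library feature):

    // Sketch: mean absolute percentage error, 100 * mean(|y - yhat| / |y|).
    // Assumes no target value y[i] is zero.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    double mape(const std::vector<double>& y, const std::vector<double>& yhat) {
        double sum = 0;
        for (std::size_t i = 0; i < y.size(); ++i)
            sum += std::abs((y[i] - yhat[i]) / y[i]);
        return 100.0 * sum / y.size();
    }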

---

I would start by normalizing the data (putting it into the range [0,1]) and trying a simple LSTM/RNN model: LSTM -> FeedForward -> Logistic.

You could attempt to predict how much the voltage changes as opposed to its actual value (and train on the change of the value rather than the value itself). That way the input/output data is always relative (and in a smaller range).
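Training on first differences instead of raw values could look like this (a plain C++ sketch, independent of the library):

    // Sketch: convert a series to first differences so the model learns
    // relative changes instead of absolute magnitudes.
    #include <cstddef>
    #include <vector>

    std::vector<double> to_deltas(const std::vector<double>& series) {
        std::vector<double> deltas;
        if (series.size() < 2) return deltas;
        deltas.reserve(series.size() - 1);
        for (std::size_t i = 1; i < series.size(); ++i)
            deltas.push_back(series[i] - series[i - 1]);
        return deltas;
    }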

I will have to look at the data more to give a better answer though.

I see the data is in CSV format. You can load CSVs with:

    #include "BlackCat_IO.h"

    int main() {
        BC::Matrix<double> csv = BC::io::read_uniform<double>(
            BC::io::csv_descriptor("data.csv").header(true));
    }

xinsuinizhuan commented 4 years ago

Thank you very much! First, I will create a server to train on different companies' electricity power loads and to forecast each one automatically with its own trained, saved model. The data differs between companies, so choosing the best-performing epoch count one by one is difficult; we should think about how to prevent overfitting and let training stop automatically when it reaches the epoch count or the target accuracy. Second, yes, my data is the voltage changes, the actual electricity consumption. I will read the CSV file and normalize the data to [0,1]. Which is better for normalizing: max-min or tanh?

josephjaspers commented 4 years ago

I believe (x - min) / (max - min) would be better than tanh, as large values passed through tanh would all map to nearly the same value.
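A minimal sketch of that normalization (plain C++, independent of the library):

    // Sketch: min-max normalization, x -> (x - min) / (max - min), into [0, 1].
    #include <algorithm>
    #include <vector>

    void min_max_normalize(std::vector<double>& v) {
        if (v.empty()) return;
        auto [mn, mx] = std::minmax_element(v.begin(), v.end());
        double lo = *mn, hi = *mx;
        if (hi == lo) return; // avoid division by zero on constant data
        for (double& x : v)
            x = (x - lo) / (hi - lo);
    }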

xinsuinizhuan commented 4 years ago

About how to prevent overfitting, you can refer to this: https://github.com/Artelnics/OpenNN.

xinsuinizhuan commented 4 years ago

I set the epochs to 200 and it appears: [-nan(ind), -nan(ind), -nan(ind), ...]. It seems not good; how do we prevent the overfitting? Because the best epoch count is different for different data, choosing it manually is too difficult (screenshot attached). This should be dealt with.

josephjaspers commented 4 years ago

I ran the test with 200 epochs and I didn't have any NaNs.

However you can simply save the neural network every few epochs so that you can load back the most accurate epoch.

I am still working on adding appropriate error logging.

    for (int i = 0; i < epochs; ++i){
        std::cout << " current epoch: " << i << std::endl;
        for (int j = 0; j < samples/batch_size; j++) {

            //Feed 1/4th of the image at a time
            network.forward_propagation(chunk(inputs[j], BC::index(0,        0), BC::shape(784/4, batch_size)));
            network.forward_propagation(chunk(inputs[j], BC::index(784* 1/4, 0), BC::shape(784/4, batch_size)));
            network.forward_propagation(chunk(inputs[j], BC::index(784* 2/4, 0), BC::shape(784/4, batch_size)));
            network.forward_propagation(chunk(inputs[j], BC::index(784* 3/4, 0), BC::shape(784/4, batch_size)));

            //Apply backprop on the last two images (images 3/4 and 4/4)
            network.back_propagation(outputs[j]);
            network.back_propagation(outputs[j]);

            network.update_weights();
        }
        if (i % 10 == 0) {
            network.save("mnist_recurrent_epoch" + std::to_string(i));
        }

    }

To load the saved weights...

    network.load("mnist_recurrent_epoch" + std::to_string(best_epoch));

josephjaspers commented 4 years ago

Closing as issue has been resolved.