josephjaspers / blackcat_tensors

A matrix-vector library designed for neural network construction, with CUDA (GPU) support, OpenMP (multithreaded CPU) support, partial BLAS support, an expression-template-based implementation whose generated PTX code matches hand-written kernels, and support for auto-differentiation.

Network structure puzzle: should the network have a softmax layer or not? #43

Closed. xinsuinizhuan closed this issue 4 years ago.

xinsuinizhuan commented 4 years ago

1. With no softmax layer, I trained the network and printed the output data and the trained forecast data, but the forecast data is very large, not between 0 and 1 (I applied max-min normalization to the input data). My net is:

```cpp
auto make_lstm_network() {
    return BC::nn::neuralnetwork(
        BC::nn::lstm(BC::host_tag(), 96 * 10, 256),
        BC::nn::recurrent(BC::host_tag(), 256, 192),
        BC::nn::output_layer(BC::host_tag(), 192)
    );
}
using network_type = decltype(make_lstm_network());
```

```cpp
typedef struct _LstmPredictTask {
    int fd;
    int m_inputs_number;
    int m_outputs_number;
    int m_sequence_length;
    int m_train_number;
    int m_batch_size;
    double m_learning_rate;
    network_type m_pnetwork = make_lstm_network();
    void reset_neural_network() {
        m_pnetwork = std::move(make_lstm_network());
    }
} LstmPredictTask;
```

I create the net and train:

```cpp
// start train
LstmPredictTask* lstmpredicttask = new LstmPredictTask();
if (lstmpredicttask == NULL) {
    return -2;
}

//LstmPredictTask lstmpredicttask;
std::cout << "Neural Network architecture: \n" << lstmpredicttask->m_pnetwork.get_string_architecture() << std::endl;
lstmpredicttask->m_pnetwork.set_learning_rate(lstmpredicttask->m_learning_rate);
lstmpredicttask->m_pnetwork.set_batch_size(lstmpredicttask->m_batch_size);

int training_sets;
std::pair<cube, cube> data = load_train_data(system_tag, datafilepath, lstmpredicttask, &training_sets);
cube& inputs = data.first;
cube& outputs = data.second;

std::cout <<" training..." << std::endl;
auto start = std::chrono::system_clock::now();

std::cout << "imagesoutput:------------------------------------" << std::endl;
auto imagesoutput = reshape(outputs[0], BC::shape(96, 2, lstmpredicttask->m_batch_size));
imagesoutput[0].t().print_sparse(5);

for (int i = 0; i < epochs; ++i) {
    std::cout << " current epoch: " << i << std::endl;
    for (int j = 0; j < training_sets; ++j) {
        lstmpredicttask->m_pnetwork.forward_propagation(inputs[j]);
        lstmpredicttask->m_pnetwork.back_propagation(outputs[j]);
        lstmpredicttask->m_pnetwork.update_weights();
    }
}

if (strlen(_trainparamsavefile) != 0)
{
    lstmpredicttask->m_pnetwork.save(_trainparamsavefile); // save the trained parameters
}

auto end = std::chrono::system_clock::now();
std::chrono::duration<double> total = end - start;
std::cout << " training time: " << total.count() << std::endl;

auto batch = inputs[0];
mat hyps = lstmpredicttask->m_pnetwork.forward_propagation(batch);
std::cout << " MAPE loss: " << BC::Scalar<double>(BC::nn::MAPE(hyps, outputs[0]) / lstmpredicttask->m_batch_size).data()[0] << std::endl;

std::cout << "process_train_data------------------------------------" << std::endl;
for (int i = 0; i < 1; ++i) {
    hyps[i].print();
    std::cout << "------------------------------------" << std::endl;
}

```

I printed the output data and the forecast data, but the forecast data is not right: the values are not between 0 and 1, and they are very large.

```
Neural Network architecture:
LSTM: inputs: 960 outputs: 256
Recurrent: inputs: 256 outputs: 192
Output_Layer: inputs: 192 outputs: 192
```

```
training...
imagesoutput:------------------------------------
[[0.380, 0.750, 0.570, 0.603, 0.504, 0.529, 0.585, 0.654, 0.497, 0.674, 0.708, 0.564, 0.841, 0.592, 0.590, 0.724, 0.585, 0.884, 0.273, 1.000, 0.681, 0.496, 0.857, 0.774, 0.749, 0.721, 0.659, 0.582, 0.838, 0.701, 0.617, 0.832, 0.718, 0.699, 0.591, 0.624, 0.737, 0.571, 0.632, 0.707, 0.773, 0.512, 0.823, 0.595, 0.534, 0.784, 0.581, 0.701, 0.698, 0.517, 0.690, 0.826, 0.540, 0.788, 0.709, 0.793, 0.700, 0.722, 0.690, 0.810, 0.693, 0.801, 0.693, 0.801, 0.644, 0.572, 0.552, 0.642, 0.696, 0.593, 0.599, 0.642, 0.584, 0.765, 0.619, 0.514, 0.490, 0.647, 0.385, 0.496, 0.643, 0.552, 0.503, 0.610, 0.567, 0.651, 0.636, 0.514, 0.687, 0.675, 0.704, 0.795, 0.722, 0.753, 0.531, 0.805]
 [0.333, 0.846, 0.499, 0.595, 0.510, 0.496, 0.244, 1.000, 0.491, 0.682, 0.708, 0.564, 0.823, 0.592, 0.590, 0.724, 0.570, 0.874, 0.601, 0.679, 0.689, 0.515, 0.761, 0.877, 0.673, 0.712, 0.251, 1.000, 0.857, 0.675, 0.632, 0.814, 0.737, 0.699, 0.606, 0.638, 0.746, 0.659, 0.568, 0.806, 0.686, 0.606, 0.730, 0.696, 0.547, 0.775, 0.604, 0.717, 0.698, 0.523, 0.718, 0.748, 0.635, 0.798, 0.718, 0.812, 0.709, 0.663, 0.821, 0.838, 0.720, 0.735, 0.822, 0.735, 0.830, 0.629, 0.712, 0.642, 0.883, 0.578, 0.773, 0.611, 0.754, 0.728, 0.790, 0.501, 0.623, 0.631, 0.484, 0.484, 0.788, 0.483, 0.567, 0.541, 0.647, 0.525, 0.661, 0.413, 0.723, 0.583, 0.786, 0.640, 0.722, 0.606, 0.537, 0.640]]
current epoch: 0
...
current epoch: 127
training time: 4.44725
MAPE loss: 0.0310454
process_train_data------------------------------------
[-25.8460, -8.38295, 14.65372, -5.71706, -27.4717, 8.602991, -23.6860, -35.2958, 17.08113, -15.3109, -4.69952, 16.89871, 35.82342, 21.46421, 7.413482, 35.08593, 6.854354, 12.87520, -32.6889, 13.24751, 21.69270, 9.105426, -7.89662, -14.2385, -28.2898, 6.437375, -5.50345, -9.93736, -23.0111, 2.646907, 17.38767, -22.7771, 1.653123, -4.01012, 19.70499, -15.3654, -23.4237, 12.21545, 26.81120, 4.972610, -2.22349, 40.82117, -39.3307, -16.4180, -3.45213, 3.258177, 14.86659, 4.521998, 17.44581, -9.25711, 8.646381, -3.66365, -21.4957, 2.988301, 11.47178, -3.76695, 2.691704, -10.3771, 20.55152, -20.8875, -14.0821, 10.21881, -11.5416, -2.01150, -14.9797, -22.1648, 12.36946, 16.69329, -11.0484, 9.839924, -25.3255, -16.6491, 21.82862, 0.515754, 26.25349, -31.2395, -11.0210, 36.62269, 1.771704, -25.2875, -15.7974, 16.03234, 4.626742, -34.5083, -2.80271, 18.01022, 0.685722, 12.62443, -1.93305, 54.40408, -26.3419, 10.30347, 14.39106, -22.9777, 4.736426, -5.70500, -1.08147, -45.7611, 22.09196, 11.78476, 5.660966, 12.14032, -38.3728, 28.37263, 0.546137, 9.886680, -1.93493, -32.3411, 17.84676, -21.0064, -2.20607, -14.6393, -19.7328, -2.09870, -7.53886, 8.389552, -10.0535, -18.1724, -2.34214, -4.51320, -9.92966, -4.03211, 15.46169, -6.36365, 13.37198, -12.9277, 0.622534, 32.86275, -7.55993, 4.308530, 23.71868, -14.3317, -9.16887, 5.663595, -0.60777, -11.6949, -31.5343, 18.74082, -17.3257, 2.044618, 7.356360, -16.0196, 0.659284, 16.72092, 15.97791, -4.75050, -4.47696, -9.21328, 11.09348, 4.904877, -32.0915, -6.60098, -29.8067, 5.970200, 5.943465, 49.47677, 2.389998, -1.62050, 26.01327, 18.25720, -18.1648, 21.47414, -16.8297, 18.06407, 0.427003, -18.3307, 2.537354, -36.9672, 14.45144, 12.91946, -18.1769, 2.703407, -4.11329, -2.75824, -14.2142, 43.97624, 22.92934, 1.787207, -4.71201, 3.859387, -18.1937, -2.71971, 14.80760, 52.58834, -4.54549, 29.55029, -7.89654, 13.35169, -10.4480, -23.7541, 14.79737, -6.61407]
```
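For reference, a minimal sketch of the conventional mean-absolute-percentage-error formula that the `MAPE loss` line presumably reports; this illustrates the standard definition only and is not `BC::nn::MAPE`'s actual implementation:

```cpp
#include <cmath>
#include <cstddef>

// Conventional MAPE over n predictions (hypothetical helper, not the
// library's code): mean of |y - yhat| / |y|.
double mape(const double* y, const double* yhat, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += std::abs(y[i] - yhat[i]) / std::abs(y[i]);
    return sum / n;
}
```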

2. With a softmax layer, I trained the network and printed the output data and the trained forecast data, but the forecast data seems wrong: it has many zeros and very small values. My net is:

```cpp
auto make_lstm_network() {
    return BC::nn::neuralnetwork(
        BC::nn::lstm(BC::host_tag(), 96 * 10, 256),
        BC::nn::recurrent(BC::host_tag(), 256, 192),
        BC::nn::softmax(BC::host_tag(), 192),
        BC::nn::output_layer(BC::host_tag(), 192)
    );
}
using network_type = decltype(make_lstm_network());

typedef struct _LstmPredictTask {
    int fd;
    int m_inputs_number;
    int m_outputs_number;
    int m_sequence_length;
    int m_train_number;
    int m_batch_size;
    double m_learning_rate;
    network_type m_pnetwork = make_lstm_network();
    void reset_neural_network() {
        m_pnetwork = std::move(make_lstm_network());
    }
} LstmPredictTask;
```

I create the net and train:

```cpp
// start train
LstmPredictTask* lstmpredicttask = new LstmPredictTask();
if (lstmpredicttask == NULL) {
    return -2;
}

//LstmPredictTask lstmpredicttask;
std::cout << "Neural Network architecture: \n" << lstmpredicttask->m_pnetwork.get_string_architecture() << std::endl;
lstmpredicttask->m_pnetwork.set_learning_rate(lstmpredicttask->m_learning_rate);
lstmpredicttask->m_pnetwork.set_batch_size(lstmpredicttask->m_batch_size);

int training_sets;
std::pair<cube, cube> data = load_train_data(system_tag, datafilepath, lstmpredicttask, &training_sets);
cube& inputs = data.first;
cube& outputs = data.second;

std::cout <<" training..." << std::endl;
auto start = std::chrono::system_clock::now();

std::cout << "imagesoutput:------------------------------------" << std::endl;
auto imagesoutput = reshape(outputs[0], BC::shape(96, 2, lstmpredicttask->m_batch_size));
imagesoutput[0].t().print_sparse(5);

for (int i = 0; i < epochs; ++i) {
    std::cout << " current epoch: " << i << std::endl;
    for (int j = 0; j < training_sets; ++j) {
        lstmpredicttask->m_pnetwork.forward_propagation(inputs[j]);
        lstmpredicttask->m_pnetwork.back_propagation(outputs[j]);
        lstmpredicttask->m_pnetwork.update_weights();
    }
}

if (strlen(_trainparamsavefile) != 0)
{
    lstmpredicttask->m_pnetwork.save(_trainparamsavefile); // save the trained parameters
}

auto end = std::chrono::system_clock::now();
std::chrono::duration<double> total = end - start;
std::cout << " training time: " << total.count() << std::endl;

auto batch = inputs[0];
mat hyps = lstmpredicttask->m_pnetwork.forward_propagation(batch);
std::cout << " MAPE loss: " << BC::Scalar<double>(BC::nn::MAPE(hyps, outputs[0]) / lstmpredicttask->m_batch_size).data()[0] << std::endl;

std::cout << "process_train_data------------------------------------" << std::endl;
for (int i = 0; i < 1; ++i) {
    hyps[i].print();
    std::cout << "------------------------------------" << std::endl;
}

```

I printed the output data and the forecast data, but the forecast data seems wrong: it has many zeros and very small values.

```
Neural Network architecture:
LSTM: inputs: 960 outputs: 256
Recurrent: inputs: 256 outputs: 192
SoftMax: inputs: 192 outputs: 192
Output_Layer: inputs: 192 outputs: 192
```

```
training...
imagesoutput:------------------------------------
[[0.380, 0.750, 0.570, 0.603, 0.504, 0.529, 0.585, 0.654, 0.497, 0.674, 0.708, 0.564, 0.841, 0.592, 0.590, 0.724, 0.585, 0.884, 0.273, 1.000, 0.681, 0.496, 0.857, 0.774, 0.749, 0.721, 0.659, 0.582, 0.838, 0.701, 0.617, 0.832, 0.718, 0.699, 0.591, 0.624, 0.737, 0.571, 0.632, 0.707, 0.773, 0.512, 0.823, 0.595, 0.534, 0.784, 0.581, 0.701, 0.698, 0.517, 0.690, 0.826, 0.540, 0.788, 0.709, 0.793, 0.700, 0.722, 0.690, 0.810, 0.693, 0.801, 0.693, 0.801, 0.644, 0.572, 0.552, 0.642, 0.696, 0.593, 0.599, 0.642, 0.584, 0.765, 0.619, 0.514, 0.490, 0.647, 0.385, 0.496, 0.643, 0.552, 0.503, 0.610, 0.567, 0.651, 0.636, 0.514, 0.687, 0.675, 0.704, 0.795, 0.722, 0.753, 0.531, 0.805]
 [0.333, 0.846, 0.499, 0.595, 0.510, 0.496, 0.244, 1.000, 0.491, 0.682, 0.708, 0.564, 0.823, 0.592, 0.590, 0.724, 0.570, 0.874, 0.601, 0.679, 0.689, 0.515, 0.761, 0.877, 0.673, 0.712, 0.251, 1.000, 0.857, 0.675, 0.632, 0.814, 0.737, 0.699, 0.606, 0.638, 0.746, 0.659, 0.568, 0.806, 0.686, 0.606, 0.730, 0.696, 0.547, 0.775, 0.604, 0.717, 0.698, 0.523, 0.718, 0.748, 0.635, 0.798, 0.718, 0.812, 0.709, 0.663, 0.821, 0.838, 0.720, 0.735, 0.822, 0.735, 0.830, 0.629, 0.712, 0.642, 0.883, 0.578, 0.773, 0.611, 0.754, 0.728, 0.790, 0.501, 0.623, 0.631, 0.484, 0.484, 0.788, 0.483, 0.567, 0.541, 0.647, 0.525, 0.661, 0.413, 0.723, 0.583, 0.786, 0.640, 0.722, 0.606, 0.537, 0.640]]
current epoch: 0
...
current epoch: 127
training time: 3.44396
MAPE loss: 17.5378
process_train_data------------------------------------
[0.000000, 0.015604, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.042159, 0.000000, 0.000000, 0.000000, 0.000000, 0.077015, 0.000000, 0.000000, 0.002480, 0.000000, 0.105974, 0.000000, 0.059280, 0.000000, 0.000000, 0.063471, 0.075822, 0.000055, 0.000000, 0.000000, 0.000025, 0.063648, 0.000000, 0.000000, 0.072730, 0.000195, 0.000000, 0.000000, 0.000000, 0.010841, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.024128, 0.000000, 0.000000, 0.000000, 0.000000, 0.000073, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000020, 0.000000, 0.000000, 0.020468, 0.079445, 0.000000, 0.014061, 0.000001, 0.000004, 0.000165, 0.000000, 0.000000, 0.000000, 0.013662, 0.000000, 0.000000, 0.000000, 0.000000, 0.000001, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000005, 0.000001, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000231, 0.000000, 0.000000, 0.000000, 0.000000, 0.005832, 0.000000, 0.000000, 0.000000, 0.000000, 0.000227, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.088577, 0.000000, 0.000000, 0.000566, 0.000000, 0.000000, 0.000000, 0.000000, 0.036642, 0.000000, 0.000000, 0.000104, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.003014, 0.000000, 0.000000, 0.000000, 0.000000, 0.012931, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.005828, 0.000001, 0.042263, 0.000000, 0.000000, 0.000000, 0.062403, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000014, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000034]
```

Is something wrong with my network structure? Could you give me some suggestions, or what should I do?
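For context on the softmax result above: softmax rescales its 192 inputs into a probability distribution, so the outputs are positive and sum to 1, which drives most entries toward zero. It suits classification targets, not independent per-output regression targets in [0, 1]. A minimal sketch of the standard softmax computation (illustrative only, not the library's code):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard numerically stable softmax:
// out[i] = exp(x[i] - max) / sum_j exp(x[j] - max).
// Outputs are positive and sum to 1, so with 192 outputs the average
// entry is about 1/192 and most entries collapse toward zero.
std::vector<double> softmax(const std::vector<double>& x) {
    double mx = *std::max_element(x.begin(), x.end());
    std::vector<double> out(x.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = std::exp(x[i] - mx);
        sum += out[i];
    }
    for (double& v : out)
        v /= sum;
    return out;
}
```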

josephjaspers commented 4 years ago

You should most likely have a logistic or tanh as your last layer. Make sure your data is normalized to be in the range of 0 to 1.
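A minimal sketch of that suggestion, reusing the layer sizes from the original post (untested; `BC::nn::logistic` and `BC::nn::tanh` are used elsewhere in this thread, so only the placement here is assumed):

```cpp
// Sketch only: the original network with a logistic activation before the
// output layer, so every forecast value is squashed into (0, 1) to match
// min-max normalized targets.
auto make_lstm_network() {
    return BC::nn::neuralnetwork(
        BC::nn::lstm(BC::host_tag(), 96 * 10, 256),
        BC::nn::recurrent(BC::host_tag(), 256, 192),
        BC::nn::logistic(BC::host_tag(), 192),  // or BC::nn::tanh for (-1, 1)
        BC::nn::output_layer(BC::host_tag(), 192)
    );
}
```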

xinsuinizhuan commented 4 years ago

> You should most likely have a logistic or tanh as your last layer. Make sure your data is normalized to be in the range of 0 to 1.

No, I think something is wrong with it.

1. I set tanh as my last layer; the trained data is:

```
process_train_data------------------------------------
[1.000000, -1.00000, 1.000000, -1.00000, -1.00000, 0.999983, 1.000000, 0.999998, 1.000000, 0.607386, 1.000000, 1.000000, 0.934946, 1.000000, 0.999978, 0.999966, 0.999996, 1.000000, 0.748701, 0.931370, 1.000000, 1.000000, 1.000000, -1.00000, 1.000000, 1.000000, 1.000000, 1.000000, 1.000000, 1.000000, 0.998263, 1.000000, 1.000000, 0.999764, 1.000000, 0.999989, 1.000000, 1.000000, 1.000000, 1.000000, 0.999995, 1.000000, 0.999998, 0.999998, 1.000000, 1.000000, 1.000000, 1.000000, 1.000000, -1.00000, 1.000000, 0.999975, 0.999994, 1.000000, 1.000000, 1.000000, 1.000000, -1.00000, 0.999999, 1.000000, 0.999966, 1.000000, 1.000000, 1.000000, 0.999944, 1.000000, 0.999906, 1.000000, 0.993191, 1.000000, 1.000000, 1.000000, -0.99999, 1.000000, 0.999994, 1.000000, -1.00000, 1.000000, 1.000000, 1.000000, 0.857669, 1.000000, 1.000000, 1.000000, 1.000000, 1.000000, 1.000000, 0.713852, 1.000000, 0.999985, 1.000000, -1.00000, 0.951905, 1.000000, 1.000000, -1.00000]
```

It went to two extremes: the values should not all be close to 1 or -1. They should be evenly distributed between 0 and 1, like this (below is the predicted data and above is the real output data, but they are too far apart):

[image]

2. I set logistic as my last layer; the trained data is:

[image]

All values are close to 1.

My original data is: ACCRTU0740001_data.txt. I use columns 2 to 97 (not column 1), normalize each column (using each column's max and min values), and use the previous 10 rows of data to predict the next 2 rows.
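A minimal sketch of the per-column max-min normalization described above (a hypothetical helper with assumed names, not code from this project; keeping each column's min and max lets forecasts be mapped back to the original scale):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: scale each column of `data` (rows x cols) into
// [0, 1] using that column's own min and max; store them in minv/maxv
// so predictions can later be de-normalized.
void minmax_normalize(std::vector<std::vector<double>>& data,
                      std::vector<double>& minv, std::vector<double>& maxv) {
    if (data.empty()) return;
    const std::size_t cols = data[0].size();
    minv.assign(cols, 0.0);
    maxv.assign(cols, 0.0);
    for (std::size_t c = 0; c < cols; ++c) {
        minv[c] = maxv[c] = data[0][c];
        for (const auto& row : data) {
            minv[c] = std::min(minv[c], row[c]);
            maxv[c] = std::max(maxv[c], row[c]);
        }
        const double range = maxv[c] - minv[c];
        for (auto& row : data)
            row[c] = (range > 0.0) ? (row[c] - minv[c]) / range : 0.0;
    }
}
```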

xinsuinizhuan commented 4 years ago

With this net structure, the trained forecast print seems normal:

```cpp
auto make_lstm_network() {
    return BC::nn::neuralnetwork(
        BC::nn::lstm(BC::host_tag(), 96 * 10, 512),
        BC::nn::lstm(BC::host_tag(), 512, 216),
        BC::nn::feedforward(BC::host_tag(), 216, 96),
        BC::nn::logistic(BC::host_tag(), 96),
        BC::nn::logging_output_layer(BC::host_tag(), 96, BC::nn::RMSE).skip_every(100)
    );
}
using network_type = decltype(make_lstm_network());
```

The trained forecast print:

[image]

Could it be that the lstm and recurrent layers don't work together?
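For readers comparing the variants in this thread: the working network above replaces the recurrent + softmax head with a feedforward layer followed by a logistic activation, so each of the 96 outputs is squashed into (0, 1) independently rather than forced to sum to 1, which matches the min-max normalized targets; the logging output layer with `BC::nn::RMSE` appears to report the error periodically during training (here every 100 steps via `skip_every(100)`).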

josephjaspers commented 4 years ago

Just reading the last post now. Is this still an issue, or was this related to https://github.com/josephjaspers/blackcat_tensors/issues/40?

xinsuinizhuan commented 4 years ago

> Just reading the last post now. Is this still an issue, or was this related to #40?

No, close it. I have solved it.