josephjaspers / blackcat_tensors

A matrix-vector library designed for neural network construction, with CUDA (GPU) support, OpenMP (multithreaded CPU) support, partial BLAS support, an expression-template-based implementation whose generated PTX code matches hand-written kernels, and support for auto-differentiation.

Network structure puzzle: should the network have a softmax layer or not? #43

Closed. xinsuinizhuan closed this issue 4 years ago.

xinsuinizhuan commented 4 years ago

1. With no softmax layer, I trained the network and printed the output data and the trained forecast data, but the forecast data is very large, not between 0 and 1 (I applied max-min normalization to the input data). My net is:

```cpp
auto make_lstm_network() {
    return BC::nn::neuralnetwork(
        BC::nn::lstm(BC::host_tag(), 96 * 10, 256),
        BC::nn::recurrent(BC::host_tag(), 256, 192),
        BC::nn::output_layer(BC::host_tag(), 192)
    );
}
using network_type = decltype(make_lstm_network());
```

```cpp
typedef struct _LstmPredictTask {
    int fd;
    int m_inputs_number;
    int m_outputs_number;
    int m_sequence_length;
    int m_train_number;
    int m_batch_size;
    double m_learning_rate;
    network_type m_pnetwork = make_lstm_network();
    void reset_neural_network() {
        m_pnetwork = std::move(make_lstm_network());
    }
} LstmPredictTask;
```

I create the net and train:

```cpp
// start train
LstmPredictTask* lstmpredicttask = new LstmPredictTask();
if (lstmpredicttask == NULL) {
    return -2;
}

//LstmPredictTask lstmpredicttask;
std::cout << "Neural Network architecture: \n" << lstmpredicttask->m_pnetwork.get_string_architecture() << std::endl;
lstmpredicttask->m_pnetwork.set_learning_rate(lstmpredicttask->m_learning_rate);
lstmpredicttask->m_pnetwork.set_batch_size(lstmpredicttask->m_batch_size);

int training_sets;
std::pair<cube, cube> data = load_train_data(system_tag, datafilepath, lstmpredicttask, &training_sets);
cube& inputs = data.first;
cube& outputs = data.second;

std::cout <<" training..." << std::endl;
auto start = std::chrono::system_clock::now();

std::cout << "imagesoutput:------------------------------------" << std::endl;
auto imagesoutput = reshape(outputs[0], BC::shape(96, 2, lstmpredicttask->m_batch_size));
imagesoutput[0].t().print_sparse(5);

for (int i = 0; i < epochs; ++i) {
    std::cout << " current epoch: " << i << std::endl;
    for (int j = 0; j < training_sets; ++j) {
        lstmpredicttask->m_pnetwork.forward_propagation(inputs[j]);
        lstmpredicttask->m_pnetwork.back_propagation(outputs[j]);
        lstmpredicttask->m_pnetwork.update_weights();
    }
}

if (strlen(_trainparamsavefile) != 0)
{
    lstmpredicttask->m_pnetwork.save(_trainparamsavefile); // save the trained parameters
}

auto end = std::chrono::system_clock::now();
std::chrono::duration<double> total = end - start;
std::cout << " training time: " << total.count() << std::endl;

auto batch = inputs[0];
mat hyps = lstmpredicttask->m_pnetwork.forward_propagation(batch);
std::cout << " MAPE loss: " << BC::Scalar<double>(BC::nn::MAPE(hyps, outputs[0]) / lstmpredicttask->m_batch_size).data()[0] << std::endl;

std::cout << "process_train_data------------------------------------" << std::endl;
for (int i = 0; i < 1; ++i) {
    hyps[i].print();
    std::cout << "------------------------------------" << std::endl;
}

```

I printed the output data and the forecast data, but the forecast data is not right: the values are not between 0 and 1, and they are very large.

```
Neural Network architecture:
LSTM: inputs: 960 outputs: 256
Recurrent: inputs: 256 outputs: 192
Output_Layer: inputs: 192 outputs: 192
```

```
training...
imagesoutput:------------------------------------
[[0.380, 0.750, 0.570, 0.603, 0.504, 0.529, 0.585, 0.654, 0.497, 0.674, 0.708, 0.564, 0.841, 0.592, 0.590, 0.724, 0.585, 0.884, 0.273, 1.000, 0.681, 0.496, 0.857, 0.774, 0.749, 0.721, 0.659, 0.582, 0.838, 0.701, 0.617, 0.832, 0.718, 0.699, 0.591, 0.624, 0.737, 0.571, 0.632, 0.707, 0.773, 0.512, 0.823, 0.595, 0.534, 0.784, 0.581, 0.701, 0.698, 0.517, 0.690, 0.826, 0.540, 0.788, 0.709, 0.793, 0.700, 0.722, 0.690, 0.810, 0.693, 0.801, 0.693, 0.801, 0.644, 0.572, 0.552, 0.642, 0.696, 0.593, 0.599, 0.642, 0.584, 0.765, 0.619, 0.514, 0.490, 0.647, 0.385, 0.496, 0.643, 0.552, 0.503, 0.610, 0.567, 0.651, 0.636, 0.514, 0.687, 0.675, 0.704, 0.795, 0.722, 0.753, 0.531, 0.805]
 [0.333, 0.846, 0.499, 0.595, 0.510, 0.496, 0.244, 1.000, 0.491, 0.682, 0.708, 0.564, 0.823, 0.592, 0.590, 0.724, 0.570, 0.874, 0.601, 0.679, 0.689, 0.515, 0.761, 0.877, 0.673, 0.712, 0.251, 1.000, 0.857, 0.675, 0.632, 0.814, 0.737, 0.699, 0.606, 0.638, 0.746, 0.659, 0.568, 0.806, 0.686, 0.606, 0.730, 0.696, 0.547, 0.775, 0.604, 0.717, 0.698, 0.523, 0.718, 0.748, 0.635, 0.798, 0.718, 0.812, 0.709, 0.663, 0.821, 0.838, 0.720, 0.735, 0.822, 0.735, 0.830, 0.629, 0.712, 0.642, 0.883, 0.578, 0.773, 0.611, 0.754, 0.728, 0.790, 0.501, 0.623, 0.631, 0.484, 0.484, 0.788, 0.483, 0.567, 0.541, 0.647, 0.525, 0.661, 0.413, 0.723, 0.583, 0.786, 0.640, 0.722, 0.606, 0.537, 0.640]]
current epoch: 0
...
current epoch: 127
training time: 4.44725
MAPE loss: 0.0310454
process_train_data------------------------------------
[-25.8460, -8.38295, 14.65372, -5.71706, -27.4717, 8.602991, -23.6860, -35.2958, 17.08113, -15.3109, -4.69952, 16.89871, 35.82342, 21.46421, 7.413482, 35.08593, 6.854354, 12.87520, -32.6889, 13.24751, 21.69270, 9.105426, -7.89662, -14.2385, -28.2898, 6.437375, -5.50345, -9.93736, -23.0111, 2.646907, 17.38767, -22.7771, 1.653123, -4.01012, 19.70499, -15.3654, -23.4237, 12.21545, 26.81120, 4.972610, -2.22349, 40.82117, -39.3307, -16.4180, -3.45213, 3.258177, 14.86659, 4.521998, 17.44581, -9.25711, 8.646381, -3.66365, -21.4957, 2.988301, 11.47178, -3.76695, 2.691704, -10.3771, 20.55152, -20.8875, -14.0821, 10.21881, -11.5416, -2.01150, -14.9797, -22.1648, 12.36946, 16.69329, -11.0484, 9.839924, -25.3255, -16.6491, 21.82862, 0.515754, 26.25349, -31.2395, -11.0210, 36.62269, 1.771704, -25.2875, -15.7974, 16.03234, 4.626742, -34.5083, -2.80271, 18.01022, 0.685722, 12.62443, -1.93305, 54.40408, -26.3419, 10.30347, 14.39106, -22.9777, 4.736426, -5.70500, -1.08147, -45.7611, 22.09196, 11.78476, 5.660966, 12.14032, -38.3728, 28.37263, 0.546137, 9.886680, -1.93493, -32.3411, 17.84676, -21.0064, -2.20607, -14.6393, -19.7328, -2.09870, -7.53886, 8.389552, -10.0535, -18.1724, -2.34214, -4.51320, -9.92966, -4.03211, 15.46169, -6.36365, 13.37198, -12.9277, 0.622534, 32.86275, -7.55993, 4.308530, 23.71868, -14.3317, -9.16887, 5.663595, -0.60777, -11.6949, -31.5343, 18.74082, -17.3257, 2.044618, 7.356360, -16.0196, 0.659284, 16.72092, 15.97791, -4.75050, -4.47696, -9.21328, 11.09348, 4.904877, -32.0915, -6.60098, -29.8067, 5.970200, 5.943465, 49.47677, 2.389998, -1.62050, 26.01327, 18.25720, -18.1648, 21.47414, -16.8297, 18.06407, 0.427003, -18.3307, 2.537354, -36.9672, 14.45144, 12.91946, -18.1769, 2.703407, -4.11329, -2.75824, -14.2142, 43.97624, 22.92934, 1.787207, -4.71201, 3.859387, -18.1937, -2.71971, 14.80760, 52.58834, -4.54549, 29.55029, -7.89654, 13.35169, -10.4480, -23.7541, 14.79737, -6.61407]
```
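For reference, a minimal sketch of the conventional mean-absolute-percentage-error formula that the `MAPE loss` line presumably reports; this illustrates the standard definition only and is not `BC::nn::MAPE`'s actual implementation:

```cpp
#include <cmath>
#include <cstddef>

// Conventional MAPE over n predictions (hypothetical helper, not the
// library's code): mean of |y - yhat| / |y|.
double mape(const double* y, const double* yhat, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += std::abs(y[i] - yhat[i]) / std::abs(y[i]);
    return sum / n;
}
```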

2. With a softmax layer, I trained the network and printed the output data and the trained forecast data, but the forecast data seems wrong: it has many zeros and very small values. My net is:

```cpp
auto make_lstm_network() {
    return BC::nn::neuralnetwork(
        BC::nn::lstm(BC::host_tag(), 96 * 10, 256),
        BC::nn::recurrent(BC::host_tag(), 256, 192),
        BC::nn::softmax(BC::host_tag(), 192),
        BC::nn::output_layer(BC::host_tag(), 192)
    );
}
using network_type = decltype(make_lstm_network());

typedef struct _LstmPredictTask {
    int fd;
    int m_inputs_number;
    int m_outputs_number;
    int m_sequence_length;
    int m_train_number;
    int m_batch_size;
    double m_learning_rate;
    network_type m_pnetwork = make_lstm_network();
    void reset_neural_network() {
        m_pnetwork = std::move(make_lstm_network());
    }
} LstmPredictTask;
```

I create the net and train:

```cpp
// start train
LstmPredictTask* lstmpredicttask = new LstmPredictTask();
if (lstmpredicttask == NULL) {
    return -2;
}

//LstmPredictTask lstmpredicttask;
std::cout << "Neural Network architecture: \n" << lstmpredicttask->m_pnetwork.get_string_architecture() << std::endl;
lstmpredicttask->m_pnetwork.set_learning_rate(lstmpredicttask->m_learning_rate);
lstmpredicttask->m_pnetwork.set_batch_size(lstmpredicttask->m_batch_size);

int training_sets;
std::pair<cube, cube> data = load_train_data(system_tag, datafilepath, lstmpredicttask, &training_sets);
cube& inputs = data.first;
cube& outputs = data.second;

std::cout <<" training..." << std::endl;
auto start = std::chrono::system_clock::now();

std::cout << "imagesoutput:------------------------------------" << std::endl;
auto imagesoutput = reshape(outputs[0], BC::shape(96, 2, lstmpredicttask->m_batch_size));
imagesoutput[0].t().print_sparse(5);

for (int i = 0; i < epochs; ++i) {
    std::cout << " current epoch: " << i << std::endl;
    for (int j = 0; j < training_sets; ++j) {
        lstmpredicttask->m_pnetwork.forward_propagation(inputs[j]);
        lstmpredicttask->m_pnetwork.back_propagation(outputs[j]);
        lstmpredicttask->m_pnetwork.update_weights();
    }
}

if (strlen(_trainparamsavefile) != 0)
{
    lstmpredicttask->m_pnetwork.save(_trainparamsavefile); // save the trained parameters
}

auto end = std::chrono::system_clock::now();
std::chrono::duration<double> total = end - start;
std::cout << " training time: " << total.count() << std::endl;

auto batch = inputs[0];
mat hyps = lstmpredicttask->m_pnetwork.forward_propagation(batch);
std::cout << " MAPE loss: " << BC::Scalar<double>(BC::nn::MAPE(hyps, outputs[0]) / lstmpredicttask->m_batch_size).data()[0] << std::endl;

std::cout << "process_train_data------------------------------------" << std::endl;
for (int i = 0; i < 1; ++i) {
    hyps[i].print();
    std::cout << "------------------------------------" << std::endl;
}

```

I printed the output data and the forecast data, but the forecast data seems wrong: it has many zeros and very small values.

```
Neural Network architecture:
LSTM: inputs: 960 outputs: 256
Recurrent: inputs: 256 outputs: 192
SoftMax: inputs: 192 outputs: 192
Output_Layer: inputs: 192 outputs: 192
```

```
training...
imagesoutput:------------------------------------
[[0.380, 0.750, 0.570, 0.603, 0.504, 0.529, 0.585, 0.654, 0.497, 0.674, 0.708, 0.564, 0.841, 0.592, 0.590, 0.724, 0.585, 0.884, 0.273, 1.000, 0.681, 0.496, 0.857, 0.774, 0.749, 0.721, 0.659, 0.582, 0.838, 0.701, 0.617, 0.832, 0.718, 0.699, 0.591, 0.624, 0.737, 0.571, 0.632, 0.707, 0.773, 0.512, 0.823, 0.595, 0.534, 0.784, 0.581, 0.701, 0.698, 0.517, 0.690, 0.826, 0.540, 0.788, 0.709, 0.793, 0.700, 0.722, 0.690, 0.810, 0.693, 0.801, 0.693, 0.801, 0.644, 0.572, 0.552, 0.642, 0.696, 0.593, 0.599, 0.642, 0.584, 0.765, 0.619, 0.514, 0.490, 0.647, 0.385, 0.496, 0.643, 0.552, 0.503, 0.610, 0.567, 0.651, 0.636, 0.514, 0.687, 0.675, 0.704, 0.795, 0.722, 0.753, 0.531, 0.805]
 [0.333, 0.846, 0.499, 0.595, 0.510, 0.496, 0.244, 1.000, 0.491, 0.682, 0.708, 0.564, 0.823, 0.592, 0.590, 0.724, 0.570, 0.874, 0.601, 0.679, 0.689, 0.515, 0.761, 0.877, 0.673, 0.712, 0.251, 1.000, 0.857, 0.675, 0.632, 0.814, 0.737, 0.699, 0.606, 0.638, 0.746, 0.659, 0.568, 0.806, 0.686, 0.606, 0.730, 0.696, 0.547, 0.775, 0.604, 0.717, 0.698, 0.523, 0.718, 0.748, 0.635, 0.798, 0.718, 0.812, 0.709, 0.663, 0.821, 0.838, 0.720, 0.735, 0.822, 0.735, 0.830, 0.629, 0.712, 0.642, 0.883, 0.578, 0.773, 0.611, 0.754, 0.728, 0.790, 0.501, 0.623, 0.631, 0.484, 0.484, 0.788, 0.483, 0.567, 0.541, 0.647, 0.525, 0.661, 0.413, 0.723, 0.583, 0.786, 0.640, 0.722, 0.606, 0.537, 0.640]]
current epoch: 0
...
current epoch: 127
training time: 3.44396
MAPE loss: 17.5378
process_train_data------------------------------------
[0.000000, 0.015604, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.042159, 0.000000, 0.000000, 0.000000, 0.000000, 0.077015, 0.000000, 0.000000, 0.002480, 0.000000, 0.105974, 0.000000, 0.059280, 0.000000, 0.000000, 0.063471, 0.075822, 0.000055, 0.000000, 0.000000, 0.000025, 0.063648, 0.000000, 0.000000, 0.072730, 0.000195, 0.000000, 0.000000, 0.000000, 0.010841, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.024128, 0.000000, 0.000000, 0.000000, 0.000000, 0.000073, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000020, 0.000000, 0.000000, 0.020468, 0.079445, 0.000000, 0.014061, 0.000001, 0.000004, 0.000165, 0.000000, 0.000000, 0.000000, 0.013662, 0.000000, 0.000000, 0.000000, 0.000000, 0.000001, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000005, 0.000001, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000231, 0.000000, 0.000000, 0.000000, 0.000000, 0.005832, 0.000000, 0.000000, 0.000000, 0.000000, 0.000227, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.088577, 0.000000, 0.000000, 0.000566, 0.000000, 0.000000, 0.000000, 0.000000, 0.036642, 0.000000, 0.000000, 0.000104, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.003014, 0.000000, 0.000000, 0.000000, 0.000000, 0.012931, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.005828, 0.000001, 0.042263, 0.000000, 0.000000, 0.000000, 0.062403, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000014, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000034]
```

Is something wrong with my network structure? Could you give me some suggestions, or what should I do?
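For context on the softmax result above: softmax rescales its 192 inputs into a probability distribution, so the outputs are positive and sum to 1, which drives most entries toward zero. It suits classification targets, not independent per-output regression targets in [0, 1]. A minimal sketch of the standard softmax computation (illustrative only, not the library's code):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard numerically stable softmax:
// out[i] = exp(x[i] - max) / sum_j exp(x[j] - max).
// Outputs are positive and sum to 1, so with 192 outputs the average
// entry is about 1/192 and most entries collapse toward zero.
std::vector<double> softmax(const std::vector<double>& x) {
    double mx = *std::max_element(x.begin(), x.end());
    std::vector<double> out(x.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = std::exp(x[i] - mx);
        sum += out[i];
    }
    for (double& v : out)
        v /= sum;
    return out;
}
```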

josephjaspers commented 4 years ago

You should most likely have a logistic or tanh as your last layer. Make sure your data is normalized to be in the range of 0 to 1.
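A minimal sketch of that suggestion, reusing the layer sizes from the original post (untested; `BC::nn::logistic` and `BC::nn::tanh` are used elsewhere in this thread, so only the placement here is assumed):

```cpp
// Sketch only: the original network with a logistic activation before the
// output layer, so every forecast value is squashed into (0, 1) to match
// min-max normalized targets.
auto make_lstm_network() {
    return BC::nn::neuralnetwork(
        BC::nn::lstm(BC::host_tag(), 96 * 10, 256),
        BC::nn::recurrent(BC::host_tag(), 256, 192),
        BC::nn::logistic(BC::host_tag(), 192),  // or BC::nn::tanh for (-1, 1)
        BC::nn::output_layer(BC::host_tag(), 192)
    );
}
```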

xinsuinizhuan commented 4 years ago

> You should most likely have a logistic or tanh as your last layer. Make sure your data is normalized to be in the range of 0 to 1.

No, I think something is wrong with it.

1. I set tanh as my last layer; the trained data is:

```
process_train_data------------------------------------
[1.000000, -1.00000, 1.000000, -1.00000, -1.00000, 0.999983, 1.000000, 0.999998, 1.000000, 0.607386, 1.000000, 1.000000, 0.934946, 1.000000, 0.999978, 0.999966, 0.999996, 1.000000, 0.748701, 0.931370, 1.000000, 1.000000, 1.000000, -1.00000, 1.000000, 1.000000, 1.000000, 1.000000, 1.000000, 1.000000, 0.998263, 1.000000, 1.000000, 0.999764, 1.000000, 0.999989, 1.000000, 1.000000, 1.000000, 1.000000, 0.999995, 1.000000, 0.999998, 0.999998, 1.000000, 1.000000, 1.000000, 1.000000, 1.000000, -1.00000, 1.000000, 0.999975, 0.999994, 1.000000, 1.000000, 1.000000, 1.000000, -1.00000, 0.999999, 1.000000, 0.999966, 1.000000, 1.000000, 1.000000, 0.999944, 1.000000, 0.999906, 1.000000, 0.993191, 1.000000, 1.000000, 1.000000, -0.99999, 1.000000, 0.999994, 1.000000, -1.00000, 1.000000, 1.000000, 1.000000, 0.857669, 1.000000, 1.000000, 1.000000, 1.000000, 1.000000, 1.000000, 0.713852, 1.000000, 0.999985, 1.000000, -1.00000, 0.951905, 1.000000, 1.000000, -1.00000]
```

It went to two extremes: the values should not all be close to 1 or -1. They should be evenly distributed between 0 and 1, like this (below is the predicted data and above is the real output data, but they are too far apart):

[image]

2. I set logistic as my last layer; the trained data is:

[image]

All values are close to 1.

My original data is: ACCRTU0740001_data.txt. I use columns 2 to 97 (not column 1), normalize each column (using each column's max and min values), and use the previous 10 rows of data to predict the next 2 rows.
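A minimal sketch of the per-column max-min normalization described above (a hypothetical helper with assumed names, not code from this project; keeping each column's min and max lets forecasts be mapped back to the original scale):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: scale each column of `data` (rows x cols) into
// [0, 1] using that column's own min and max; store them in minv/maxv
// so predictions can later be de-normalized.
void minmax_normalize(std::vector<std::vector<double>>& data,
                      std::vector<double>& minv, std::vector<double>& maxv) {
    if (data.empty()) return;
    const std::size_t cols = data[0].size();
    minv.assign(cols, 0.0);
    maxv.assign(cols, 0.0);
    for (std::size_t c = 0; c < cols; ++c) {
        minv[c] = maxv[c] = data[0][c];
        for (const auto& row : data) {
            minv[c] = std::min(minv[c], row[c]);
            maxv[c] = std::max(maxv[c], row[c]);
        }
        const double range = maxv[c] - minv[c];
        for (auto& row : data)
            row[c] = (range > 0.0) ? (row[c] - minv[c]) / range : 0.0;
    }
}
```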

xinsuinizhuan commented 4 years ago

With this net structure, the trained forecast print seems normal:

```cpp
auto make_lstm_network() {
    return BC::nn::neuralnetwork(
        BC::nn::lstm(BC::host_tag(), 96 * 10, 512),
        BC::nn::lstm(BC::host_tag(), 512, 216),
        BC::nn::feedforward(BC::host_tag(), 216, 96),
        BC::nn::logistic(BC::host_tag(), 96),
        BC::nn::logging_output_layer(BC::host_tag(), 96, BC::nn::RMSE).skip_every(100)
    );
}
using network_type = decltype(make_lstm_network());
```

The trained forecast print:

[image]

Could it be that the lstm and recurrent layers don't work together?
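For readers comparing the variants in this thread: the working network above replaces the recurrent + softmax head with a feedforward layer followed by a logistic activation, so each of the 96 outputs is squashed into (0, 1) independently rather than forced to sum to 1, which matches the min-max normalized targets; the logging output layer with `BC::nn::RMSE` appears to report the error periodically during training (here every 100 steps via `skip_every(100)`).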

josephjaspers commented 4 years ago

Just reading the last post now. Is this still an issue, or was this related to https://github.com/josephjaspers/blackcat_tensors/issues/40?

xinsuinizhuan commented 4 years ago

> Just reading the last post now. Is this still an issue, or was this related to #40?

No, close it. I have solved it.