joaopauloschuler / neural-api

CAI NEURAL API - Pascal based deep learning neural network API optimized for AVX, AVX2 and AVX512 instruction sets plus OpenCL capable devices including AMD, Intel and NVIDIA.
GNU Lesser General Public License v2.1

Strange output of TNNetFullConnectReLU #35

Closed. HuguesDug closed this issue 3 years ago.

HuguesDug commented 3 years ago

Hello,

I have made a basic network for testing and debugging.

All inputs in the training set are always 0. All expected outputs are always the same (3 outputs: 0.1 / 0.25 / 0.50).

So, I would expect the network to learn the bias rapidly. It does not: the output is always 0.

If I instead set the inputs to random numbers, the network learns.

The network is rather simple, although I used the "multi-input" option.

// Create network
NN := TNNet.Create();

// Create input layers structure
for i := 0 to Length(InputLayers) - 1 do
begin
  InputLayers[i] := NN.AddLayer(TNNetInput.Create(NbQuotes, NbQuotesData, 1));
end;

// One fully connected ReLU layer per input branch
for i := 0 to Length(Branch) - 1 do
  Branch[i] := NN.AddLayerAfter(TNNetFullConnectReLU.Create(3), InputLayers[i]);

// Merge branches
NN.AddLayer(TNNetConcat.Create(Branch));

// Output layers
NN.AddLayer(TNNetFullConnectReLU.Create(3));
NN.AddLayer(TNNetFullConnectReLU.Create(3));

// Init weights
NN.InitWeights;

// Set learning rate
NN.SetLearningRate(0.01, 0.95);
NN.ErrorProc := MyErrorProc;
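
For context, one training step for this setup might look like the sketch below. This is a minimal sketch based on the usual CAI calling pattern (Compute / GetOutput / Backpropagate on TNNetVolume) rather than code from the report; a single input is shown for simplicity, and the indexed volume access and constructor fill value are assumptions.

// Minimal sketch of one training step (assumed CAI API, not from the
// original report): all-zero input, constant expected output.
var
  vInput, vExpected, vOutput: TNNetVolume;
begin
  vInput    := TNNetVolume.Create(NbQuotes, NbQuotesData, 1, 0); // all zeros
  vExpected := TNNetVolume.Create(3, 1, 1);
  vOutput   := TNNetVolume.Create(3, 1, 1);
  vExpected[0, 0, 0] := 0.10;
  vExpected[1, 0, 0] := 0.25;
  vExpected[2, 0, 0] := 0.50;

  NN.Compute(vInput);           // forward pass
  NN.GetOutput(vOutput);        // read current predictions
  NN.Backpropagate(vExpected);  // backward pass: updates weights and biases
end;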

joaopauloschuler commented 3 years ago

Thank you so much for the detailed bug report.

I'll have a look and reply.

May I ask you please if you are working with the latest version of the source code?

HuguesDug commented 3 years ago

Hello,

You are always so quick to reply. I really appreciate what you do to bring the Pascal (Delphi/Lazarus) community a decent API for neural networks.

To answer your question, yes, I use the latest version. My environment is Delphi 10.3 Community Edition.

As you can see from the structure of the network, each branch has a ReLU fully connected layer with bias. So, during the learning process, the bias should reach a value such that the two layers after the concat rapidly get proper weights matching the expected "constant" output values.

With inputs always 0, the outputs are always 0. With random inputs, the outputs are "nearly OK": typically two of them are correct and the last one stays at 0 (not always the same one). With inputs set to 1/Epoch, all 3 outputs fit the target.

Strange, isn't it?

It looks like the bias is not being calculated.

joaopauloschuler commented 3 years ago

Please let me know if this fix works.

joaopauloschuler commented 3 years ago

The error was: when the input was zero, there was no derivative available to be used with gradient descent. This is why it worked with random inputs.
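
To illustrate the failure mode (this is an illustration, not the actual patch): ReLU's derivative is zero wherever the pre-activation is not strictly positive. With all-zero inputs, each first-layer pre-activation starts at its bias, and once that is not positive the backpropagated delta is multiplied by zero, so neither the weights nor the bias can ever move.

// Illustration only, not the library's code: the classic ReLU pair.
function ReLU(x: Single): Single;
begin
  if x > 0 then Result := x
  else Result := 0;
end;

function ReLUDerivative(x: Single): Single;
begin
  if x > 0 then Result := 1
  else Result := 0;  // zero at x = 0: the learning signal is killed
end;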

Thank you for reporting with plenty of details.

HuguesDug commented 3 years ago

Tested: works fine now!

Thanks for the fix.