joaopauloschuler / neural-api

CAI NEURAL API - Pascal-based deep learning neural network API optimized for AVX, AVX2 and AVX512 instruction sets, plus OpenCL-capable devices including AMD, Intel and NVIDIA.
GNU Lesser General Public License v2.1

Randomize #78

Closed. Kryuski closed this issue 2 years ago.

Kryuski commented 2 years ago

I'm trying to figure out the basic examples (XorAndOr, Hypotenuse) using Delphi 10.4 Starter Edition. I noticed that when initializing the neural network, the program does not use the System.Randomize function to initialize the random number generator. Is this done intentionally, for testing purposes?

joaopauloschuler commented 2 years ago

@Kryuski , this is a good question. As I maintain 4 instruction sets (native, AVX, AVX2 and AVX512) plus OpenCL, I tend to avoid randomization so I can compare results deterministically. Are you experiencing any problem at your end?
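
(For reference, a minimal sketch of opting into random initialization. The layer layout matches the XorAndOr example quoted later in this thread; Randomize is the standard RTL seeding call, not a library function.)

    program XorAndOrRandomized;
    uses neuralnetwork;
    var
      NN: TNNet;
    begin
      Randomize; // seed the RTL random number generator; omit for reproducible runs
      NN := TNNet.Create();
      NN.AddLayer( TNNetInput.Create(2) );
      NN.AddLayer( TNNetFullConnectReLU.Create(3) );
      NN.AddLayer( TNNetFullConnectReLU.Create(3) );
      // ... build training pairs and call NFit.Fit as in the example ...
      NN.Free;
    end.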

Kryuski commented 2 years ago

I have no problem with that, just curious, thanks! But I am seeing a problem when the XorAndOr program generates many neurons with weights < 0. Try adding this statement to TVolume.Randomize:

    // force every randomized weight to be non-positive
    if FData[I] > 0 then
      FData[I] := -FData[I];

Then the XorAndOr network will not be able to learn. P.S. Perhaps I should have made a separate issue for this, sorry.
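
(For context, a hypothetical sketch of where that clamp sits. The real body of TVolume.Randomize differs; only the two clamp lines come from this thread.)

    procedure TVolume.Randomize;
    var
      I: Integer;
    begin
      for I := Low(FData) to High(FData) do
      begin
        FData[I] := Random * 2 - 1; // assumed: uniform init in [-1, 1)
        // the experiment: force every weight to be non-positive
        if FData[I] > 0 then
          FData[I] := -FData[I];
      end;
    end;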

joaopauloschuler commented 2 years ago

Negative weights should not be a problem. Could you please send me the output? I'm curious to see how the network is (or is not) converging.

Kryuski commented 2 years ago

Maybe the error is something else. This is the result of the original GitHub version (not) converging in Delphi:

Computing...
 Output: 0.00  0.00  0.10 - Training/Desired Output: 0.10  0.10  0.10
 Output: 0.00  0.00  0.00 - Training/Desired Output: 0.80  0.10  0.80
 Output: 0.00  0.00  0.80 - Training/Desired Output: 0.80  0.10  0.80
 Output: 0.00  0.00  0.80 - Training/Desired Output: 0.10  0.80  0.80
Layer  0                                                     Max Output:  0.900 Min Output:  0.900 TNNetInput 2,1,1 Times: 0.00s 0.00s
Layer  1 Neurons:  3 Max Weight:   1.113 Min Weight:  -1.095 Max Output:  0.996 Min Output:  0.000 TNNetFullConnectReLU 3,1,1 Times: 0.00s 0.00s Parent:0
Layer  2 Neurons:  3 Max Weight:   0.787 Min Weight:  -0.881 Max Output:  0.800 Min Output:  0.000 TNNetFullConnectReLU 3,1,1 Times: 0.00s 0.00s Parent:1
Layer  0 Max Error:    0.0000000 Min Error:    0.0000000 Max ErrorD:  0.000 Min ErrorD:  0.000 TNNetInput 2,1,1
Layer  1 Max Error:    0.0000000 Min Error:    0.0000000 Max ErrorD:  0.000 Min ErrorD:  0.000 TNNetFullConnectReLU 3,1,1 Parent:0
Layer  2 Max Error:    0.0000000 Min Error:    0.0000000 Max ErrorD:  0.000 Min ErrorD:  0.000 TNNetFullConnectReLU 3,1,1 Parent:1
Press ENTER to exit.

autosave.nn

-1)TNNetInput:2;1;1;0;0;0;0;0#0)TNNetFullConnectReLU:3;1;1;0;0;0;0;0#1)TNNetFullConnectReLU:3;1;1;0;0;0;0;0
!0]1;2;1;1;-1.09544515609741;-1.0268702507019[-0.0043096961453557]1;2;1;1;1.11267948150635;-0.00145174458157271[0]1;2;1;1;-0.497551202774048;0.375956773757935!0]1;3;1;1;-0.362800002098083;-0.676599979400635;-0.25560000538826[0]1;3;1;1;-0.148800000548363;-0.836000025272369;-0.0505999997258186[0.0158913228660822]1;3;1;1;-0.859000027179718;0.787039875984192;-0.880599975585938

autosave.csv - https://pastebin.com/cHA7W7cP

joaopauloschuler commented 2 years ago

Thank you for sharing. I'll investigate. I'm marking it as a bug.

Is this the example with or without neuralfit?

Kryuski commented 2 years ago

The XorAndOr sample program uses TNeuralFit to train:

    NFit.Fit(NN, TrainingPairs, nil, nil, {batchsize=}4, {epochs=}3000);

joaopauloschuler commented 2 years ago

Cool. Thanks.

Just out of curiosity, have you ever given Lazarus a go? https://www.lazarus-ide.org/

Kryuski commented 2 years ago

In TNNetFullConnectReLU.ComputeCPU(), I replaced

    if Sum > 0
    then FOutput.FData[Cnt] := Sum
    else FOutput.FData[Cnt] := 0;

with

    FOutput.FData[Cnt] := Sum; // i.e., a linear output instead of ReLU

After that, the network trained better:

 Output: 0.42 -0.60  0.26 - Training/Desired Output: 0.10  0.10  0.10
 Output: 0.43  0.10  0.62 - Training/Desired Output: 0.80  0.10  0.80
 Output: 0.44  0.10  0.62 - Training/Desired Output: 0.80  0.10  0.80
 Output: 0.45  0.80  0.98 - Training/Desired Output: 0.10  0.80  0.80
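
(The earlier log, where most outputs were stuck at 0.00, fits the "dying ReLU" picture: a neuron whose weighted sum is non-positive outputs zero and also has a zero gradient, so it never recovers. A minimal plain-Pascal illustration, not library code:)

    function ReLU(Sum: Single): Single;
    begin
      if Sum > 0 then Result := Sum else Result := 0;
    end;

    function ReLUDerivative(Sum: Single): Single;
    begin
      // zero gradient whenever the neuron is inactive: stuck neurons stop learning
      if Sum > 0 then Result := 1 else Result := 0;
    end;
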
Kryuski commented 2 years ago

Yes, in Lazarus the sample program works as expected.

joaopauloschuler commented 2 years ago

Many thanks for confirming. I'll debug.

joaopauloschuler commented 2 years ago

I'll update the example. As you pointed out, ReLU doesn't work well here.

If you use this as input/output:

const inputs : TBackInput =
  ( // x1,   x2
    ( 0.1,  0.1), // False, False
    ( 0.1,  0.8), // False, True
    ( 0.8,  0.1), // True,  False
    ( 0.8,  0.8)  // True,  True
  );

const reluoutputs : TBackOutput =
  (// XOR, AND,   OR
    ( 0.1, 0.1, 0.1),
    ( 0.8, 0.1, 0.8),
    ( 0.8, 0.1, 0.8),
    ( 0.1, 0.8, 0.8)
  );

And then, update layers to:

    NN.AddLayer( TNNetFullConnect.Create(3) );
    NN.AddLayer( TNNetFullConnect.Create(3) );

It will work as expected with any initialization.

0.45 is the threshold, so an output of 0.44 reads as false and 0.45 reads as true.

Also, switch to 6000 epochs:

NFit.Fit(NN, TrainingPairs, nil, nil, {batchsize=}4, {epochs=}6000);

I'll update the example soon.
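
(Putting the pieces of this comment together, a sketch of the updated example. The pair-list construction is an assumption based on the pattern of the repository's examples; treat TNNetVolumePairList/TNNetVolumePair as assumptions if porting elsewhere.)

    program XorAndOr;
    uses neuralnetwork, neuralvolume, neuralfit;
    type
      TBackInput  = array[0..3] of array[0..1] of TNeuralFloat;
      TBackOutput = array[0..3] of array[0..2] of TNeuralFloat;
    const
      inputs : TBackInput =
        ( (0.1, 0.1), (0.1, 0.8), (0.8, 0.1), (0.8, 0.8) );
      reluoutputs : TBackOutput =
        ( (0.1, 0.1, 0.1), (0.8, 0.1, 0.8), (0.8, 0.1, 0.8), (0.1, 0.8, 0.8) );
    var
      NN: TNNet;
      NFit: TNeuralFit;
      TrainingPairs: TNNetVolumePairList;
      Cnt: integer;
    begin
      NN := TNNet.Create();
      NN.AddLayer( TNNetInput.Create(2) );
      NN.AddLayer( TNNetFullConnect.Create(3) ); // plain full connect instead of ReLU, per the comment above
      NN.AddLayer( TNNetFullConnect.Create(3) );
      // one training pair per row of the const tables
      TrainingPairs := TNNetVolumePairList.Create();
      for Cnt := Low(inputs) to High(inputs) do
        TrainingPairs.Add(
          TNNetVolumePair.Create(
            TNNetVolume.Create(inputs[Cnt]),
            TNNetVolume.Create(reluoutputs[Cnt])
          )
        );
      NFit := TNeuralFit.Create();
      NFit.Fit(NN, TrainingPairs, nil, nil, {batchsize=}4, {epochs=}6000);
      NFit.Free;
      TrainingPairs.Free;
      NN.Free;
    end.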

Kryuski commented 2 years ago

Yes, I confirm, thank you!

I added a Randomize call and tested the algorithm with different initial weights. 5000 epochs are enough.

Still strange that in the log (autosave.csv) all important indicators (training accuracy, training loss, etc.) are 0. Do you also see this?

Kryuski commented 2 years ago

> Is this the example with or without neuralfit?

Now I understand what you meant when you asked about it. In the old repository there is a SuperSimple example program that does the same as XorAndOr. Interestingly, SuperSimple is lightning fast compared to XorAndOr. Is neuralfit slowing things down?

I tried the several machine learning libraries for Delphi that I could find, and your library is the clear winner. The fact that it has no external dependencies (no Python, no TensorFlow) is a big plus.

joaopauloschuler commented 2 years ago

> SuperSimple is lightning fast compared to XorAndOr. Is neuralfit slowing things down?

You are correct. The extra multi-core overhead that neuralfit adds doesn't pay off for really small models. On the other hand, it makes a lot of sense to use neuralfit for large convolutions and large models with millions of trainable parameters. This is not a bug; it is expected behavior.

The aspect you are mentioning is so important that CAI has its own cross-platform, highly efficient threading API: https://github.com/joaopauloschuler/neural-api/blob/master/neural/neuralthread.pas

Kryuski commented 2 years ago

Yes, I understand. Very impressive, thank you for your work and for supporting the Pascal/Delphi community!

joaopauloschuler commented 2 years ago

Regarding "Still strange that in the log (autosave.csv) all important indicators (training accuracy, training loss, etc.) are 0. Do you also see this?", I'll check.

joaopauloschuler commented 2 years ago

Regarding "SuperSimple is lightning fast compared to XorAndOr. Is neuralfit slowing things down?", in most APIs, increasing the batch size speeds up the brute floating point computations per second with numerical impacts. https://www.sciencedirect.com/science/article/pii/S2405959519303455

The XOR example doesn't do very well with bigger batch sizes, while image classification tasks with plenty of classes do.
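
(As an illustration using the Fit call quoted earlier: a smaller batch gives more weight updates per epoch at the cost of raw throughput, which can suit tiny problems like XOR. The batch size of 1 here is an arbitrary example value.)

    NFit.Fit(NN, TrainingPairs, nil, nil, {batchsize=}1, {epochs=}3000);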

joaopauloschuler commented 2 years ago

I believe all the questions in this thread have been resolved. If you agree, I'll close this issue.