joaopauloschuler / neural-api

CAI NEURAL API - Pascal based deep learning neural network API optimized for AVX, AVX2 and AVX512 instruction sets plus OpenCL capable devices including AMD, Intel and NVIDIA.
GNU Lesser General Public License v2.1

Hypotenuse, Randomize and Delphi #79

Closed by Kryuski 2 years ago

Kryuski commented 2 years ago

While studying the Hypotenuse example program in Delphi, I found that it fails to train the neural network at all. After 50 epochs, the NN returns 0 for any input:

Inputs:98.00, 55.00 - Output: 0.00  Desired Output:112.38
Inputs:29.00, 91.00 - Output: 0.00  Desired Output:95.51
Inputs:96.00, 53.00 - Output: 0.00  Desired Output:109.66
Inputs:80.00, 94.00 - Output: 0.00  Desired Output:123.43
Inputs:97.00, 18.00 - Output: 0.00  Desired Output:98.66
Inputs:85.00, 66.00 - Output: 0.00  Desired Output:107.62
Inputs:47.00, 71.00 - Output: 0.00  Desired Output:85.15
Inputs:19.00, 96.00 - Output: 0.00  Desired Output:97.86
Inputs:68.00, 53.00 - Output: 0.00  Desired Output:86.21
Inputs:74.00, 14.00 - Output: 0.00  Desired Output:75.31

I tried debugging, without success. Finally, I added a call to Randomize before creating the network, and voila! After just 10 epochs, the network approximates the target well:

Inputs:98.00, 55.00 - Output:111.94  Desired Output:112.38
Inputs:29.00, 91.00 - Output:95.19  Desired Output:95.51
Inputs:96.00, 53.00 - Output:109.29  Desired Output:109.66
Inputs:80.00, 94.00 - Output:123.39  Desired Output:123.43
Inputs:97.00, 18.00 - Output:98.88  Desired Output:98.66
Inputs:85.00, 66.00 - Output:107.60  Desired Output:107.62
Inputs:47.00, 71.00 - Output:84.92  Desired Output:85.15
Inputs:19.00, 96.00 - Output:98.01  Desired Output:97.86
Inputs:68.00, 53.00 - Output:86.21  Desired Output:86.21
Inputs:74.00, 14.00 - Output:75.48  Desired Output:75.31
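
The effect of Randomize can be reproduced without the library. The sketch below is not taken from neural-api; it only assumes that the initial weights are ultimately drawn from the standard Random function, so a program that never calls Randomize starts every run from the same RandSeed and therefore from the same initial weights. DrawThree, DrawThreeWithSeed and the seed value 42 are hypothetical, chosen for illustration.

```pascal
program RandomizeDemo;
{$ifdef FPC}{$mode objfpc}{$H+}{$endif}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Returns the next three pseudo-random integers as text.
function DrawThree: string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to 3 do
    Result := Result + ' ' + IntToStr(Random(1000));
end;

// Resets the generator to a given seed, then draws three values.
// Hypothetical helper: it stands in for "a run that starts from a known seed".
function DrawThreeWithSeed(Seed: Integer): string;
begin
  RandSeed := Seed;
  Result := DrawThree;
end;

begin
  // Same seed, same sequence: this is what happens on every run
  // of a program that never calls Randomize.
  WriteLn('Fixed seed, 1st draw:', DrawThreeWithSeed(42));
  WriteLn('Fixed seed, 2nd draw:', DrawThreeWithSeed(42));

  // Randomize reseeds the generator, so the sequence changes between runs.
  Randomize;
  WriteLn('After Randomize:     ', DrawThree);
end.
```

The first two lines print identical values, while the third normally differs from run to run. If one particular starting point happens to be untrainable, a program without Randomize will hit it every time.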

I don't know whether this library is not fully compatible with Delphi, or whether the initial weights of the network prevent it from being trained by this method. If you are interested, here are the initial parameters of the network that cannot be trained:


And this is the log: autosave.csv

Kryuski commented 2 years ago

I have found that the same "bad" NN also breaks the program when it is compiled with Lazarus. You only need to disable validation during training for this to work with Lazarus: NFit.Fit(NN, TrainingPairs, nil{ValidationPairs}, TestPairs, {batchsize=}32, {epochs=}50);

I tested this NN on two AMD Ryzen 5 3600 computers with the same result:

Inputs:19.00, 71.00 - Output: 0.00  Desired Output:73.50
Inputs:28.00,  6.00 - Output: 0.00  Desired Output:28.64
Inputs:34.00, 53.00 - Output: 0.00  Desired Output:62.97
Inputs:20.00, 35.00 - Output: 0.00  Desired Output:40.31
Inputs:53.00, 28.00 - Output: 0.00  Desired Output:59.94
Inputs:56.00, 12.00 - Output: 0.00  Desired Output:57.27
Inputs:77.00, 29.00 - Output: 0.00  Desired Output:82.28
Inputs:69.00, 31.00 - Output: 0.00  Desired Output:75.64
Inputs:24.00, 78.00 - Output: 0.00  Desired Output:81.61
Inputs:15.00, 22.00 - Output: 0.00  Desired Output:26.63

Now I think this is related not to Delphi but to the processor, because sometimes, after processing the "bad" NN, subsequent tests of random NNs also produce the same error. It would be nice to check the "bad" NN on Intel processors.

Kryuski commented 2 years ago

After further research, I think this has nothing to do with the compiler or the processor. The problem lies in the chosen layer type (TNNetFullConnectReLU) and its activation function (FData > 0): the outputs hit the boundary value of 0, and the network cannot get out of that local minimum.
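
To make the diagnosis concrete, here is a minimal sketch of a single ReLU neuron that is stuck at zero. It is not the library's implementation of TNNetFullConnectReLU; the weight, bias, input, target and learning rate are made-up values, chosen so that the pre-activation is negative. Once that happens, the output is 0, the ReLU derivative is 0, and plain gradient descent never changes the weight.

```pascal
program DeadReLUDemo;
{$ifdef FPC}{$mode objfpc}{$endif}
{$APPTYPE CONSOLE}

const
  LearningRate = 0.01; // illustrative value, not taken from the example program

// ReLU activation: passes positive values, clamps everything else to 0.
function ReLU(X: Double): Double;
begin
  if X > 0 then
    Result := X
  else
    Result := 0;
end;

// Derivative of ReLU: 1 for positive inputs, 0 otherwise.
function ReLUDerivative(X: Double): Double;
begin
  if X > 0 then
    Result := 1
  else
    Result := 0;
end;

var
  W, B, X, Target, PreAct, Output, Grad: Double;
  Step: Integer;
begin
  W := -0.5;    // hypothetical "bad" initial weight
  B := -0.1;    // hypothetical bias
  X := 50;      // a positive input, like the triangle sides in the thread
  Target := 70; // desired output

  for Step := 1 to 3 do
  begin
    PreAct := W * X + B;
    Output := ReLU(PreAct);
    // Gradient of 0.5 * (Output - Target)^2 with respect to W.
    Grad := (Output - Target) * ReLUDerivative(PreAct) * X;
    W := W - LearningRate * Grad;
    WriteLn('Step ', Step, ': output=', Output:0:2, ' w=', W:0:2);
  end;
end.
```

The output stays at 0.00 and the weight at -0.50 on every step, no matter how large the error is. A different random initialization gives at least some neurons a positive pre-activation, which is why adding Randomize makes most runs train normally.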

joaopauloschuler commented 2 years ago

@Kryuski, many thanks for your detailed bug reports. I agree with your diagnosis. I'll fix these examples and their readme files in the next few days. You've convinced me to add Randomize by default.

Kryuski commented 2 years ago

Well, if the NN trains well and shows correct results, and fails to converge only in corner cases, that is a good result. I think this issue can be closed.

joaopauloschuler commented 2 years ago

Is the "need to disable validation during training for this to work with Lazarus" still broken for you?