joaopauloschuler / neural-api

CAI NEURAL API - Pascal based deep learning neural network API optimized for AVX, AVX2 and AVX512 instruction sets plus OpenCL capable devices including AMD, Intel and NVIDIA.
GNU Lesser General Public License v2.1

Neuralthread - cannot compile for Delphi 10.4.2 for LINUX (popos/ubuntu 20.04) #56

Open laurenceliew opened 3 years ago

laurenceliew commented 3 years ago

Hi

The code runs perfectly under Win64, but I get an error on Linux.

I am trying to compile the same SimpleMNIST demo for Linux (20.04 PopOS/Ubuntu).

[DCC Fatal Error] neuralthread.pas(66): F2613 Unit 'Windows' not found.

    uses
      Classes, SysUtils,
      {$IFDEF FPC}
      fgl, MTPCPU
      {$IFDEF WINDOWS} ,windows {$ELSE} ,BaseUnix {$ENDIF}
      {$ELSE}
      Generics.Collections, Windows   // <---- F2613
      {$ENDIF}
      , syncobjs;

joaopauloschuler commented 3 years ago

I regret to say that I don't have Delphi for Linux to test with (I test with Lazarus). Do you know what the equivalent unit is in Delphi for Linux? If you propose a fix, I'll be glad to add it.

laurenceliew commented 3 years ago

We can just comment it out, but then other errors pop up for TRTLCriticalSection, which I think is not available in Delphi for Linux. I think we need to find the Linux equivalent. I am really rusty with Delphi and just getting back into it, so I can't be of help with these errors right now.

[screenshot]

joaopauloschuler commented 3 years ago

It seems that Delphi for Linux has other functions for mutex: https://www.embeddedcomputing.com/technology/software-and-os/thread-synchronization-in-linux-and-windows-systems-part-3

Would you like to give Lazarus 64-bit for Linux a go? https://www.lazarus-ide.org/

Also, if you like, you can try it in a web browser via the menu Runtime/Run all: https://colab.research.google.com/github/joaopauloschuler/neural-api/blob/master/examples/SimpleImageClassifier/SimpleImageClassifierCPU.ipynb

If you decide to try any of the above, I'm curious to know how it goes.

laurenceliew commented 3 years ago

OK, I have managed to get your code to compile and run, but it exits with:

    lliew@pop-os:~/PAServer/scratch-dir/lliew-Pop20/SimpleMNist$ ./SimpleMNist
    Creating Neural Network...
    File:train Labels:60000 Images:60000 Rows:28 Cols:28
    File:t10k Labels:10000 Images:10000 Rows:28 Cols:28
    Exception ESyncObjectException in module SimpleMNist at 000000000047D781.
    Named synchronization objects not supported on this platform.

Further investigation shows that your code uses named synchronization objects, which Delphi does not support on Linux (hence the exception above). I assume your conditional compilation fixes this issue with Lazarus on Linux.

The fix for neuralthread.pas is shown below. It allows your code to compile (but not to run successfully, due to the error above). The fix came from the Log4D code.

    uses
      Classes, Generics.Collections,
      // {$IFDEF FPC}
      // fgl, MTPCPU
      // {$IFDEF WINDOWS}
      // ,windows
      // {$ELSE}
      // ,BaseUnix
      // {$ENDIF}
      // {$ELSE}
      // Generics.Collections
      // {$ENDIF}
      // , syncobjs;
      {$IFDEF LINUX64}
      Syncobjs,
      {$ELSE}
      Syncobjs, Windows,
      {$ENDIF}
      SysUtils;

    type
    {$IFDEF LINUX64}
      // Map the Windows critical-section type onto SyncObjs.TCriticalSection.
      TRTLCriticalSection = TCriticalSection;
    {$ENDIF}

    {$IFDEF LINUX64}
    procedure EnterCriticalSection(var CS: TCriticalSection);
    procedure LeaveCriticalSection(var CS: TCriticalSection);
    procedure InitializeCriticalSection(var CS: TCriticalSection);
    procedure DeleteCriticalSection(var CS: TCriticalSection);
    function GetCurrentThreadID: Integer;
    {$ENDIF}

    implementation

    {$IFDEF LINUX64}
    // Windows-style wrappers around the TCriticalSection object.
    procedure EnterCriticalSection(var CS: TCriticalSection);
    begin
      CS.Enter;
    end;

    procedure LeaveCriticalSection(var CS: TCriticalSection);
    begin
      CS.Leave;
    end;

    procedure InitializeCriticalSection(var CS: TCriticalSection);
    begin
      CS := TCriticalSection.Create;
    end;

    procedure DeleteCriticalSection(var CS: TCriticalSection);
    begin
      CS.Free;
    end;

    // Stub: always returns 0, so it cannot tell threads apart.
    function GetCurrentThreadID: Integer;
    begin
      Result := 0;
    end;
    {$ENDIF}
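To check the shim in isolation, here is a minimal, self-contained console program (a sketch, not code from neural-api) that declares the same wrappers and calls them exactly like the Windows API: four threads of a hypothetical TWorker class increment a shared counter under the lock. It assumes Delphi, or Free Pascal in Delphi mode (with cthreads on Unix), and, unlike the unit above, declares the shim unconditionally so the demo builds without the Windows unit on any platform.

```pascal
program CriticalSectionShimDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  {$IFDEF UNIX}{$IFDEF FPC}cthreads,{$ENDIF}{$ENDIF} // FPC needs a thread manager on Unix
  Classes, SysUtils, SyncObjs;

// Same shim as above, declared unconditionally for this demo.
// These program-level names hide any RTL routines with the same names.
type
  TRTLCriticalSection = TCriticalSection;

procedure InitializeCriticalSection(var CS: TCriticalSection);
begin
  CS := TCriticalSection.Create;
end;

procedure EnterCriticalSection(var CS: TCriticalSection);
begin
  CS.Enter;
end;

procedure LeaveCriticalSection(var CS: TCriticalSection);
begin
  CS.Leave;
end;

procedure DeleteCriticalSection(var CS: TCriticalSection);
begin
  CS.Free;
end;

var
  Lock: TRTLCriticalSection;
  Counter: Integer = 0;

type
  // Hypothetical worker thread used only for this demo.
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
var
  I: Integer;
begin
  for I := 1 to 10000 do
  begin
    EnterCriticalSection(Lock);
    try
      Inc(Counter); // not atomic on its own, so it must run under the lock
    finally
      LeaveCriticalSection(Lock);
    end;
  end;
end;

var
  Workers: array[0..3] of TWorker;
  W: Integer;
begin
  InitializeCriticalSection(Lock);
  try
    for W := Low(Workers) to High(Workers) do
      Workers[W] := TWorker.Create(False); // start running immediately
    for W := Low(Workers) to High(Workers) do
    begin
      Workers[W].WaitFor;
      Workers[W].Free;
    end;
  finally
    DeleteCriticalSection(Lock);
  end;
  WriteLn('Counter = ', Counter); // 4 workers x 10000 increments
end.
```

If the lock works, the program prints `Counter = 40000`; without the Enter/Leave calls the increments could race and the total could come out lower.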

laurenceliew commented 3 years ago

OK, I found the code that creates the named events and made them unnamed on Linux. SimpleMNIST is running now and seems to be working. I will test the other demos and update later.

    constructor TNeuralThread.Create(CreateSuspended: boolean; pIndex: integer);
    var
      NStartName, NFinishName: string;
      PidAndIndexStr: string;
    begin
      inherited Create(CreateSuspended);
      FProc := nil;
      FIndex := pIndex;
      FThreadNum := 1;
      FShouldStart := false;
      FProcFinished := false;
      PidAndIndexStr := IntToStr(GetProcessId()) + '-' + IntToStr(pIndex);
      NStartName := 'NStart-' + PidAndIndexStr;
      NFinishName := 'NFinish-' + PidAndIndexStr;
      {$IFDEF FPC}
      FNeuronStart := TEventObject.Create(nil, True, False, NStartName);
      FNeuronFinish := TEventObject.Create(nil, True, False, NFinishName);
      {$ELSE}
        {$IFDEF LINUX64}
        // Delphi on Linux raises ESyncObjectException for named sync objects,
        // so create unnamed events here.
        FNeuronStart := TEvent.Create(nil, True, False, '');
        FNeuronFinish := TEvent.Create(nil, True, False, '');
        {$ELSE}
        FNeuronStart := TEvent.Create(nil, True, False, NStartName);
        FNeuronFinish := TEvent.Create(nil, True, False, NFinishName);
        {$ENDIF}
      {$ENDIF}
    end;
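The constructor keys the workaround on LINUX64. A minimal sketch of an alternative, assuming the only goal is to avoid named events wherever the RTL may reject them: a hypothetical helper, PlatformEventName (not part of neural-api), keeps the name only on Windows and returns an empty string elsewhere. It assumes Delphi, or Free Pascal in Delphi mode with cthreads on Unix, and creates one manual-reset event the same way TNeuralThread.Create does.

```pascal
program UnnamedEventDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  {$IFDEF UNIX}{$IFDEF FPC}cthreads,{$ENDIF}{$ENDIF} // FPC needs a thread manager on Unix
  SysUtils, SyncObjs;

// Hypothetical helper (not part of neural-api): keep the event name only on
// Windows, and fall back to an unnamed event on every other platform.
function PlatformEventName(const AName: string): string;
begin
  {$IFDEF MSWINDOWS}
  Result := AName;
  {$ELSE}
  Result := '';
  {$ENDIF}
end;

var
  NStart: TEvent;
begin
  // Manual-reset event, initially non-signaled, as in TNeuralThread.Create.
  NStart := TEvent.Create(nil, True, False, PlatformEventName('NStart-1'));
  try
    NStart.SetEvent;
    // WaitFor(0) returns immediately; wrSignaled means the event is set.
    if NStart.WaitFor(0) = wrSignaled then
      WriteLn('NStart is signaled')
    else
      WriteLn('NStart is not signaled');
  finally
    NStart.Free;
  end;
end.
```

Testing MSWINDOWS instead of LINUX64 would also cover other non-Windows Delphi targets, but that variant was not tried in this thread.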

joaopauloschuler commented 3 years ago

Well done @laurenceliew! You can either create a branch and then send a pull request, or send the updated file as an attachment. If you prefer the branch/pull request, your name will be stored in the commit history.

laurenceliew commented 3 years ago

OK. SimpleMNIST completed and seems to run fine. This is running in a VM with 2 cores and 4 GB RAM. I will now try it, and the rest of the demos, on my PC (which I need to reinstall with Ubuntu). This PC has an NVIDIA GPU, so I will also test the GPU acceleration. If all works, I will then send over the changes.

[screenshot]

joaopauloschuler commented 3 years ago

Cool. You'll need OpenCL drivers. The command should be similar to this (adjust the version number):

    apt-get install nvidia-384 nvidia-opencl-icd-384 clinfo

Then, you'll need the OpenCL ICD loader development files (ocl):

    apt-get install ocl-icd-opencl-dev

laurenceliew commented 3 years ago

Thank you. The info is helpful.

(Sent by email in reply to joaopauloschuler's comment of Tue, May 4, 2021, at 4:23 PM.)

joaopauloschuler commented 3 years ago

Most probably, you'll be able to find the currently installed version with:

    dpkg -l | grep nvidia

Then, you'll be able to install the matching driver package, nvidia-opencl-icd-XXX.

To confirm that the installation worked, run clinfo and check whether any device shows up: http://manpages.ubuntu.com/manpages/xenial/man1/clinfo.1.html

laurenceliew commented 3 years ago

I can confirm that SimpleMNIST works on Linux when compiled with Delphi for Linux. This is running without OpenCL on a Ryzen 4700 (8 cores/16 threads).

[screenshot]

[screenshot]

joaopauloschuler commented 3 years ago

Love looking at your screenshots. This is my output from Lazarus without AVX and without OpenCL:

11520 Examples seen. Accuracy: 0.4015 Error: 0.57254 Loss: 0.57471 Threads: 4 Forward time: 0.38s Backward time: 0.38s Step time: 0.62s
12800 Examples seen. Accuracy: 0.4400 Error: 0.59346 Loss: 0.58933 Threads: 4 Forward time: 0.08s Backward time: 0.68s Step time: 0.62s
14080 Examples seen. Accuracy: 0.4786 Error: 0.37746 Loss: 0.39654 Threads: 4 Forward time: 0.32s Backward time: 0.44s Step time: 0.61s

With AVX2, this is what I get:

33280 Examples seen. Accuracy: 0.8179 Error: 0.22259 Loss: 0.31650 Threads: 4 Forward time: 0.39s Backward time: 0.17s Step time: 0.62s
34560 Examples seen. Accuracy: 0.8287 Error: 0.30109 Loss: 0.30324 Threads: 4 Forward time: 0.43s Backward time: 0.14s Step time: 0.61s

Your results make me wonder: are you running with debug info? This is my epoch time:

Epochs: 1 Examples seen:60000 Validation Accuracy: 0.9704 Validation Error: 0.0889 Validation Loss: 0.0951 Total time: 0.58min
Epoch time: 0.5500 minutes. 20 epochs: 0.1833 hours.

joaopauloschuler commented 3 years ago

On your hardware (with CPU only), I would expect from 6 to 8 epochs per minute. With GPU, it will be even faster. There is a bottleneck somewhere.

joaopauloschuler commented 3 years ago

Just thought about something: you can try increasing the batch size to 512 (you'll get some warning messages, but you can ignore them):

NeuralFit.Fit(NN, ImgTrainingVolumes, ImgValidationVolumes, ImgTestVolumes, {NumClasses=}10, {batchsize=}512, {epochs=}20);

From memory, I have seen some Linux machines struggle with threading. Increasing the batch size makes thread coordination less frequent.
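A rough way to see the effect, assuming the threads synchronize about once per batch: with N = 60000 training images (the train count in the log above) and batch size B, the number of synchronization rounds per epoch equals the number of batches. B = 64 is used only as an illustrative comparison value, not as SimpleMNIST's actual default:

$$
\text{batches per epoch} = \left\lceil \frac{N}{B} \right\rceil,
\qquad
\left\lceil \frac{60000}{64} \right\rceil = 938,
\qquad
\left\lceil \frac{60000}{512} \right\rceil = 118
$$

So going from 64 to 512 cuts the synchronization rounds per epoch roughly eightfold, while each individual step processes eight times as many examples and therefore takes longer.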

laurenceliew commented 3 years ago

Hi @joaopauloschuler, haha, yes, I was in DEBUG mode. Switched to RELEASE:

[Screenshot from 2021-05-06 10-57-26]

joaopauloschuler commented 3 years ago

It's better! I still think that you should try to change the batch size to 512.

laurenceliew commented 3 years ago

A batch size of 512 is much slower. Anyway, I am not able to get OpenCL to work. I downloaded your PasOpenCL and made some minor IFDEF changes; it compiles and runs, but I get access violations.

laurenceliew commented 3 years ago

Hi @joaopauloschuler

Attached are the 3 files that need changes to work with Delphi for Linux (and Windows), plus SimpleMNIST adapted to run as a console program in Delphi for Linux (and Windows).

TestNN.zip neural.zip

Cheers!

joaopauloschuler commented 3 years ago

With a bigger batch size, the step time will be longer. On the plus side, the epoch time is likely to be shorter, as there is less thread coordination. About your changes, it would be fantastic if you could fork the repository, apply your changes, commit and then push. This would make it easier for other users to see your changes, and it will add you as a contributor once your commits are merged.

joaopauloschuler commented 3 years ago

I have never tried MNIST with a GPU. I'll give it a go on my end and let you know.

laurenceliew commented 3 years ago

Hi @joaopauloschuler

Sorry for the miscommunication. For OpenCL, I was trying to run SimpleImageClassifierGPU, not SimpleMNIST.