joaopauloschuler / neural-api

CAI NEURAL API - Pascal based deep learning neural network API optimized for AVX, AVX2 and AVX512 instruction sets plus OpenCL capable devices including AMD, Intel and NVIDIA.
GNU Lesser General Public License v2.1
371 stars · 198 forks

OpenCL fails with 2 fully connected layers #105

Open joaopauloschuler opened 1 year ago

joaopauloschuler commented 1 year ago

I'm having a problem adapting one of my programs that uses CAI to OpenCL.

I've tested "SimpleImageClassifierGPU" and it works on my computer (after removing the -dAVX option, because my CPU is old).

When I try to add OpenCL to my test program with 2 fully connected layers (no convolutions), it fails.

Dzandaa commented 1 year ago

When calling:

NeuralFit.Fit(NeuralNet, TrainingPairs, nil, nil, {batchsize=}4, {epochs=}SEEpoch.Value);

Without neural.cl in the same directory as the executable, line 584 of neuralopencl.pas reports 'File neural.cl could not be found.'

With neural.cl in the same directory as the executable, line 948 of neuralopencl.pas prints 'clCreateContext OK!' and then it crashes...

I think part of the problem may be that my 'neural' directory is not at '../../../neural' but at '../neural'.

Dzandaa commented 1 year ago

I just tried the same program on Linux Mint 20.2.

I added "-dUseCThreads" to Custom Options and changed

{$IFDEF UseCThreads} cthreads, cmem, {$ENDIF}

to

{$IFDEF UseCThreads} cthreads, {$ENDIF}

It works, but I don't see any acceleration. For 1000 epochs:

With OpenCL and with AVX: 34.62 seconds
Without OpenCL and with AVX: 33.82 seconds
Without OpenCL and without AVX: 62.76 seconds
With OpenCL and without AVX: 62.92 seconds

joaopauloschuler commented 1 year ago

OpenCL is actually slower in this experiment. I'm wondering if the number of weights/neurons is so small in this experiment that OpenCL has no advantage.
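To get a feel for the scale involved, a rough parameter count makes the point: a small dense network has so few weights that the cost of shipping data to the GPU dominates. A minimal sketch, using illustrative layer sizes of my own (2 -> 32 -> 32 -> 1, not taken from the report):

```pascal
// Sketch: counting weights (including bias) of a small fully connected network.
// The layer sizes are hypothetical, chosen only to illustrate the scale.
program WeightCount;

function DenseWeights(Inputs, Neurons: integer): integer;
begin
  // Each neuron has one weight per input plus a bias term.
  Result := Neurons * (Inputs + 1);
end;

begin
  // Illustrative 2 -> 32 -> 32 -> 1 network.
  WriteLn('Total weights: ',
    DenseWeights(2, 32) + DenseWeights(32, 32) + DenseWeights(32, 1));
end.
```

That comes to 96 + 1056 + 33 = 1185 weights, far too few for GPU dispatch to amortize its setup and transfer overhead.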

Dzandaa commented 1 year ago

I don't know why it crashes on Windows after clCreateContext OK!

joaopauloschuler commented 1 year ago

I'm about to start working on this.

joaopauloschuler commented 1 year ago

On dense (fully connected) layers, OpenCL is called only when there are enough neurons/weights to compensate for the overhead that it adds:

FShouldOpenCL := (FNeurons.Count >= 512) and (pPrevLayer.Output.Size >= 128);

Depending on how many neurons you have on each layer, maybe it's not even being used.
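Under the rule quoted above, a dense layer dispatches to OpenCL only when it has at least 512 neurons and its previous layer's output has at least 128 values. A minimal sketch of that condition applied to a 2 -> 512 -> 512 -> 1 network (the layer sizes are my own example, not from the report):

```pascal
// Sketch: which dense layers would use OpenCL under the quoted rule
// FShouldOpenCL := (FNeurons.Count >= 512) and (pPrevLayer.Output.Size >= 128);
program OpenCLThresholdSketch;

function ShouldUseOpenCL(NeuronCount, PrevOutputSize: integer): boolean;
begin
  Result := (NeuronCount >= 512) and (PrevOutputSize >= 128);
end;

begin
  // Input(2) -> FC(512): 512 neurons, but the previous output has only 2 values -> False
  WriteLn('Layer 1 uses OpenCL: ', ShouldUseOpenCL(512, 2));
  // FC(512) -> FC(512): both conditions hold -> True
  WriteLn('Layer 2 uses OpenCL: ', ShouldUseOpenCL(512, 512));
  // FC(512) -> FC(1): only 1 neuron -> False
  WriteLn('Layer 3 uses OpenCL: ', ShouldUseOpenCL(1, 512));
end.
```

So in a small network of this shape, only the middle dense layer would actually run on the GPU; the first and last fall back to the CPU path.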

joaopauloschuler commented 1 year ago

I've just tested the following and it works for me:

program Hypotenuse;
(*
Hypotenuse: learns how to calculate hypotenuse sqrt(X^2 + Y^2).
Copyright (C) 2019 Joao Paulo Schwarz Schuler

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*)

{$mode objfpc}{$H+}

uses {$IFDEF UNIX} {$IFDEF UseCThreads}
  cthreads, {$ENDIF} {$ENDIF}
  Classes,
  neuralnetwork,
  neuralvolume,
  neuralfit,
  neuralopencl;

  function CreateHypotenusePairList(MaxCnt: integer): TNNetVolumePairList;
  var
    Cnt: integer;
    LocalX, LocalY, Hypotenuse: TNeuralFloat;
  begin
    Result := TNNetVolumePairList.Create();
    for Cnt := 1 to MaxCnt do
    begin
      LocalX := Random(100);
      LocalY := Random(100);
      Hypotenuse := sqrt(LocalX*LocalX + LocalY*LocalY);

      Result.Add(
        TNNetVolumePair.Create(
          TNNetVolume.Create([LocalX, LocalY]),
          TNNetVolume.Create([Hypotenuse])
        )
      );
    end;
  end;

  // Returns TRUE if the difference is smaller than 0.1.
  function LocalFloatCompare(A, B: TNNetVolume; ThreadId: integer): boolean;
  begin
    Result := ( Abs(A.FData[0]-B.FData[0])<0.1 );
  end;

  procedure RunAlgo();
  var
    NN: TNNet;
    NFit: TNeuralFit;
    TrainingPairs, ValidationPairs, TestPairs: TNNetVolumePairList;
    Cnt: integer;
    pOutPut: TNNetVolume;
    EasyOpenCL: TEasyOpenCL;
  begin
    NN := TNNet.Create();
    NFit := TNeuralFit.Create();
    TrainingPairs := CreateHypotenusePairList(10000);
    ValidationPairs := CreateHypotenusePairList(1000);
    TestPairs := CreateHypotenusePairList(1000);

    NN.AddLayer([
      TNNetInput.Create(2),
      TNNetFullConnectReLU.Create(512),
      TNNetFullConnectReLU.Create(512),
      TNNetFullConnectLinear.Create(1)
    ]);

    EasyOpenCL := TEasyOpenCL.Create();
    if EasyOpenCL.GetPlatformCount() = 0 then
    begin
      WriteLn('No OpenCL capable platform has been found.');
      exit;
    end;
    WriteLn('Setting platform to: ', EasyOpenCL.PlatformNames[0]);
    EasyOpenCL.SetCurrentPlatform(EasyOpenCL.PlatformIds[0]);
    if EasyOpenCL.GetDeviceCount() = 0 then
    begin
      WriteLn('No OpenCL capable device has been found for platform ',EasyOpenCL.PlatformNames[0]);
      exit;
    end;
    EasyOpenCL.SetCurrentDevice(EasyOpenCL.Devices[0]);

    NFit.EnableOpenCL(EasyOpenCL.PlatformIds[0], EasyOpenCL.Devices[0]);

    WriteLn('Computing...');
    NFit.InitialLearningRate := 0.00001;
    NFit.LearningRateDecay := 0;
    NFit.L2Decay := 0;
    NFit.InferHitFn := @LocalFloatCompare;
    NFit.MaxThreadNum := 1;
    NFit.Fit(NN, TrainingPairs, ValidationPairs, TestPairs, {batchsize=}32, {epochs=}50);
    NN.DebugWeights();

    pOutPut := TNNetVolume.Create({pSizeX=}1, {pSizeY=}1, {pDepth=}1, {FillValue=}1);

    // tests the learning
    for Cnt := 0 to 9 do
    begin
      NN.Compute(TestPairs[Cnt].I);
      NN.GetOutput(pOutPut);
      WriteLn
      ( 'Inputs:',
        TestPairs[Cnt].I.FData[0]:5:2,', ',
        TestPairs[Cnt].I.FData[1]:5:2,' - ',
        'Output:',
        pOutPut.Raw[0]:5:2,' ',
        ' Desired Output:',
        TestPairs[Cnt].O.FData[0]:5:2
      );
    end;

    EasyOpenCL.Free;
    pOutPut.Free;
    TestPairs.Free;
    ValidationPairs.Free;
    TrainingPairs.Free;
    NFit.Free;
    NN.Free;
    Write('Press ENTER to exit.');
    ReadLn;
  end;

var
  // Stops Lazarus errors
  Application: record Title:string; end;

begin
  Application.Title:='Hypotenuse Example';
  RunAlgo();
end.

joaopauloschuler commented 1 year ago

I've just tested the following and it also works for me:

    //NFit.MaxThreadNum := 1;
    NFit.Fit(NN, TrainingPairs, nil, nil, {batchsize=}32, {epochs=}50);

and

    NN.AddLayer([
      TNNetInput.Create(2),
      TNNetFullConnectReLU.Create(512),
      TNNetFullConnectReLU.Create(512),
      TNNetFullConnectReLU.Create(512),
      TNNetFullConnectLinear.Create(1)
    ]);

Given that I can't reproduce it, you'll need to share the full source code of a Lazarus project that provokes the error.

joaopauloschuler commented 1 year ago

In case it helps, this is how neural.cl is loaded:

constructor TNeuralKernel.Create(pCurrentPlatform: cl_platform_id;
  pCurrentDevice: cl_device_id; kernelname: string = 'cai_dot_product');
begin
  inherited Create();
  SetCurrentPlatform(pCurrentPlatform);
  SetCurrentDevice(pCurrentDevice);

  // Create the OpenCL Kernel Here:
  if FileExists('../../../neural/neural.cl') then
  begin
    CompileProgramFromFile('../../../neural/neural.cl');
  end
  else if FileExists('neural.cl') then
  begin
    CompileProgramFromFile('neural.cl');
  end
  else
  begin
    MessageProc('File neural.cl could not be found.');
  end;
  PrepareKernel(kernelname);
end; 
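Given that lookup order, one workaround on the user side is to verify that neural.cl is reachable from the working directory before enabling OpenCL. A minimal sketch, checking exactly the two paths hard-coded in TNeuralKernel.Create:

```pascal
// Sketch: verify neural.cl is reachable before calling EnableOpenCL.
// Checks the same two locations TNeuralKernel.Create tries.
program CheckNeuralCL;

uses SysUtils;

begin
  if FileExists('../../../neural/neural.cl') or FileExists('neural.cl') then
    WriteLn('neural.cl found; the OpenCL kernel can be compiled.')
  else
    WriteLn('neural.cl NOT found; copy it next to the executable first.');
end.
```

Copying neural.cl into the same directory as the executable is the simplest way to satisfy the second FileExists check regardless of where the project lives.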

Dzandaa commented 1 year ago

Hi,

Thank you very much for your tests :) Here is my little test program.

You have to change the path to /neural and put neural.cl in the same directory as the executable.

NetSpectrum.zip

Just train (500-100 epochs) and test.

B->