hshatti / TONNXRuntime

TOnnxRuntime is a Microsoft ONNXRuntime AI and Machine Learning Library for Freepascal / Delphi
MIT License

Problem with create CUDA provider #4

Closed CrytoGen closed 1 year ago

CrytoGen commented 1 year ago

I'm trying to create a session with the CUDA provider, but I get the following error while debugging with WinDbg: "Exception OrtException in module Project1.exe at 000000000013F9D1. Code [1]: D:\a_work\1\s\include\onnxruntime\core/framework/provider_options_utils.h:41 onnxruntime::EnumToName [ONNXRuntimeError] : 1 : FAIL : provider_options_utils.h:33 onnxruntime::EnumToName Failed to map enum value to name: 7297107"


It works fine on CPU. I think UpdateCUDAProviderOptions works correctly, because when I add keys with incorrect names (e.g. dev_id) it raises an exception.

Copy of main part of source:

writeLn('Providers : ['+ Join(GetAvailableProviders())+']');
writeln(ansistring(Global.GetVersionString));
GetApi().CreateCUDAProviderOptions(@cpo);
opts:=DefaultSessionOptions.Clone();

key:=[UTF8Encode('device_id'),
      UTF8Encode('gpu_mem_limit'),
      UTF8Encode('arena_extend_strategy'),
      UTF8Encode('cudnn_conv_algo_search'),
      UTF8Encode('do_copy_in_default_stream')
//    UTF8Encode('cudnn_conv_use_max_workspacecudnn_conv1d_pad_to_nc1d')
     ];
val:=[UTF8Encode('0'),
      UTF8Encode('2147483648'),
      UTF8Encode('kSameAsRequested'),
      UTF8Encode('DEFAULT'),
      UTF8Encode('1')
//    UTF8Encode('1')
     ];
WriteLn('1');
ThrowOnError(GetApi.UpdateCUDAProviderOptions(cpo,PPOrtChar(key),PPOrtChar(val),5));

WriteLn('2');
opts:=opts.AppendExecutionProvider_CUDA_V2(cpo^);
WriteLn('3');
_path:='best.onnx';
session := TORTSession.Create(DefaultEnv,PORTCHAR_T(_path),opts);
WriteLn('4');

I use Delphi 11.2

Full source:

```
program Project1;

{$APPTYPE CONSOLE}

{$R *.res}
{$h+}

uses
  Winapi.Windows, System.SysUtils, System.Classes, onnxruntime_pas_api, onnxruntime,
  System.Types, Vcl.Graphics, vcl.Imaging.pngimage, System.Diagnostics, System.Math;

var
  session   : TORTSession;
  opts      : TORTSessionOptions;
  cpo       : POrtCUDAProviderOptionsV2;
  inTensor  : TORTTensor;
  inputs    : TORTNameValueList;
  outputs   : TORTNameValueList;
  outTensor : TORTTensor;
  mem       : Pointer;
  mem2      : pCardinal;
  mx1,mx2   : Single;
  mxp1,mxp2 : Integer;
  rect1,rect2,rect3:TRect;
  bmp       : TBitmap;
  png       : TPngImage;
  iv        : Integer;

function Join(const arr:array of ansistring; const Delimiter:ansistring=', '):ansistring; overload;
var i:integer;
begin
  result:='';
  for i:=0 to high(arr) do
    result:=result+Delimiter+arr[i];
  if length(result)>0 then delete(result,1,length(Delimiter))
end;

function Join(const arr:array of int64_t; const Delimiter:ansistring=', '):ansistring; overload;
var i:integer;
begin
  result:='';
  for i:=0 to high(arr) do
    result:=result+Delimiter+IntToStr(arr[i]);
  if length(result)>0 then delete(result,1,length(Delimiter))
end;

function Join(const arr:array of double; const Delimiter:ansistring=', '):ansistring; overload;
var i:integer;
begin
  result:='';
  for i:=0 to high(arr) do
    result:=result+Delimiter+FloatToStr(arr[i]);
  if length(result)>0 then delete(result,1,length(Delimiter))
end;

var
  _path:widestring;
  key,val:array of RawByteString;

{$POINTERMATH ON}
begin
  WriteLn('Providers : ['+ Join(GetAvailableProviders())+']');
  writeln(ansistring(Global.GetVersionString));
  GetApi().CreateCUDAProviderOptions(@cpo);
//  opts:=DefaultSessionOptions.Clone();
  ThrowOnError(Api.CreateSessionOptions(@opts.p_));
  WriteLn(NativeUInt(opts.p_));
//  opts.NewRef();
  key:=[UTF8Encode('device_id'),
        UTF8Encode('gpu_mem_limit'),
        UTF8Encode('arena_extend_strategy'),
        UTF8Encode('cudnn_conv_algo_search'),
        UTF8Encode('do_copy_in_default_stream')
//      UTF8Encode('cudnn_conv_use_max_workspacecudnn_conv1d_pad_to_nc1d')
       ];
  val:=[UTF8Encode('0'),
        UTF8Encode('2147483648'),
        UTF8Encode('kSameAsRequested'),
        UTF8Encode('DEFAULT'),
        UTF8Encode('1')
//      UTF8Encode('1')
       ];
  WriteLn('1');
  ThrowOnError(GetApi.UpdateCUDAProviderOptions(cpo,PPOrtChar(key),PPOrtChar(val),5));
  WriteLn('2');
  WriteLn(NativeUInt(opts.p_));
  opts:=opts.AppendExecutionProvider_CUDA_V2(cpo^);
  WriteLn('3');
  _path:='best.onnx';
  WriteLn(NativeUInt(opts.p_));
  session := TORTSession.Create(DefaultEnv,PORTCHAR_T(_path),opts);
  WriteLn('4');

  inTensor := TORTTensor.Create([640,640,3,1]);
  var fs:=TFileStream.Create('00000036.raw',fmOpenRead);
  mem:=GetMemory(fs.Size);
  fs.Read(mem^,fs.Size);
  mem2:=GetMemory(640*640*4);
  for var y := 0 to 639 do
    for var x := 0 to 639 do
    begin
      var imx := Round(x/639*1023);
      var imy := Round(y/639*1023);
      iv:=PByte(mem)[imy*1024+imx];
      var v := iv/255;
      inTensor.index4[x, y, 0, 0]:=v;
      inTensor.index4[x, y, 1, 0]:=v;
      inTensor.index4[x, y, 2, 0]:=v;
      mem2[x+y*640]:=iv*$00010101;
    end;
  bmp:=TBitmap.Create;
  bmp.PixelFormat:=pf32bit;
  bmp.Width:=640;
  bmp.Height:=640;
  SetBitmapBits(bmp.Handle,640*640*4,mem2);
  FreeMemory(mem2);
  FreeMemory(mem);
  fs.Free;

  inputs['images'] := inTensor;
  WriteLn('5');
  var sw:=TStopwatch.Create;
  sw.Start;
  for var i := 0 to 20 do
    outputs := session.run(inputs);
  outTensor := outputs['output0'];
  sw.Stop;
  WriteLn('6');
  WriteLn(sw.ElapsedMilliseconds);

  WriteLn('Results');
  WriteLn(outTensor.shape[0],'*',outTensor.shape[1],'*',outTensor.shape[2]);
  mx1:=0;
  mx2:=0;
  mxp1:=-1;
  mxp2:=-1;
  for var i:=0 to outTensor.shape[1]-1 do
  begin
    if (outTensor.index3[0,i,4]>0.9) and (outTensor.index3[0,i,5]>mx1) then
    begin
      mx1:=outTensor.index3[0,i,5];
      mxp1:=i;
    end;
    if (outTensor.index3[0,i,4]>0.9) and (outTensor.index3[0,i,5]>mx2) then
    begin
      mx2:=outTensor.index3[0,i,6];
      mxp2:=i;
    end;
//    WriteLn(outTensor.index1[i]:1:5);
  end;

  if (mx1>0.95) and (mx2>0.95) then
  begin
    rect1:=Bounds(
      Round(outTensor.index3[0,mxp1,0]-outTensor.index3[0,mxp1,2]/2),
      Round(outTensor.index3[0,mxp1,1]-outTensor.index3[0,mxp1,3]/2),
      Round(outTensor.index3[0,mxp1,2]),
      Round(outTensor.index3[0,mxp1,3]));
    rect2:=Bounds(
      Round(outTensor.index3[0,mxp2,0]-outTensor.index3[0,mxp2,2]/2),
      Round(outTensor.index3[0,mxp2,1]-outTensor.index3[0,mxp2,3]/2),
      Round(outTensor.index3[0,mxp2,2]),
      Round(outTensor.index3[0,mxp2,3]));
    rect3:=TRect.Intersect(rect1,rect2);
    if rect3.Width*rect3.Height/(rect1.Width*rect1.Height)>0.85 then
    begin
      with bmp.Canvas do
      begin
        Pen.Color:=clLime;
        Brush.Style:=bsClear;
        Rectangle(rect3);
      end;
      for var j := 0 to outTensor.shape[0]-1 do
        Write(outTensor.index3[0,mxp1,j]:1:5,' ');
      WriteLn;
    end;
    WriteLn;
  end;
  bmp.SaveToFile('out.bmp');
  bmp.Free;

//  for var i:=0 to 0 do
//  begin
//    for var j := 0 to outTensor.shape[0]-1 do
//      WriteLn(outTensor.index3[0,i,j]:1:5);
//    WriteLn;
//  end;
  WriteLn('The end');
  readLn;
  try
    { TODO -oUser -cConsole Main : Insert code here }
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
```

I tried the same thing in Visual C++ and it works fine.

C++:

```
// Standard headers used by the code below.
#include <fstream>
#include <iostream>
#include <vector>
#include "onnxruntime_cxx_api.h"
#include "onnxruntime_c_api.h"

int main()
{
    Ort::Env env;
    Ort::SessionOptions session_options;
    auto api = Ort::GetApi();

    OrtCUDAProviderOptionsV2* cuda_options = nullptr;
    Ort::ThrowOnError(api.CreateCUDAProviderOptions(&cuda_options));

    std::vector<const char*> keys{ "device_id" };
    std::vector<const char*> values{ "0" };
    Ort::ThrowOnError(api.UpdateCUDAProviderOptions(cuda_options, keys.data(), values.data(), keys.size()));
    std::cout << "1\n";

    session_options.AppendExecutionProvider_CUDA_V2(*cuda_options);
    std::cout << "2\n";

    Ort::Session session = Ort::Session(env, L"best.onnx", session_options);
    std::cout << "3\n";

    // Read the raw input image into memory.
    std::ifstream file("00000036.raw", std::ios::binary | std::ios::ate);
    std::streamsize size = file.tellg();
    file.seekg(0, std::ios::beg);
    std::vector<char> buffer(size);
    if (file.read(buffer.data(), size))
    {
    }
}
```
hshatti commented 1 year ago

Hi, thanks for spotting this. I'll try to get it working with CUDA when I have time and will upload a working CUDA example shortly (I don't have an NVIDIA GPU at the moment). In the meantime, I just uploaded a Delphi GPU example, YOLO_V7, based on DirectML; re-clone the repo to get it. Star the library if you find it useful, it helps a lot.

Having had a quick look at your code snippet: you don't have to cast from ANSI to UTF8 (RawByteString); this can add unnecessary overhead and may corrupt the parameters. There is also no need to clone "DefaultSessionOptions". Try the following and see if it works (test by keeping only the "device_id" parameter first):

var
  cpok,cpov : array of PORTChar;
  cpo : POrtCUDAProviderOptionsV2;
  // .. your other variables 
begin
 //.... your code
  setLength(cpok, 6);
  setLength(cpov, 6);

  cpok[0]:= 'device_id';
  cpok[1]:='gpu_mem_limit';
  cpok[2]:='arena_extend_strategy';
  cpok[3]:='cudnn_conv_algo_search';
  cpok[4]:='do_copy_in_default_stream';
  cpok[5]:='cudnn_conv_use_max_workspacecudnn_conv1d_pad_to_nc1d';

  cpov[0]:='0';
  cpov[1]:='2147483648';
  cpov[2]:='kSameAsRequested';
  cpov[3]:='DEFAULT';
  cpov[4]:='1';
  cpov[5]:='1';

  ThrowOnError(getapi().CreateCUDAProviderOptions(@cpo));
  ThrowOnError(getapi().UpdateCUDAProviderOptions(@cpo , @cpok[0]  ,@cpov[0] , length(cpok)));
  ThrowOnError(getapi().SessionOptionsAppendExecutionProvider_CUDA_V2(DefaultSessionOptions.p_ , @cpo));
// use simple inference form
  Session := TORTSession.create( modelPath );
// etc..

Let me know how it goes.

Cheers H

CrytoGen commented 1 year ago

Hi. I tried your variant with small corrections: cpok and cpov are declared as array of PAnsiChar instead of POrtChar (the compiler doesn't allow assigning a string constant to POrtChar):

procedure one;
var
  cpok,cpov : array of PAnsiChar;
  cpo : POrtCUDAProviderOptionsV2;
begin
  setLength(cpok, 1);
  setLength(cpov, 1);

  cpok[0]:='device_id';
  cpov[0]:='0';

  WriteLn('1');
  ThrowOnError(getapi().CreateCUDAProviderOptions(@cpo));
  WriteLn('2');
  ThrowOnError(getapi().UpdateCUDAProviderOptions(@cpo , @cpok[0] ,@cpov[0] , length(cpok)));
  WriteLn('3');
  ThrowOnError(getapi().SessionOptionsAppendExecutionProvider_CUDA_V2(DefaultSessionOptions.p_ , @cpo));
  WriteLn('4');
  // use simple inference form
  Session := TORTSession.create('best.onnx');
  WriteLn('5');
end;

Next problem: image

I will try DirectML a little later. The example source looks corrupted: Unit1.fmx is present without Unit1.pas. I'm sorry, I don't use git and just downloaded the source, and I don't know what "star the library" means...

CrytoGen commented 1 year ago

Sorry for splitting this across comments. I tried including DirectML from your example, and it works ) With DirectML: image On CPU: image Many thanks. That's enough for me. But if you want to try something with CUDA, I can keep this thread open for discussion.