microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Error during onnxruntime-gpu nuget package installation #6425

Open rligocki opened 3 years ago

rligocki commented 3 years ago

Describe the bug During installation of the onnxruntime-gpu NuGet package (a custom build for L4T, Jetson Nano), this error appears:

user@user:~/Project/YOLOv4MLNet$ dotnet add package Microsoft.ML.OnnxRuntime.Gpu -v 1.5.2-dev-20210113-0851-ad7cc541f -s /home/user/project/
  Determining projects to restore...
  Writing /tmp/tmpC3BkVe.tmp
info : Adding PackageReference for package 'Microsoft.ML.OnnxRuntime.Gpu' into project '/home/user/project/YOLOv4MLNet/YOLOv4MLNet.csproj'.
info : Restoring packages for /home/user/Project/YOLOv4MLNet/YOLOv4MLNet.csproj...
info : GET https://api.nuget.org/v3-flatcontainer/microsoft.ml.onnxruntime.managed/index.json
info : OK https://api.nuget.org/v3-flatcontainer/microsoft.ml.onnxruntime.managed/index.json 820ms
info : GET https://api.nuget.org/v3-flatcontainer/microsoft.aspnetcore.app.ref/index.json
info : GET https://api.nuget.org/v3-flatcontainer/microsoft.netcore.app.host.linux-arm64/index.json
info : GET https://api.nuget.org/v3-flatcontainer/microsoft.netcore.app.ref/index.json
info : OK https://api.nuget.org/v3-flatcontainer/microsoft.aspnetcore.app.ref/index.json 201ms
info : OK https://api.nuget.org/v3-flatcontainer/microsoft.netcore.app.ref/index.json 295ms
info : OK https://api.nuget.org/v3-flatcontainer/microsoft.netcore.app.host.linux-arm64/index.json 693ms
info : Installing Microsoft.NETCore.App.Ref 3.1.0.
info : GET https://api.nuget.org/v3-flatcontainer/microsoft.netcore.app.ref/3.1.0/microsoft.netcore.app.ref.3.1.0.nupkg
info : OK https://api.nuget.org/v3-flatcontainer/microsoft.netcore.app.ref/3.1.0/microsoft.netcore.app.ref.3.1.0.nupkg 19ms
info : Installing Microsoft.NETCore.App.Host.linux-arm64 3.1.11.
info : GET https://api.nuget.org/v3-flatcontainer/microsoft.netcore.app.host.linux-arm64/3.1.11/microsoft.netcore.app.host.linux-arm64.3.1.11.nupkg
info : Installing Microsoft.AspNetCore.App.Ref 3.1.10.
info : GET https://api.nuget.org/v3-flatcontainer/microsoft.aspnetcore.app.ref/3.1.10/microsoft.aspnetcore.app.ref.3.1.10.nupkg
info : Installing Microsoft.ML.OnnxRuntime.Gpu 1.5.2-dev-20210113-0851-ad7cc541f.
info : OK https://api.nuget.org/v3-flatcontainer/microsoft.aspnetcore.app.ref/3.1.10/microsoft.aspnetcore.app.ref.3.1.10.nupkg 40ms
info : OK https://api.nuget.org/v3-flatcontainer/microsoft.netcore.app.host.linux-arm64/3.1.11/microsoft.netcore.app.host.linux-arm64.3.1.11.nupkg 152ms
error: Access to the path '/home/user/.nuget/packages/microsoft.ml.onnxruntime.gpu/1.5.2-dev-20210113-0851-ad7cc541f/runtimes/linux-x64/native' is denied.
error: Permission denied

I have already tried deleting the whole .nuget directory, changing its ownership, changing its permissions, and running dotnet with sudo.

Urgency It is a blocker for a private project.

System information

To Reproduce

Expected behavior After compiling and installing the onnxruntime-gpu NuGet package, it should be possible to run YOLOv5 in C#.

snnn commented 3 years ago

"error: Access to the path '/home/user/.nuget/packages/microsoft.ml.onnxruntime.gpu/1.5.2-dev-20210113-0851-ad7cc541f/runtimes/linux-x64/native' is denied. error: Permission denied"

I think it is because /home/user/.nuget is not owned by you. Please check the folder's permissions.

rligocki commented 3 years ago

I already checked the permissions. The whole folder was owned by my user. I also tried removing the whole folder. Finally, I tried running dotnet and nuget as root. Still the same error.

snnn commented 3 years ago

I'll try to see if I can reproduce the same error on Ubuntu x64 Linux. I don't have access to a Jetson.

mrry commented 3 years ago

@sarah-widder and I have been investigating a similar problem with a custom NuGet package. TL;DR: It was not a permissions error. The (temporary) solution involved commenting out the following lines in the nuspec generation script: https://github.com/microsoft/onnxruntime/blob/844361bc67dcd149987290ec326dc492376190fb/tools/nuget/generate_nuspec_for_native_nuget.py#L383-L395

From what we could tell, the dotnet add package command was failing when it processed these lines in the nuspec file:

<file src="/repo/build/Release/onnxruntime_perf_test" target="runtimes\linux-x64\native" />
<file src="/repo/build/Release/onnx_test_runner" target="runtimes\linux-x64\native" />

Looking at strace -f output, we discovered that (for some unknown reason), the process was trying to creat a file at the "runtimes/linux-x64/native" path, which already existed as a directory:

[pid  9017] creat("/.../.nuget/packages/microsoft.ml.onnxruntime.gpu/1.6.0-dev-20210406-1148-9317f75a1/runtimes/linux-x64/native", 0766 <unfinished ...>
[pid  9017] <... creat resumed>)        = -1 EISDIR (Is a directory)
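The EISDIR result in this trace can be reproduced outside NuGet. Below is a minimal sketch in Python, assuming a POSIX system. The helper name creat_errno is hypothetical; it only illustrates what the kernel returns when a create-for-writing call targets an existing directory, and says nothing about what dotnet does internally.

```python
import errno
import os
import tempfile


def creat_errno(path):
    """Try to create/truncate `path` for writing, as creat(2) does.

    Returns the symbolic errno name on failure, or None on success.
    """
    try:
        # creat(path, mode) is equivalent to open() with these three flags.
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o766)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Mirror the layout from the trace: the target already exists as a directory.
        native_dir = os.path.join(root, "runtimes", "linux-x64", "native")
        os.makedirs(native_dir)
        print(creat_errno(native_dir))
```

On Linux this should report EISDIR regardless of who owns the directory, which is consistent with changing ownership or running as root not helping.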

At a guess, this could be a NuGet bug when attempting to install a file that doesn't exist in the source package, and the EISDIR is transformed into a permission error somewhere in dotnet? It seems to affect a lot of people, based on a search for the error, although there doesn't seem to be a standard solution.
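One way to probe this guess is to look inside the package itself. A .nupkg is a ZIP archive, so its entries can be listed directly. The sketch below is only a diagnostic aid under that assumption; the function name entries_under and the default prefix are hypothetical choices for illustration.

```python
import zipfile


def entries_under(nupkg_path, prefix="runtimes/linux-x64/native"):
    """List archive entries at or below `prefix` in a .nupkg (a ZIP archive)."""
    with zipfile.ZipFile(nupkg_path) as archive:
        # Normalize separators in case the archive was written with backslashes.
        names = [name.replace("\\", "/") for name in archive.namelist()]
    prefix = prefix.rstrip("/")
    return sorted(
        name for name in names
        if name == prefix or name.startswith(prefix + "/")
    )
```

An entry whose name is exactly the prefix, with no trailing slash, would be a file entry named like the directory, which would fit the creat call seen in the trace. Whether the affected packages actually contain such an entry is not established in this thread.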

Thinking this through, there are a couple of possibly-better alternative approaches to fixing your problem:

In addition, we should probably see what we can do to improve the error message in NuGet once the root cause is understood.

mrry commented 3 years ago

@snnn Is it possible that the problem stems from these lines in the csproj? https://github.com/microsoft/onnxruntime/blob/8d737f977056444a307f1b7f0bcd402fba62d790/csharp/src/Microsoft.ML.OnnxRuntime/Microsoft.ML.OnnxRuntime.csproj#L155-L168

If I'm understanding correctly (and I'm not an expert in this area, so could be wrong!), it looks like the csproj is testing for the existence of files ending in a ".exe" extension (so will match on Windows only), but the nuget_dependencies for a Linux build in generate_nuspec_for_native_nuget.py is testing for the existence of binaries without the ".exe" extension: https://github.com/microsoft/onnxruntime/blob/8d737f977056444a307f1b7f0bcd402fba62d790/tools/nuget/generate_nuspec_for_native_nuget.py#L204-L214

Perhaps this inconsistency leads to a nuspec file that refers to files that are expected to be in the package, but not actually copied by the directives in the csproj? And then NuGet itself gets confused if the files aren't there?
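If that hypothesis holds, the mismatch should be visible before packing by comparing the generated nuspec against the build output. Here is a rough sketch in Python, assuming the nuspec lists each file as a file element with src and target attributes, as in the lines quoted earlier. The function name missing_nuspec_sources is hypothetical, and the sketch does not handle wildcard src patterns.

```python
import os
import xml.etree.ElementTree as ET


def missing_nuspec_sources(nuspec_path):
    """Return (src, target) pairs for <file> elements whose src is not an existing file."""
    tree = ET.parse(nuspec_path)
    base_dir = os.path.dirname(os.path.abspath(nuspec_path))
    missing = []
    for elem in tree.iter():
        # Drop any XML namespace so namespaced and plain nuspec files both match.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag != "file":
            continue
        src = elem.get("src")
        if src is None:
            continue
        # Relative src paths are resolved against the nuspec's own directory.
        resolved = src if os.path.isabs(src) else os.path.join(base_dir, src)
        if not os.path.isfile(resolved):
            missing.append((src, elem.get("target")))
    return missing
```

Running this against the nuspec produced by generate_nuspec_for_native_nuget.py would list any entries, such as onnxruntime_perf_test, that are referenced but absent on disk at that point. It cannot by itself confirm that NuGet mishandles such entries.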

snnn commented 3 years ago

I feel you are right. Let me try to understand this script...