kleisauke / net-vips

.NET binding for libvips.
https://kleisauke.github.io/net-vips/
MIT License

Resolution of NetVips.Native.linux-musl-arm64 fails #233

Open zotanmew opened 2 months ago

zotanmew commented 2 months ago

This is plausibly a bug in the .NET SDK, but I wanted to open an issue here as well:

Referencing NetVips.Native when building a linux-musl-arm64 package results in the linux-arm64 .so file ending up in the output directory. I've tested various ways of fixing this problem (since I initially assumed it had to do with the differing netstandard1.0/2.0/2.1 target frameworks), and the only solution I've found that worked for both musl-arm64 and other RIDs was to modify NetVips.Native.nuspec so that it contains all the native binaries directly, instead of depending on the per-architecture packages.

The one issue with this method is that explicitly depending on a specific architecture package is no longer possible, which is why I didn't open a PR with these changes, but maybe there's a different way of solving this I'm not seeing.

Example:

<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://schemas.microsoft.com/packaging/2013/01/nuspec.xsd">
  <metadata>
    <id>NetVips.Native</id>
    <title>NetVips - Native binaries</title>
    <version>$version$</version>
    <description>This package complements the NetVips package and contains native binaries of libvips</description>
    <summary>Native binaries of libvips</summary>
    <projectUrl>https://kleisauke.github.io/net-vips</projectUrl>
    <repository type="git" url="https://github.com/kleisauke/net-vips" />
    <tags>libvips binaries image-processing</tags>
    <license type="expression">MIT</license>
    <authors>Kleis Auke Wolthuizen</authors>
    <owners>Kleis Auke Wolthuizen</owners>
    <requireLicenseAcceptance>false</requireLicenseAcceptance>
    <copyright>Kleis Auke Wolthuizen</copyright>
  </metadata>
  <files>
    <file src="pack\linux-x64\*.so*" target="runtimes/linux-x64/native" />
    <file src="pack\linux-arm\*.so*" target="runtimes/linux-arm/native" />
    <file src="pack\linux-arm64\*.so*" target="runtimes/linux-arm64/native" />
    <file src="pack\linux-musl-x64\*.so*" target="runtimes/linux-musl-x64/native" />
    <file src="pack\linux-musl-arm64\*.so*" target="runtimes/linux-musl-arm64/native" />
    <file src="pack\osx-x64\*.dylib" target="runtimes/osx-x64/native" />
    <file src="pack\osx-arm64\*.dylib" target="runtimes/osx-arm64/native" />
    <file src="pack\win-x86\*.dll" target="runtimes/win-x86/native" />
    <file src="pack\win-x64\*.dll" target="runtimes/win-x64/native" />
    <file src="pack\win-arm64\*.dll" target="runtimes/win-arm64/native" />

    <file src="pack\linux-x64\THIRD-PARTY-NOTICES.md" />
    <file src="pack\linux-x64\versions.json" />

    <!-- A dummy reference which prevents NuGet from adding any compilation references when this package is imported -->
    <file src="_._" target="lib/netstandard2.1" />
  </files>
</package>
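
To double-check which RIDs a nuspec like the one above bundles, the `<file>` targets can be listed with a short script. This is a minimal sketch using only Python's standard library; the `bundled_rids` helper and the trimmed `NUSPEC` sample are illustrative assumptions, not part of NetVips.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <package> element of the nuspec above.
NS = {"n": "http://schemas.microsoft.com/packaging/2013/01/nuspec.xsd"}

# Trimmed, illustrative copy of the nuspec (only a few <file> entries kept).
NUSPEC = r"""<package xmlns="http://schemas.microsoft.com/packaging/2013/01/nuspec.xsd">
  <metadata><id>NetVips.Native</id></metadata>
  <files>
    <file src="pack\linux-arm64\*.so*" target="runtimes/linux-arm64/native" />
    <file src="pack\linux-musl-arm64\*.so*" target="runtimes/linux-musl-arm64/native" />
    <file src="pack\win-x64\*.dll" target="runtimes/win-x64/native" />
    <file src="pack\linux-x64\versions.json" />
    <file src="_._" target="lib/netstandard2.1" />
  </files>
</package>"""


def bundled_rids(nuspec_xml: str) -> list[str]:
    """Return the sorted RIDs that have a runtimes/<rid>/native target."""
    root = ET.fromstring(nuspec_xml)
    rids = set()
    for file_el in root.findall("n:files/n:file", NS):
        parts = file_el.get("target", "").split("/")
        # Only targets shaped like runtimes/<rid>/native carry native assets.
        if len(parts) == 3 and parts[0] == "runtimes" and parts[2] == "native":
            rids.add(parts[1])
    return sorted(rids)


print(bundled_rids(NUSPEC))
```

Entries without a `target`, and the `lib/netstandard2.1` placeholder, are skipped because they do not match the `runtimes/<rid>/native` shape.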

kleisauke commented 2 months ago

Sounds like a similar issue to #186. I suspect that the Runtime Identifier (RID) is incorrectly set, causing the wrong library to be referenced. Could you provide the output of the following commands?

$ dotnet --info
$ echo $DOTNET_RUNTIME_ID

(The second command is expected to print nothing. That environment variable lets you override the detected RID, so in most cases it's not set.)

As an aside, .NET 8.0 resolves RIDs much more effectively: https://learn.microsoft.com/en-us/dotnet/core/compatibility/deployment/8.0/rid-asset-list

zotanmew commented 2 months ago

I'm afraid this is happening in a .NET 8.0 application, with UseCurrentRuntimeIdentifier set to true in the .csproj.

zotanmew commented 2 months ago

To answer your questions, though:

> docker run --rm -it --entrypoint sh mcr.microsoft.com/dotnet/sdk:8.0-alpine
/ # dotnet --info
.NET SDK:
 Version:           8.0.201
 Commit:            4c2d78f037
 Workload version:  8.0.200-manifests.3097af8b

Runtime Environment:
 OS Name:     alpine
 OS Version:  3.19
 OS Platform: Linux
 RID:         linux-musl-arm64
 Base Path:   /usr/share/dotnet/sdk/8.0.201/

.NET workloads installed:
There are no installed workloads to display.

Host:
  Version:      8.0.2
  Architecture: arm64
  Commit:       1381d5ebd2

.NET SDKs installed:
  8.0.201 [/usr/share/dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 8.0.2 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 8.0.2 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

Other architectures found:
  None

Environment variables:
  Not set

global.json file:
  Not found

Learn more:
  https://aka.ms/dotnet/info

Download .NET:
  https://aka.ms/dotnet/download

And echo $DOTNET_RUNTIME_ID returns nothing.

kleisauke commented 2 months ago

The RID seems to be correctly set in the output, so passing -e DOTNET_RUNTIME_ID=linux-musl-arm64 to the Docker invocation likely won't make any difference.

I'll attempt to reproduce this issue later. In the meantime, can you try reproducing it with the latest mcr.microsoft.com/dotnet/sdk:8.0-alpine image (currently .NET 8.0.6)? Additionally, they provide Alpine 3.20 images under the 8.0-alpine3.20 tag, which might also be worth checking to see if the issue persists.

Details

```console
$ docker run --platform linux/arm64 mcr.microsoft.com/dotnet/sdk:8.0.201-alpine3.19 dotnet --info > old.txt
$ docker run --platform linux/arm64 mcr.microsoft.com/dotnet/sdk:8.0-alpine dotnet --info > new.txt
$ git diff --no-index old.txt new.txt
```

```diff
@@ -1,29 +1,30 @@
 .NET SDK:
- Version:           8.0.201
- Commit:            4c2d78f037
- Workload version:  8.0.200-manifests.3097af8b
+ Version:           8.0.302
+ Commit:            ef14e02af8
+ Workload version:  8.0.300-manifests.f6879a9a
+ MSBuild version:   17.10.4+10fbfbf2e

 Runtime Environment:
  OS Name:     alpine
  OS Version:  3.19
  OS Platform: Linux
  RID:         linux-musl-arm64
- Base Path:   /usr/share/dotnet/sdk/8.0.201/
+ Base Path:   /usr/share/dotnet/sdk/8.0.302/

 .NET workloads installed:
 There are no installed workloads to display.

 Host:
-  Version:      8.0.2
+  Version:      8.0.6
   Architecture: arm64
-  Commit:       1381d5ebd2
+  Commit:       3b8b000a0e

 .NET SDKs installed:
-  8.0.201 [/usr/share/dotnet/sdk]
+  8.0.302 [/usr/share/dotnet/sdk]

 .NET runtimes installed:
-  Microsoft.AspNetCore.App 8.0.2 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
-  Microsoft.NETCore.App 8.0.2 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
+  Microsoft.AspNetCore.App 8.0.6 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
+  Microsoft.NETCore.App 8.0.6 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

 Other architectures found:
   None
```

zotanmew commented 2 months ago

Reproduced with the latest 8.0-alpine image (--info says sdk 8.0.6), as well as with 8.0-alpine3.20. Should I create a minimal csproj/sln/Dockerfile that reproduces the issue? I can give you the full project too, but it's a rather large open source project ^^

zotanmew commented 2 months ago

Here's a simple reproduction example. It builds the project, then checks the sha256sum of the .so when the container starts and prints which one it found: https://github.com/zotanmew/dotnet-dependency-resolution
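
For readers without a table of known checksums, a rougher check is possible. The sketch below is a different technique from the sha256 comparison in the linked repository: it scans the raw bytes of a shared object for the libc name it links against. It assumes that glibc builds reference `libc.so.6` and that Alpine/musl builds reference a name starting with `libc.musl-`; the `guess_libc` helper is hypothetical and only a heuristic.

```python
# Marker strings assumed to appear in the dynamic section of the library.
GLIBC_MARKER = b"libc.so.6"   # typical glibc dependency name
MUSL_MARKER = b"libc.musl-"   # e.g. libc.musl-aarch64.so.1 on Alpine


def guess_libc(data: bytes) -> str:
    """Guess which libc a shared object was linked against from its bytes."""
    if MUSL_MARKER in data:
        return "musl"
    if GLIBC_MARKER in data:
        return "glibc"
    return "unknown"


# Tiny fake payloads standing in for real library contents.
print(guess_libc(b"\x7fELF\x00libc.musl-aarch64.so.1\x00"))
print(guess_libc(b"\x7fELF\x00libc.so.6\x00"))
```

On a real build output, the function could be fed the file contents, for example `guess_libc(open("libvips.so.42", "rb").read())`, with the path adjusted to wherever the library was copied.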

zotanmew commented 2 months ago

Using this, I can confirm that 9.0-preview-alpine exhibits the same behavior.

kleisauke commented 2 months ago

I could reproduce this issue even after making the following changes:

--- a/dotnet-dependency-resolution.csproj
+++ b/dotnet-dependency-resolution.csproj
@@ -9,7 +9,8 @@
   </PropertyGroup>

   <ItemGroup>
-    <PackageReference Include="NetVips.Native" Version="8.15.2" />
+    <PackageReference Include="NetVips.Native.linux-arm64" Version="8.15.2" />
+    <PackageReference Include="NetVips.Native.linux-musl-arm64" Version="8.15.2" />
   </ItemGroup>

 </Project>

$ docker build --platform linux/arm64 -t test-musl-arm64 .
$ docker run --rm test-musl-arm64
arm64-glibc libvips.so.42 detected
$ docker run -it --rm --entrypoint sh test-musl-arm64
/app $ grep -A4 '"native": {' dotnet-dependency-resolution.deps.json
        "native": {
          "runtimes/linux-arm64/native/libvips.so.42": {
            "fileVersion": "0.0.0.0"
          }
        }

So, it looks like there is some sort of bug in the .NET SDK's RID resolution.
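
The `grep` above can also be done structurally, which shows which package each native asset came from. The following is a minimal sketch, assuming the usual `targets` → library → `native` layout of a `.deps.json` file. The `SAMPLE_DEPS` content is a hypothetical fragment modeled on the output above (the target name and the owning package are assumptions), and `native_assets` is an illustrative helper.

```python
import json

# Hypothetical fragment modeled on the grep output above; the target key and
# the package that owns the asset are assumptions made for illustration.
SAMPLE_DEPS = """
{
  "targets": {
    ".NETCoreApp,Version=v8.0/linux-musl-arm64": {
      "NetVips.Native.linux-arm64/8.15.2": {
        "native": {
          "runtimes/linux-arm64/native/libvips.so.42": {"fileVersion": "0.0.0.0"}
        }
      }
    }
  }
}
"""


def native_assets(deps_json: str) -> list[tuple[str, str]]:
    """Return (library, asset path) pairs for every entry under "native"."""
    deps = json.loads(deps_json)
    found = []
    for target in deps.get("targets", {}).values():
        for library, entry in target.items():
            for path in entry.get("native", {}):
                found.append((library, path))
    return found


for library, path in native_assets(SAMPLE_DEPS):
    print(f"{library}: {path}")
```

Against a real build, the same function could be given the contents of `dotnet-dependency-resolution.deps.json`; a `runtimes/linux-arm64/...` path in a linux-musl-arm64 build would point to the mismatch described here.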

FWIW, I considered using the undocumented runtime.json feature, similar to the libclang and Microsoft.NETCore.DotNetAppHost packages. However, this approach is not feasible if you target multiple RIDs, see: https://github.com/dotnet/sdk/issues/33845#issuecomment-1625243199

zotanmew commented 2 months ago

Very strange indeed. What confuses me the most is that it works fine if you put all the directories into one file. We're temporarily maintaining a version of the native NuGet package that uses the modified nuspec I mentioned above (https://iceshrimp.dev/iceshrimp/-/packages/nuget/netvips.native/8.15.2-iceshrimp), but getting this solved upstream would be ideal. As far as I understand it, there's not even a difference in download size with this model, since the SDK only works out which package to include in the build after downloading all the packages to the project/NuGet cache. I'd like to report this to the .NET team, but I have no idea how to phrase the bug or which repo to report it in (dotnet/sdk, maybe?). Anyhow, thank you for the help with debugging the cause.

kleisauke commented 2 months ago

Reported upstream at https://github.com/dotnet/sdk/issues/4195#issuecomment-2180618086.