(NPP) Shared Library loading issue on Linux

kunzmi / managedCuda

ManagedCUDA aims for easy integration of NVIDIA's CUDA into .NET applications written in C#, Visual Basic, or any other .NET language.

(NPP) Shared Library loading issue on Linux #116

Open hillin opened 1 year ago

hillin commented 1 year ago

My program complains that it can't find nppisu64_12 when running on Linux:

Unable to load shared library 'nppisu64_12' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable: libnppisu64_12: cannot open shared object file: No such file or directory

The program is running inside a Docker container based on the nvidia/cuda:12.1.1-runtime-ubuntu20.04 image. I can find the libnppisu.so.12 file, but not libnppisu.so. Creating a libnppisu.so link that points to it solved the problem.

So the question is: should ManagedCuda try to explicitly load libnppisu.so.12 instead of loading the unversioned nppisu? The same goes for the other shared libraries.

Edit: fixed file names; they don't have the 64 suffix on Linux.

kunzmi commented 1 year ago

Hi,

for the moment I can only check on an Ubuntu linux with Cuda 12.0 manually installed and there the corresponding symbolic link is present, i.e. libnppisu.so pointing to libnppisu.so.12 pointing to libnppisu.so.12.0.0.30.

Note that on Linux the libs are named libnppisu.so* without the 64, which only appears on Windows. You see the 64 in the error message because managedCuda falls back to the Windows library name if it doesn't find the Linux variants.

You say that you created a link named libnppisu64.so with the 64, which is why I'm a bit confused about which files were actually present and which files or links are missing.

Could you please post an ls listing of your lib folder and tell me what was there from the beginning?
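For anyone who wants to gather the same information from inside a .NET process rather than with ls, the listing can be sketched roughly as follows. This is only an illustration: the NppLibraryListing class and its Describe method are hypothetical names, not part of managedCuda, and FileSystemInfo.LinkTarget requires .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical diagnostic helper, not part of managedCuda.
public static class NppLibraryListing
{
    // Returns one line per matching directory entry. Symbolic links are
    // shown as "name -> target"; regular files are shown by name only.
    public static List<string> Describe(string directory, string pattern = "libnppisu*")
    {
        return new DirectoryInfo(directory)
            .EnumerateFileSystemInfos(pattern)
            .OrderBy(entry => entry.Name, StringComparer.Ordinal)
            .Select(entry => entry.LinkTarget is null
                ? entry.Name
                : $"{entry.Name} -> {entry.LinkTarget}")
            .ToList();
    }
}
```

Calling NppLibraryListing.Describe on the CUDA lib folder (often /usr/local/cuda/lib64, but that path is an assumption and may differ per image) and printing each line shows at a glance whether the unversioned libnppisu.so link exists.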

Cheers, Michael

hillin commented 1 year ago

Sorry, the 64 part was a typo; it's not there (main post updated). The problem seems to be that the Docker image does not contain those symbolic links.

kunzmi commented 1 year ago

I installed the latest Cuda 12.1.1 on my Linux machine and all symbolic links are there. So I would consider this a bug in the Docker image and not a bug in ManagedCuda. I also prefer to keep the unversioned lib name in ManagedCuda, as this allows using different Cuda versions as long as the API calls used stay the same. It also keeps maintenance a bit easier...

hillin commented 1 year ago

I'm just thinking, since the 12 part already has to be in the Windows DLL name, it might be easier to handle it in a unified way:

private static readonly HashSet<string> _nppLibraries = new() {
    "nppc",
    "nppial",
    "nppicc",
    "nppidei",
    "nppif",
    "nppig",
    "nppim",
    "nppist",
    "nppisu",
    "nppitc",
    "npps"
};

private const string _libraryVersion = "12";

private static IntPtr ImportResolver(string libraryName, System.Reflection.Assembly assembly, DllImportSearchPath? searchPath)
{
    if(!_nppLibraries.Contains(libraryName))
    {
        return IntPtr.Zero;
    }

    string? libToLoad = null;
    if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
    {
        libToLoad = $"lib{libraryName}.so.{_libraryVersion}";
    }
    else if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
    {
        libToLoad = $"{libraryName}64_{_libraryVersion}.dll";
    }

    // ...

This also prevents accidentally referencing an incorrect version of the libraries, if that matters.

Anyways it's a simple fix in the Dockerfile if we keep it as is.
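A middle ground between the two positions in this thread could be a resolver that tries the unversioned name first and only falls back to the versioned soname when the link is missing. The following is a minimal sketch under stated assumptions, not managedCuda's actual implementation: NppResolver is a hypothetical class, the DllImport names are assumed to be bare names such as "nppisu", and the "npp" prefix check stands in for a proper list of library names.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical resolver sketch, not managedCuda's actual code.
public static class NppResolver
{
    // Assumption: CUDA 12.x, as in the thread above.
    private const string LibraryVersion = "12";

    // Linux file names to try, in order: the unversioned link first,
    // then the versioned soname that runtime-only images still ship.
    public static string[] LinuxCandidates(string libraryName) => new[]
    {
        $"lib{libraryName}.so",
        $"lib{libraryName}.so.{LibraryVersion}",
    };

    // Signature matches the DllImportResolver delegate.
    public static IntPtr Resolve(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
    {
        // Returning IntPtr.Zero hands the request back to the default loader.
        if (!libraryName.StartsWith("npp", StringComparison.Ordinal) ||
            !RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
        {
            return IntPtr.Zero;
        }

        foreach (string candidate in LinuxCandidates(libraryName))
        {
            if (NativeLibrary.TryLoad(candidate, assembly, searchPath, out IntPtr handle))
            {
                return handle;
            }
        }

        return IntPtr.Zero;
    }
}
```

Such a resolver would be registered once per assembly with NativeLibrary.SetDllImportResolver, passing the assembly that contains the DllImport declarations and NppResolver.Resolve. Because the unversioned name is tried first, installations with the full symlink chain behave as before, and the versioned name is used only when the link is absent.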

kunzmi commented 1 year ago

Do you know whether all Cuda Docker images are affected, or only the latest Cuda 12.1 one? If for some reason the Docker images of every version lack the necessary symbolic links, one might consider adapting ManagedCuda. If it is only this specific version that has the missing files, I'd keep it as is.

Cheers, Michael