ikvmnet / ikvm

A Java Virtual Machine and Bytecode-to-IL Converter for .NET

libnet and libnio #409

Closed wasabii closed 10 months ago

wasabii commented 11 months ago

Integrate libnet and libnio projects, taking OpenJDK C files. We've got a bunch of stuff included in these, and a bunch of stuff still excluded. But over time we can enable more and more of it.

This PR also removes our forked copies of Net.java and IOUtil.java, since the standard versions can now be called through libnio.

Introduced IKVM.Image.runtime.* packages for each target RID. Native libraries now get packaged into these, instead of the main IKVM.Image project. The targets file has changed a bit to support Pack operations (used intree). PublishProjectReference was cleaned up to output to IkvmImageItem.

Native libraries moved into the Image, under ikvm/bin, as they would be in a real JVM. IKVM's sun.boot.library.path is set to resolve from ikvm.home, out of the image. The user path is set to resolve out of .NET standard locations (app dir, runtimes/*). This puts user libraries in the proper path.

Removed everything from the built-in library set. On IKVM nothing is statically compiled. Introduced an empty libawt to satisfy that.

libjvm needs to be preloaded by the JNI stuff to set up callbacks.

AaronRobinsonMSFT commented 11 months ago

Use LoadLibraryExW explicitly. NativeLibrary from .NET Core just complicated things needlessly between Framework/Core.

@wasabii Can you share some of the complexity being observed here? I think we can all appreciate the inexact semantic behavior between the two approaches, but falling back to LoadLibraryEx is really tricky to get right from a security perspective. Perhaps we can help provide the guidance to do the "right thing".

/cc @elinor-fung @jkotas

wasabii commented 11 months ago

@AaronRobinsonMSFT

Yeah. We're trying to implement the exact semantics of OpenJDK on the OS we're running on. It's not that we can't get this working by going through .NET Core's NativeLibrary implementation by being careful: it's that it's an additional level of indirection to think through. One which we cannot use on Framework anyway. For instance, passing an Assembly, and it resolving through the AssemblyLoadContext's ResolvingUnmanagedDll event. Or how it implements its own cache. Or how it (in nativelibrary.cpp) tries to do the DllImport logic of appending .so and .dll, and prepending lib, etc. OpenJDK already contains its own version of all that logic (written in Java, in ClassLoader), and makes its own determination of when to pass through the libname unmodified, etc.
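As a rough illustration of the name-decoration logic mentioned above, here is a minimal Python sketch. The prefix/suffix table matches what Java's System.mapLibraryName produces on the three desktop platforms; the function names and the pass-through rule (anything containing a path separator is left alone) are assumptions made for illustration, not IKVM's or OpenJDK's actual code.

```python
# Prefix/suffix per platform, as produced by System.mapLibraryName.
_DECORATIONS = {
    "windows": ("", ".dll"),
    "linux": ("lib", ".so"),
    "macos": ("lib", ".dylib"),
}


def map_library_name(name: str, platform: str) -> str:
    """Decorate a bare library name, e.g. "net" -> "libnet.so" on Linux."""
    prefix, suffix = _DECORATIONS[platform]
    return f"{prefix}{name}{suffix}"


def resolve_load_request(request: str, platform: str) -> str:
    """Hypothetical helper: decide whether to decorate or pass through."""
    # Assumption: a request that already looks like a path is handed to
    # the OS loader unchanged; only bare names are decorated.
    if "/" in request or "\\" in request:
        return request
    return map_library_name(request, platform)
```

The point of the sketch is that the decoration decision is made exactly once; a second layer that also prepends lib or appends an extension makes the final path handed to the OS harder to reason about.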

For instance, in tracking down the libjava.dll issue I linked from that commit, I got misdirected a number of times by assuming that Framework and Core functioned differently because of the additional logic in SRIS.NativeLibrary. It wasn't until I removed SRIS.NativeLibrary that I realized it was a deeper issue with how Win32 LoadLibrary itself functioned. I spent a day going over the .NET Core NativeLibrary code, trying to spot why it was failing in this particular case... maybe it was changing the path. Maybe it was passing some unexpected option... etc.

If we've introduced a security issue, as long as it's the exact same security issue present in OpenJDK when running on Windows, we're doing the right thing.

So, it's not that we can't use it, and get it to work. It's that there is no point. It only makes it harder, not easier.

wasabii commented 10 months ago

@AaronRobinsonMSFT @elinor-fung @jkotas

Okay. So, this gets more complicated with OS X. And more justifies my approach, I think.

So, OS X has a much more 'advanced' system of dynamic loading search paths than either Windows or Linux. dyld maintains a 'stack' of search paths, where each library that you load is able to append its own entries, which are then usable for resolving dependencies of that library. This is done with LC_RPATH commands in the dylib.

Additionally, .dylib files can have an LC_ID_DYLIB, a unique "install name". During linking, this install name is copied into the dependent libraries, as the location of the dependency. Without this, the full or relative path of the dylib is used.

So, in this dependency hierarchy: libnet.dylib -> libjvm.dylib. libjvm.dylib is built first. libnet is linked to it, and the full path given to lld64 is encoded into libnet.dylib. Meaning libnet.dylib does not search for libjvm.dylib along any paths. It just uses that encoded path. If you aren't careful, this will not work at runtime, since the path used while linking a complex project is probably wrong.

Install names can have a few variables: @rpath, @loader_path and @executable_path. For instance, the install name of libjvm could be @loader_path/. Which means when libnet is linked against it, libnet would be encoded with @loader_path/./libjvm.dylib as the search path for libjvm. That is, libnet would ONLY look for libjvm right next to it, since that's what @loader_path refers to (the path of the depending library).
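To make the variable expansion concrete, here is a small Python sketch of how a recorded dependency path using @loader_path or @executable_path could be turned into a real path. It is a simplified model, not dyld's implementation; the function name and the directories used in the test are hypothetical, and @rpath is deliberately left unexpanded.

```python
import posixpath


def expand_install_name(recorded: str, loader_dir: str, executable_dir: str) -> str:
    """Expand @loader_path / @executable_path in a recorded dependency path.

    loader_dir:     directory of the image that has the dependency
    executable_dir: directory of the main executable of the process
    """
    for variable, base in (("@loader_path", loader_dir),
                           ("@executable_path", executable_dir)):
        if recorded == variable or recorded.startswith(variable + "/"):
            rest = recorded[len(variable):].lstrip("/")
            return posixpath.normpath(posixpath.join(base, rest))
    # Absolute paths and @rpath entries are returned untouched in this sketch.
    return recorded
```

With libnet living in a hypothetical /opt/ikvm/bin, the recorded path @loader_path/./libjvm.dylib expands to /opt/ikvm/bin/libjvm.dylib regardless of where the host executable lives.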

Basically, unlike Linux, the shared libraries are filled in with actual paths, not just library names which are then looked up by LD_LIBRARY_PATH or ldconfig. In most cases of system libraries, these are literally hard coded (/System/Library/Frameworks shows up inside .dylibs).

Also, dl doesn't cache libraries by name. Only by file. So, the idea of calling dlopen() to preload libjvm, then load libnet, won't result in libnet finding the preloaded libjvm. libnet's search logic will be different, and might not find a libjvm, or might find a different one. Once it's found a file, then that file is only loaded once.

So, this makes things hard. Let's think about IKVM.

We have libjvm, libnet, etc. libnet depends on libjvm. libnet and libjvm are both IKVM distributed libraries, so will live in the same directory as each other. So, libnet could have an entry for libjvm that is @loader_path/libjvm. Then loading libnet would find libjvm beside it and load that.

However, we also have to deal with user JNI libraries, which might live outside the JVM. For instance, lwjgl. These libraries will be embedded into JAR files. And they are what they are: we can't change them. And they all depend on libjvm.

Looking at traditional JVMs on OS X shows an LC_ID_DYLIB of @rpath/libjvm.dylib.

Which means all existing user distributed JNI libraries for OS X probably have @rpath/libjvm.dylib encoded into them. Taking a quick look around the field shows this to be true. I checked out LWJGL, and yes, that's what they have. When they built their images they linked against some version of libjvm.dylib, and that libjvm.dylib had @rpath/libjvm.dylib as the LC_ID_DYLIB, which got copied into their binary.

Many of those projects (LWJGL for example) work by extracting their native dylibs out of a JAR file as a resource, into a temp directory. This temp directory does not contain libjvm.dylib. But, when Java loads their libraries, it is capable of resolving their @rpath/libjvm.dylib path to the libjvm.dylib provided by the JDK.

So what's going on here? When the dlopen call happens in response to System.load("/tmp/path/to/liblwjgl.dylib"), the dependencies are resolved by the OS. The OS sees that liblwjgl.dylib links to @rpath/libjvm.dylib. At that moment, the rpath stack is such that libjvm.dylib is discoverable within it. The load call is backed by libjvm.dylib itself, which has LC_RPATH entries of @loader_path/.. and @loader_path/.

So, libraries loaded by libjvm.dylib are capable of finding libjvm.dylib, and dependencies of libjvm.dylib because it contains LC_RPATH @loader_path/..
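The @rpath resolution described above can be modelled in a few lines of Python. This is a deliberately simplified sketch of the idea (each image in the load chain contributes its LC_RPATH entries, and an @rpath/ dependency is tried against each of them in turn); real dyld behaviour has more rules. The function names and the directory layout are hypothetical.

```python
import posixpath


def expand_rpath_entry(entry: str, loader_dir: str) -> str:
    """Expand @loader_path inside a single LC_RPATH entry."""
    if entry.startswith("@loader_path"):
        return posixpath.normpath(loader_dir + entry[len("@loader_path"):])
    return entry


def rpath_candidates(dependency: str, rpath_stack) -> list:
    """List the paths tried for a dependency.

    rpath_stack is a list of (image_dir, [LC_RPATH entries]) tuples, one per
    image in the chain of loads that led to this dependency.
    """
    if not dependency.startswith("@rpath/"):
        return [dependency]
    leaf = dependency[len("@rpath/"):]
    candidates = []
    for image_dir, entries in rpath_stack:
        for entry in entries:
            candidates.append(
                posixpath.join(expand_rpath_entry(entry, image_dir), leaf))
    return candidates


# libjvm (in a hypothetical /jdk/lib/server) triggers the load of a JNI
# library extracted to a temp directory; only libjvm carries LC_RPATH entries.
stack = [("/jdk/lib/server", ["@loader_path/..", "@loader_path/."]),
         ("/tmp/lwjgl", [])]
found = rpath_candidates("@rpath/libjvm.dylib", stack)
```

In this model the temp directory contributes nothing, yet libjvm.dylib is still found because the image that issued the dlopen call put its own directory on the stack. If the dlopen call came from an image with no suitable LC_RPATH entries, the candidate list would be empty.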

So here's the issue.

Right now we implement System.loadLibrary through C# code. We have no control over the relative location of dotnet.exe in relation to the IKVM native libraries. It might be dotnet.exe. It might be an AppHost for the platform. It might be simply loaded as a plugin from some other application. NativeLibrary has no methods for manipulating LC_RPATH (of course). Nor am I even sure there are such methods available (I did a quick search and couldn't find anything). NativeLibrary calls out to some PAL library provided by the runtime.

So this is a problem. If we use NativeLibrary.Load on a path to a user provided JNI library, at that moment, we're relying on the .NET process to have an rpath stack that contains an entry pointing to IKVM's libjvm.dylib. Which we can't do.

So, a solution: do what OpenJDK does. Make sure we make our dlopen() call out of a library that sets up the proper LC_RPATH entries for finding IKVM's libjvm.dylib.

Which would be our libjvm.dylib. So, we can't load JNI libraries using NativeLibrary. We need to use our own library to call dlopen, one set up with the proper LC_RPATH entries. We can use NativeLibrary to bootstrap that library. But not anything else.

jkotas commented 10 months ago

I do not see a problem with your approach. The managed NativeLibrary API is not designed to support all possible flags and parameters of the OS-specific native library loading APIs. It is expected that one has to call the OS-specific native library loading APIs in scenarios that need the full control.

wasabii commented 10 months ago

Final design in here. This gets complicated.

There are two paths to loading native libraries in IKVM: 1) loading regular native libraries for use in .NET code 2) native libraries loaded for usage within JNI (through System.loadLibrary).

1) This is handled through the IKVM.Runtime.NativeLibrary class. On platforms that support it, this class uses System.Runtime.InteropServices.NativeLibrary. However, it does its own library searching: first by looking in standard locations (paths relative to the IKVM.Runtime assembly), and second by scanning runtimes/{rid}/native for supported RIDs. We reimplement the runtimes/{rid}/native scanning for potential cases of running on .NET Framework, but distributing multiple library versions. For instance, creating one publish output for net47 that runs on both Windows and Mono on Linux.
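The two-step search order for path 1 can be sketched as follows. This is a minimal Python illustration, not the actual IKVM.Runtime.NativeLibrary code: the function names are hypothetical, and it assumes the runtimes directory sits next to the IKVM.Runtime assembly and that RIDs are supplied most-specific first.

```python
import os
import posixpath


def candidate_paths(file_name: str, assembly_dir: str, rids: list) -> list:
    """Return search candidates in priority order."""
    # 1. Path relative to the assembly itself.
    paths = [posixpath.join(assembly_dir, file_name)]
    # 2. runtimes/{rid}/native for every RID the current platform supports.
    for rid in rids:
        paths.append(
            posixpath.join(assembly_dir, "runtimes", rid, "native", file_name))
    return paths


def find_library(file_name, assembly_dir, rids, exists=os.path.isfile):
    """Return the first candidate that exists, or None."""
    for path in candidate_paths(file_name, assembly_dir, rids):
        if exists(path):
            return path
    return None
```

Passing the existence check in as a parameter keeps the search order testable without touching the file system.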

IKVM.Runtime.NativeLibrary might not have System.Runtime.InteropServices.NativeLibrary (Framework). In that case, it uses libikvm's IKVM_dl_open/dl_close/dl_sym functions. These functions are implemented in C, instead of P/Invoking to dlopen/dlsym and dlclose, because the library that contains dlopen/dlsym/dlclose isn't fixed. Older versions of Linux may have it in libdl. Newer versions have it in libc. Sometimes there is a compatibility library. Sometimes there isn't. So, P/Invoke can't always find dlopen/dlclose/dlsym.

However, IKVM.Runtime.NativeLibrary can't use libikvm without loading libikvm. Chicken, meet egg. So, libikvm has to be loaded differently. On Core, we can use System.Runtime.InteropServices.NativeLibrary. On Framework on Windows we P/Invoke to LoadLibrary. And on Mono we can use DllImport with dllmaps.

So, loading libikvm itself goes through the most available .NET way of doing things, either System.Runtime.InteropServices.NativeLibrary or P/Invoke. Two methods which don't have much flexibility, but that's okay, because libikvm has no dependencies. So that flexibility doesn't matter.

2) JNI. This is handled through JNI.JNINativeLoader, which invokes JVM_LoadLibrary and JVM_UnloadLibrary directly out of libjvm. This is so that for this path we can rely on the OpenJDK C based logic.

New bootstrapping problem: calling JVM_LoadLibrary from JNINativeLoader (C#) requires us to load libjvm. This is handled by LibJVM.cs. In here, we use IKVM.Runtime.NativeLibrary, but with a well-known path off of IKVM_HOME: IKVM_HOME/bin. And in this case, we cannot use DllImport, but have to obtain function pointers and marshal to delegates by hand, since our path is dynamic. Since we have a well-known absolute path, using IKVM.Runtime.NativeLibrary to load libjvm does not perform any searching logic; that logic is contained in LibJVM.cs. libjvm is loaded on demand (on the first requirement to load a JNI library).

Other native libraries will go through IKVM.Runtime.NativeLibrary, which MAY use System.Runtime.InteropServices.NativeLibrary, or libikvm, depending on the requirements of the particular platform.

So, JNI stuff is loaded using JVM_LoadLibrary out of libjvm.

libjvm is itself loaded by IKVM.Runtime.NativeLibrary, but with a well known path.

IKVM.Runtime.NativeLibrary, on some platforms, requires libikvm.

libikvm is itself loaded by the lowest common denominator of System.Runtime.InteropServices.NativeLibrary, or calls to LoadLibrary, and on Mono we can use DllImport with a dllmap file.

Eventually a user's JNI library runs (loaded by JVM_LoadLibrary), and because of the @rpath stuff mentioned previously that resolves to the same libjvm as was preloaded by LibJVM.cs, preserving the static methods we use for callbacks for JVM_CreateJavaVM and those guys.
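The layering summarized above can be written down as a small dependency chain. The Python sketch below is only an illustration of the ordering; the component labels come from the comment, while the function names and the needs_libikvm flag (standing in for "on some platforms") are hypothetical.

```python
def requirements(needs_libikvm: bool) -> dict:
    """Map each component to the one that must be available before it."""
    return {
        "user JNI library": "libjvm",              # via JVM_LoadLibrary
        "libjvm": "IKVM.Runtime.NativeLibrary",    # well-known path
        "IKVM.Runtime.NativeLibrary": "libikvm" if needs_libikvm else None,
        "libikvm": None,                           # plain .NET mechanisms
    }


def bootstrap_order(target: str, needs_libikvm: bool) -> list:
    """Return the components in the order they have to come up."""
    requires = requirements(needs_libikvm)
    chain = []
    node = target
    while node is not None:
        chain.append(node)
        node = requires[node]
    return list(reversed(chain))
```

On a platform where IKVM.Runtime.NativeLibrary can sit directly on System.Runtime.InteropServices.NativeLibrary, the libikvm step drops out of the chain.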

So the end result is two paths for loading. One path for loading plain .NET stuff, the other for loading JNI stuff. IKVM distributes the built-in JNI libraries outside of the normal .NET path locations: we put them in IKVM_HOME. They aren't loaded through .NET paths, so this is okay. And we already have a RID specific IKVM_HOME drop available. So moving them in there makes a lot of sense. The files get shared by IKVM.Image.JRE and IKVM.Image.JDK, with .runtime specific packages IKVM.Image.runtime.win7-x64, etc.

.NET-ish native libraries end up distributed normally, either copied to the user's bin directory, or runtimes/rid.

In the Java loading path, System.loadLibrary goes through two sets of paths: sun.boot.library.path and java.library.path. The first is meant for finding the built-in libraries. Its search path is set up to be IKVM_HOME/bin. java.library.path, however, is different per OS: it inherits the search logic of OpenJDK, with different paths for each OS. We add our own little magic into java.library.path though: we include the .NET search locations in there. So, libraries located in .NET standard locations can also be loaded through JNI. This lets the user build .NET projects which use IKVM as a dependency, and package their native libraries in NuGet, but still access them through IKVM.
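A minimal Python sketch of how the two search paths could be composed and consulted. The function names are hypothetical, the OS default directories are passed in rather than guessed, and both the position of the .NET locations after the OS defaults and the boot-path-first lookup order are assumptions of this sketch rather than something stated in the thread.

```python
import posixpath


def boot_library_path(ikvm_home: str) -> list:
    """sun.boot.library.path: the built-in libraries under IKVM_HOME/bin."""
    return [posixpath.join(ikvm_home, "bin")]


def java_library_path(os_default_paths: list, app_dir: str, rids: list) -> list:
    """java.library.path: OS defaults plus the .NET standard locations."""
    paths = list(os_default_paths)
    paths.append(app_dir)
    for rid in rids:
        paths.append(posixpath.join(app_dir, "runtimes", rid, "native"))
    return paths


def locate(file_name: str, boot_paths: list, user_paths: list, exists):
    """Search the boot path first, then the user path."""
    for directory in boot_paths + user_paths:
        candidate = posixpath.join(directory, file_name)
        if exists(candidate):
            return candidate
    return None
```

With this layout a library shipped in a NuGet package under runtimes/{rid}/native is found through the user path, while built-in libraries such as libnet resolve out of IKVM_HOME/bin.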

Lot of work.