Blocking issues in .NET

lilith commented 9 years ago

In order to write software that targets both .NET 4.6 and .NET Core, we need .NET 4.6 to fix several oversights.

Visual studio, ASP.NET, deploy tooling, test tooling, and even the .NET runtime make some bad assumptions:

Every .dll is a .NET dll.
Managed .dlls do not depend on native .dlls
No project ever needs to target or deploy to more than one architecture/platform.
Nobody needs to adjust native or managed dll search paths at runtime.
Specifically, what low-level APIs do we need before we can fix dependency loading ourselves?

1) The ability to capture and override the load path of every .NET assembly, not just those which are missing (which AssemblyResolve provides). This will allow us to avoid BadImageFormatExceptions and locate compatible binaries instead to bring into the default load context.

2) The ability to either

A) Represent all versions of unmanaged and non-Any CPU dependencies within the metadata of a managed assembly, so that tooling and hosts can correctly deploy and shadow-copy them.
B) Use a standard assembly attribute to define an entry-point which will called for every movement (such as test runners, deployment, or ASP.NET dynamic compilation), which can be responsible for the process.
C) Expose a comprehensive list of still-accessible locations where we left the original binaries behind. (In many contexts, neither Location or Codebase are sufficient for discovery, such as for shadow-copying or nuget references). This does not handle some scenarios, like deployment.

We need pervasive architecture awareness, or the ability to create it.

3) A (respected) way to tell hosts (like ASP.NET/IIS) that the world will end if they use more than 1 AppDomain per process. Many native interop scenarios can't handle AppDomains, period.

4) A way to do #1 in the context of an ASP.NET app . ASP.NET currently loads everything in the /bin folder, then calls PreAppStart on each one. We can't assume everything in /bin root will load successfully, since we still have tooling that is impeded when it comes to targets and managed/native correlation. Hacking this with <remove assembly="*"/> isn't well-received; a fully qualified assembly reference to preload would be better.

5) The ability to control the assembly probing path at runtime - so that we can support architecture-specific bin subfolders - within the default load context. We should even be able to exclude ApplicationBase, so that we can cleanly hand control over to an external assembly for dependency resolution. This is somewhat a duplicate of #1, but with better performance for simple scenarios.

Much of the ecosystem (read: nuget packages) around .NET is reliant upon Windows APIs. This may may come in the form of System.Drawing, System.Media, or P/Invoke calls, but it is prevalent. Moving to .NET Core and vNext will entail the creation or porting of dozens of native libraries with corresponding interop layers. The current interop pain is too great for most OSS maintainers to overcome simply for the sake of .NET Core compatibility, so if we're to have to the future we all want (.NET Core), we need to fix it.

davidfowl commented 9 years ago

I'm not sure if the feature will ever make it into .NET but the CoreCLR has a feature call an AssemblyLoadContext, that gives control over all loading (native and managed). This is what we hook in the dnx to get control over what to load at runtime based on the current context.

The dnx has knowledge of the NuGet closure (via the lock file) so it can reason about probing paths and setup the runtime in appropriate way so that DllImport just works (in most cases).

The dnx also works on full CLR and it's more challenging there because of what you mention. The only ways around it would be to use SetDllDirectory or by changing the process PATH. Another possible solution would be to wait until an assembly is loaded and they force load all native dependencies (based on whatever NuGet package semantics we come up with).

lilith commented 9 years ago

I'll have to give AssemblyLoadContext a try. I really hope it makes it into .NET; we need it badly.

According to docs, SetDllDirectory still doesn't override the app directory... Messing up PATH seems like it could have side effects, but options are quite limited.

One problem with waiting until a managed dependency to load is that it breaks C++/CLI, which embeds 1 and only 1 architecture variant inside the managed DLL. Or you mean selecting the right bitness of managed dlls first, then once those are loaded, inspect for native dependencies? ECMA CLI defines an assembly load entry point; the main use for this seems to be to handle native resource init/shutdown. We would break that use case, but it's admittedly rare.

(2) I'm not sure whether a NuGet closure file or new metadata is best for dealing with tooling, like test runners. I run my tests in both 32 and 64-bit mode on the same machine, from the same filesystem. @bradwilson, do you have an opinion on this?

Did I overlook something easy for (3)?

davidfowl commented 9 years ago

One problem with waiting until a managed dependency to load is that it breaks C++/CLI, which embeds 1 and only 1 architecture variant inside the managed DLL. Or you mean selecting the right bitness of managed dlls first, then once those are loaded, inspect for native dependencies? ECMA CLI defines an assembly load entry point; the main use for this seems to be to handle native resource init/shutdown. We would break that use case, but it's admittedly rare.

That's ok because by the time you're running/loading, you should have all of the context required to pick the right binary.

(2) I'm not sure whether a NuGet closure file or new metadata is best for dealing with tooling, like test runners. I run my tests in both 32 and 64-bit mode on the same machine, from the same filesystem.

I don't see why test runners or tooling are special here. Can you enlighten me?

lilith commented 9 years ago

I suppose it depends on how the (shadow) copying is implemented. xunit appears to use AppDomainSetup and CreateDomain, so if CreateDomain implements all of the logic required, then we're good.

Keep in mind though, that native dependencies have different file locking behavior when compared to their managed counterparts. If any files need to be (shadow)copied prior to execution, it's the native DLLs. Test runner host processes are often re-used. TestDriven.NET, for example, must be manually killed before every rebuild, as native dll file locks aren't released with the appdomain.

davidfowl commented 9 years ago

Test runners are no different than any other piece of runtime code.

lilith commented 9 years ago

Any code that is responsible for executing a .NET assembly - but which needs to permit all involved DLL files to be overwritten at any time - has specific responsibilities. One could argue that this is the responsibility of the build tools and/or deploy tools, but I would disagree (and good luck convincing the VS team of that).

Some deploy tools try to work around this by taking a set of actions when encountering a locked file:

If the temp folder exists in the same filesystem as the deploy folder, they move the in-use file to a temp folder that is marked for deletion at system boot.
If not, they move it to a custom folder and use a custom cleanup approach. ... and many other techniques; Garrett Serack has a pretty exhaustive list.

However, Visual Studio integrates with arbitrary build tools, and I don't think it is even possible to get them all to implement the kind of hacks required to work around this from the file writer's side.

From the execution side, in any test runner scenario, we want to prevent the files from being locked. To do that, we have to (shadow) copy all the native dlls, managed dlls, managed resources, and native resources - to another location before running them. Are you saying that the .NET framework should be responsible for this process, as part of the AppDomain API?

bradwilson commented 9 years ago

I don't think it's the responsibility of the VS team. I think it's the responsibility of the .NET team.

The Shadow Copy feature belongs to them. That it is half-implemented for native DLL users is really their issue. Pushing it to all the test runners just begs for a half dozen incompatible (and maybe incorrect) implementations.

lilith commented 9 years ago

@davidfowl, has anything changed with .NET 4.6 that would unblock sub-issue 1 or 2 above?

imazen / Imazen.NativeDependencyManager

Blocking issues in .NET #5

Specifically, what low-level APIs do we need before we can fix dependency loading ourselves?