Open kkm000 opened 5 years ago
I get the message
Timestamp of the IL assembly does not match record in .aux file. Loading IL to compare signature.
when Device Guard is enabled and configured to block certain DLLs based on Code Integrity policies. Was this your issue? This still seems to be the case on Windows 11, and it seems never to have worked on any Windows version, despite MS claiming that native images remain usable when Device Guard is enabled.
I've attempted to run a multi-node build of a solution (about 15 projects, mix of C# "classic" and F# "SDK" style, all library and desktop executables, with some dependencies on each other), and it failed quite miserably. Essentially, I am modifying the sample's Builder class. The resolution of DLLs in the controlling process happens just fine, but then it launches a load of nodes (500+ node processes!) that it cannot communicate with. Debugging it (and I even pulled the whole MSBuild and stepped through some code), I cannot understand how that could be possible!
This is Windows 10, net46 build target.
Now my Build method looks like this:
(I noticed XBuild.cs sets up a distributed logger; I tried disconnecting the logger altogether, but nothing changed).
Package references
Printed version and locations from typeof(Project) in Build..ctor() -- I moved it there:
What happens is the build progresses very slowly, and the same project in the solution is reported to be built multiple times, according to the log from the sample's modified logger, going like
Then, after ~55s, with the whole machine slowed down to a crawl (and it's a decent 8 Core-X CPU, 64GB workstation), the build ends up in an exception:
That's a good hint, actually! When I examined the node processes with the Process Explorer, I learned that the nodes load the 4.0 ngen'd DLLs from the GAC:
while my main process is resolving the correct DLLs. I have only one MSBuild installation, one from the VS15.9.2 payload; it is discovered correctly and all assemblies load into my process from it:
Every node process has the command line
After the build ends, there are 580 nodes, each taking ~2MB of RAM (as opposed to 20-30MB when they actually do something).
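For anyone who wants to repeat the node count and the "which Microsoft.Build DLL did each node load" check without clicking through Process Explorer, here is a rough sketch using only System.Diagnostics. It is not what was run for this report. NodeProbe and IsMSBuildModule are hypothetical names, and it assumes the nodes show up under the process name MSBuild and that the probe has the same bitness and sufficient rights to read their module lists. Process.Modules only reports modules the OS loader knows about, so IL-only assemblies that the CLR maps itself may be missing from the output.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Linq;

public static class NodeProbe
{
    // Hypothetical helper: true for Microsoft.Build*.dll, including ngen'd *.ni.dll images.
    // The file name is split off by hand so that both '\' and '/' separators are handled.
    public static bool IsMSBuildModule(string path)
    {
        if (string.IsNullOrEmpty(path)) return false;
        string name = path.Substring(path.LastIndexOfAny(new[] { '\\', '/' }) + 1);
        return name.StartsWith("Microsoft.Build", StringComparison.OrdinalIgnoreCase)
            && name.EndsWith(".dll", StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // Assumption: worker nodes run as MSBuild.exe, so this also matches any
        // ordinary command-line MSBuild that happens to be running.
        Process[] nodes = Process.GetProcessesByName("MSBuild");
        Console.WriteLine($"MSBuild processes: {nodes.Length}");

        foreach (Process node in nodes)
        {
            try
            {
                var paths = node.Modules.Cast<ProcessModule>()
                                .Select(m => m.FileName)
                                .Where(IsMSBuildModule);
                foreach (string path in paths)
                {
                    Console.WriteLine($"{node.Id}: {path}");
                }
            }
            catch (Exception ex) when (ex is Win32Exception || ex is InvalidOperationException)
            {
                // Access denied, 32/64-bit mismatch, or the node exited in the meantime.
                Console.WriteLine($"{node.Id}: cannot read modules ({ex.Message})");
            }
        }
    }
}
```

Running it once against the nodes started through the API and once against the nodes of a plain command-line build gives two listings that can be compared directly.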
After some digging through the MSBuild source, and figuring out how the nodes are started, I am seriously baffled. I've enabled the comm logging
and got a directory of 581 files, one of these not like the others, corresponding to my process PID. The node-produced logs all indeed are waiting for communication, rejecting a request from the wrong M.Build.dll
and the driver launches nodes and is unable to communicate with them
This repeats over and over. Maybe the only notable change is the TID changes from time to time (4 different TIDs are logging, one strictly after the other).
Looking at the process launch code, I am at a loss. The processes are launched with the off-the-shelf native Win32 ::CreateProcess() call. MSBuild.exe.config is perfectly normal, and msbuild works from the dev console just fine. How is it possible that the MSBuild.exe nodes do not see their manifest? I cannot explain it; something very tricky is going on!
The only notable thing about my setup is that I have a directory junction, among others, C:\_PX\ -> C:\Program Files (x86); I have used that setup since Vista times, when Vista (and Windows 7 too) ran out of env space with long paths. For the same reason, I shortened Microsoft Visual Studio to MSVS when I first set it up. This is why the paths may vary. (As an aside, cmd's env space does not appear to be a problem in Windows 10 any more, but... just in case.) This setup has never caused me a problem with any program whatsoever: VS with additional (Intel) C++ compilers, msbuild or multiple other packages, MATLAB, Mathematica, a few Pythons and data sci packages, you name it. Otherwise, everything is perfectly non-hacksy.
- I do not have the M.Build*.dlls in my bin folder
- I have no unexpected msbuild.exe on PATH
- the build-generated msbuildIntegrated.exe.config looks normal to me
- and I swear I did not touch msbuild.exe.config, which has redirects to the same 15.1.0.0 for all DLLs. Besides, it works from command line, or when I launch the process from command line, or from my build tool (that I wanted to replace with the Locator and API calls, but oops).
If there is anything more I can do to trace the issue, please let me know. I am entirely out of ideas. The Locator's handler, even if active, should not affect the nodes launched with a native Win32 API in any way, I believe. So the longer I look at the whole issue, the more "it can't happen!" it looks to me. Help!
Adding more info. I captured the Fusion logs, and they are... interesting. Here is MsBuild.exe binding M.Build.dll.
Note added after adding more info: This may or may not be a red herring, as I ran another, normal command line build with the same arguments, carefully wiping all fusion logs and killing nodes between tests, and the log entries look exactly the same. I think I should stop experimenting with it, or my brain will boil and evaporate. So msbuild.exe 15.0 loads msbuild.ni.exe 4.0 and Microsoft.Build.ni.dll 4.0 from the GAC as if it were business as usual. Scares the last remaining bejeebers out of me.
The "Default" log looks quite normal, and consists entirely of the same short repeating entry
Native image log entries are strange, and I cannot make sense of them. The same group of 4 entries repeats over and over, and in the end it looks like it happily binds to the GAC version of the 4.0.0.0 assembly.
Baffling. I have not a trace of an idea what to make of all this.
Ok, one last enigma. Honestly. I am heading toward a cranium BLEVE. Here's what the nodes invoked by the normal command-line build load (if I am to trust Process Explorer, which I probably do not any more). So they all load the M.Build.ni.dll 4.0 from the GAC.
If this is true, then all my above analysis is probably worthless. I just do not understand how computers work.