Closed slang25 closed 2 years ago
The difference appears to be that with a debugger attached, transitive dependencies appear in project.ProjectReferences; when the debugger isn't attached, they don't.
Still confusing, but I'm getting closer.
I've been doing a lot of debugging, and I believe there is some sort of race condition inside MsBuildPipeLogger.Server. I have run my project graph through Buildalyzer with a debugger attached and unattached, comparing some diagnostic outputs. I can confirm that under both scenarios:
Writing the BuildEventArgs from msbuild onto disk, both produce the same number of events, of the same type, in the same order.
However, when we get the events back in Buildalyzer, the count of e.Items in ProjectStarted varies between the two: when a debugger is not attached, we get fewer results.
Ok, here is a repro: https://github.com/slang25/BuildalyzerBugRepro
It contains a sample app of 4 projects, and a console app which then uses buildalyzer to try to compile it.
When running the BugRepro app with a debugger attached, everything works and the output is:
Projects
└── ConsoleApp1
    ├── ClassLibrary1
    │   └── ClassLibrary2
    └── ClassLibrary3
Error count: 0
ConsoleApp ProjectReferences:
/Users/stuart.lang/git/github/BuildalyzerBugRepro/SampleApp/ClassLibrary1/ClassLibrary1.csproj
/Users/stuart.lang/git/github/BuildalyzerBugRepro/SampleApp/ClassLibrary3/ClassLibrary3.csproj
/Users/stuart.lang/git/github/BuildalyzerBugRepro/SampleApp/ClassLibrary2/ClassLibrary2.csproj
Running the same app without a debugger attached, the output is now:
Projects
└── ConsoleApp1
    ├── ClassLibrary1
    │   └── ClassLibrary2
    └── ClassLibrary3
Error count: 1
The type or namespace name 'ClassLibrary2' could not be found (are you missing a using directive or an assembly reference?)
ConsoleApp ProjectReferences:
/Users/stuart.lang/git/github/BuildalyzerBugRepro/SampleApp/ClassLibrary1/ClassLibrary1.csproj
/Users/stuart.lang/git/github/BuildalyzerBugRepro/SampleApp/ClassLibrary3/ClassLibrary3.csproj
I'm on macOS, and haven't tried this on Windows yet. I'd be interested to see if you can replicate this issue on your side. It is also important that the SampleApp folders have clean obj/bin folders to replicate this.
(the tree printed on the top of the output isn't from buildalyzer, but is just for illustrative purposes to convey the project structure)
Fantastic investigative work and generating a repro! I'll take it for a spin as soon as I can. This wouldn't be the first race condition in the logger interaction - there were some earlier bugs where it wouldn't work at all, or the messages available vs. done ordering got messed up and it would crash. I think we got most of those resolved, but maybe attaching a debugger skews the timing enough to expose that they weren't really resolved in the first place and were just masked.
I can repro on Windows, so that at least confirms it's not just on your machine, and it's a cross-platform bug
Interestingly I can't actually reproduce a successful build - even with a debugger it fails. That could be due to different amounts of overhead for Windows vs. Mac debugging though.
Okay, I can repro now if I set some breakpoints and give it some time inside the AddToWorkspace() extension. I'm starting to wonder if this is some sort of timing issue with Roslyn vs. the builds. Still investigating.
I think I'm on to something. I'm still stumped why this ever works at all (i.e. with a debugger attached), but I think the underlying issue in this case is the way Roslyn treats transitive references. These issues all seem related:
These all come down to the way IAnalyzerResult.AddToWorkspace() manages references for the projects being added. We compile the project being added (via Buildalyzer) and then recursively add project references to each Roslyn project. Right now the behavior mirrors MSBuild (since that's what Buildalyzer uses) and adds references in the Roslyn project only if the corresponding MSBuild project directly contains the reference. The issues linked above suggest that Roslyn compilations are "post-transitive resolution", meaning that they expect to be handed the full set of references, direct and transitive - and that resolving transitive references is out of scope for Roslyn itself.
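To make the direct-versus-transitive distinction concrete, here is a minimal sketch in plain C#. It is not Buildalyzer's or Roslyn's actual code, and it calls neither library. The class name TransitiveReferenceSketch and the method Resolve are hypothetical, and the graph is simply the four-project layout from the repro earlier in this thread. It only illustrates what "handing over the full set of references" would mean for ConsoleApp1.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Buildalyzer or Roslyn.
public static class TransitiveReferenceSketch
{
    // Returns every project reachable from `root` by following direct
    // references, excluding `root` itself. Cycles are tolerated because
    // each project is expanded at most once.
    public static SortedSet<string> Resolve(
        string root,
        IReadOnlyDictionary<string, string[]> directReferences)
    {
        var resolved = new SortedSet<string>(StringComparer.Ordinal);
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string current = pending.Pop();
            if (!directReferences.TryGetValue(current, out string[] references))
            {
                continue; // leaf project, or one with no recorded references
            }

            foreach (string reference in references)
            {
                // Add returns false for a project that was already seen.
                if (reference != root && resolved.Add(reference))
                {
                    pending.Push(reference);
                }
            }
        }

        return resolved;
    }

    public static void Main()
    {
        // Direct references only, as in the repro's project files.
        var graph = new Dictionary<string, string[]>
        {
            ["ConsoleApp1"] = new[] { "ClassLibrary1", "ClassLibrary3" },
            ["ClassLibrary1"] = new[] { "ClassLibrary2" },
        };

        // prints: ClassLibrary1, ClassLibrary2, ClassLibrary3
        Console.WriteLine(string.Join(", ", Resolve("ConsoleApp1", graph)));
    }
}
```

With only the direct references, ConsoleApp1 would see ClassLibrary1 and ClassLibrary3, which matches the failing output above. The closure adds ClassLibrary2, which matches the passing one.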
One thing I noticed that got me on this track was that Buildalyzer seemed to have no problem building the projects itself:
So my thinking right now is that this is a bug in the Buildalyzer.Workspaces project and not the core of Buildalyzer or the MSBuild loggers. Perhaps the debugging differences there could be explained by MSBuild behaving differently when it detects a debugger (i.e. certain tasks doing different things). No idea, just a guess.
Still doesn't explain why it ever worked at all - I'm at a loss there.
As expected, fixing up the transitive references in the Roslyn workspace appears to have resolved this issue:
I'll get a new release published. That's not to say there also isn't a race condition in the logger - it's certainly been suspect in the past, but we'll need to come up with a different repro if there is :)
FYI - I've found some other gotchas in a couple places unrelated to this, so it might be another day or two before a new release goes out.
Thanks @daveaglick, getting my head around this now. What you are saying makes total sense. When I switch to Windows I cannot get my repro working at all, so yeah it must be something particular to my setup here on macOS.
The fix makes total sense, and looks a lot like how I've been working around the issue. I'll update here and take out my workaround, I'll report back soon.
The latest release has been working great. Many thanks @daveaglick
Yay!!
I am experiencing really strange behaviour. I have some code like this:
So, if I first go to my solution and build it (so that the bin and obj folders have dlls in), this above code works, and I can get the compilation for the 9 projects (in the graph of projects where someproject is the root).
If I clean the solution (like git clean -xdf) and run the same code with the debugger attached, then it works the same. However, if I run the same code without the debugger attached, then when I retrieve the compilations for the various projects, they are full of errors about missing types. This is really weird.
I can reproduce it here 100% of the time (I will work on a sharable repro). It's not a Debug vs Release thing, as it works in both configurations with a debugger attached, but fails in both when a debugger is not attached.
If I detach the debugger on the line with the comment // This is where it gets interesting then it fails; if I detach any time after, then it succeeds 🤯 I'll work on a sharable repro, but I'm super confused about what is going on.