phmonte / Buildalyzer

A utility to perform design-time builds of .NET projects without having to think too hard about it.
MIT License

Compilation Errors When Debugger Not Attached #181

Closed slang25 closed 2 years ago

slang25 commented 2 years ago

I am experiencing really strange behaviour. I have some code like this:

var projectAnalyzer = analyzerManager.GetProject("someproject.csproj");
var analyzerResults = projectAnalyzer.Build(GetEnvironmentOptions());
analyzerResults.First().AddToWorkspace(workspace, true); // This is where it gets interesting

var solution = workspace.CurrentSolution;
await solution.Projects.First().GetCompilationAsync();

So, if I first build my solution (so that the bin and obj folders contain DLLs), the code above works, and I can get the compilation for the 9 projects (in the graph of projects where someproject is the root).

If I clean the solution (like git clean -xdf) and run the same code with the debugger attached, then it works the same.

However, if I run the same code without the debugger attached, then when I retrieve the compilations for the various projects, they are full of errors about missing types. This is really weird.

I can reproduce it here 100% of the time (I will work on a sharable repro). It's not a Debug vs Release thing: it works in both configurations with a debugger attached, and fails in both when a debugger is not attached.

If I detach the debugger on the line with the comment // This is where it gets interesting then it fails; if I detach any time after that, it succeeds 🤯

I'll work on a sharable repro, but I'm super confused about what is going on.

slang25 commented 2 years ago

The difference appears to be that with a debugger attached, transitive dependencies appear in project.ProjectReferences, when the debugger isn't attached then they don't.

Still confusing, but I'm getting closer.

slang25 commented 2 years ago

I've been doing a lot of debugging, and I believe there is some sort of race condition inside MsBuildPipeLogger.Server. I have run my project graph through Buildalyzer with a debugger attached and unattached, comparing some diagnostic outputs. I can confirm that under both scenarios:

However, when we get the events back in Buildalyzer, the count of e.Items in ProjectStarted varies between the two: when a debugger is not attached, we get fewer results.

slang25 commented 2 years ago

Ok, here is a repro: https://github.com/slang25/BuildalyzerBugRepro

It contains a sample app of 4 projects, and a console app which then uses buildalyzer to try to compile it.

When running the BugRepro app with a debugger attached, everything works and the output is:

Projects
└── ConsoleApp1
    ├── ClassLibrary1
    │   └── ClassLibrary2
    └── ClassLibrary3
Error count: 0
ConsoleApp ProjectReferences:
/Users/stuart.lang/git/github/BuildalyzerBugRepro/SampleApp/ClassLibrary1/ClassLibrary1.csproj
/Users/stuart.lang/git/github/BuildalyzerBugRepro/SampleApp/ClassLibrary3/ClassLibrary3.csproj
/Users/stuart.lang/git/github/BuildalyzerBugRepro/SampleApp/ClassLibrary2/ClassLibrary2.csproj

Running the same app without a debugger attached, the output is now:

Projects
└── ConsoleApp1
    ├── ClassLibrary1
    │   └── ClassLibrary2
    └── ClassLibrary3
Error count: 1
The type or namespace name 'ClassLibrary2' could not be found (are you missing a using directive or an assembly reference?)
ConsoleApp ProjectReferences:
/Users/stuart.lang/git/github/BuildalyzerBugRepro/SampleApp/ClassLibrary1/ClassLibrary1.csproj
/Users/stuart.lang/git/github/BuildalyzerBugRepro/SampleApp/ClassLibrary3/ClassLibrary3.csproj

I'm on macOS, and haven't tried this on Windows yet. I'd be interested to see if you can replicate this issue on your side. It is also important that the SampleApp folders have clean obj/bin folders to replicate this.

slang25 commented 2 years ago

(the tree printed on the top of the output isn't from buildalyzer, but is just for illustrative purposes to convey the project structure)

daveaglick commented 2 years ago

Fantastic investigative work, and thanks for generating a repro! I'll take it for a spin as soon as I can. This wouldn't be the first race condition in the logger interaction - there were some earlier bugs where it wouldn't work at all, or the messages available vs. done ordering got messed up and it would crash. I think we got most of those resolved, but maybe attaching a debugger skews the timing enough to expose that they weren't really resolved in the first place and were just masked.

daveaglick commented 2 years ago

I can repro on Windows, so that at least confirms it's not just on your machine, and it's a cross-platform bug

image

daveaglick commented 2 years ago

Interestingly I can't actually reproduce a successful build - even with a debugger it fails. That could be due to different amounts of overhead for Windows vs. Mac debugging though.

daveaglick commented 2 years ago

Okay, I can repro now if I set some breakpoints and give it some time inside the AddToWorkspace() extension. I'm starting to wonder if this is some sort of timing issue with Roslyn vs. the builds. Still investigating.

daveaglick commented 2 years ago

I think I'm on to something. I'm still stumped as to why this ever works at all (i.e. with a debugger attached), but I think the underlying issue in this case is the way Roslyn treats transitive references. These issues all seem related:

These all come down to the way IAnalyzerResult.AddToWorkspace() manages references for the projects being added. We compile the project being added (via Buildalyzer) and then recursively add project references to each Roslyn project. Right now the behavior mirrors MSBuild (since that's what Buildalyzer uses) and adds references in the Roslyn project only if the corresponding MSBuild project directly contains the reference. The issues linked above suggest that Roslyn compilations are "post-transitive resolution" meaning that they expect to be handed the full set of references, direct and transitive - and that resolving transitive references is out of scope for Roslyn itself.
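To make the direct-versus-transitive distinction concrete, here is a minimal sketch of collecting the full reference set for a project. It is not Buildalyzer's or Roslyn's code: the `directReferences` dictionary and the `GetTransitiveReferences` helper are hypothetical names used only for illustration, and the graph simply mirrors the four projects in the repro.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of the repro's project graph: project name -> direct references.
var directReferences = new Dictionary<string, string[]>
{
    ["ConsoleApp1"] = new[] { "ClassLibrary1", "ClassLibrary3" },
    ["ClassLibrary1"] = new[] { "ClassLibrary2" },
    ["ClassLibrary2"] = Array.Empty<string>(),
    ["ClassLibrary3"] = Array.Empty<string>(),
};

// Hypothetical helper: collects every project reachable from root
// (direct and transitive references), breadth-first, visiting each once.
List<string> GetTransitiveReferences(IReadOnlyDictionary<string, string[]> graph, string root)
{
    var result = new List<string>();
    var seen = new HashSet<string> { root };
    var queue = new Queue<string>();
    queue.Enqueue(root);

    while (queue.Count > 0)
    {
        var current = queue.Dequeue();
        if (!graph.TryGetValue(current, out var references))
        {
            continue; // Unknown project: nothing to follow.
        }

        foreach (var reference in references)
        {
            // HashSet.Add returns false for duplicates, which also guards against cycles.
            if (seen.Add(reference))
            {
                result.Add(reference);
                queue.Enqueue(reference);
            }
        }
    }

    return result;
}

var transitive = GetTransitiveReferences(directReferences, "ConsoleApp1");
Console.WriteLine("Direct: " + string.Join(", ", directReferences["ConsoleApp1"]));
Console.WriteLine("Direct and transitive: " + string.Join(", ", transitive));
```

For ConsoleApp1, the direct list contains only ClassLibrary1 and ClassLibrary3, while the full set also includes ClassLibrary2, which is the reference missing from the failing output in the repro. A workspace that expects "post-transitive resolution" input would need the second list.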

One thing I noticed that got me on this track was that Buildalyzer seemed to have no problem building the projects itself:

image

So my thinking right now is that this is a bug in the Buildalyzer.Workspaces project and not the core of Buildalyzer or the MSBuild loggers. Perhaps the debugging differences there could be explained by MSBuild behaving differently when it detects a debugger (i.e. certain tasks doing different things). No idea, just a guess.

Still doesn't explain why it ever worked at all - I'm at a loss there.

daveaglick commented 2 years ago

As expected, fixing up the transitive references in the Roslyn workspace appears to have resolved this issue:

image

image

I'll get a new release published. That's not to say there also isn't a race condition in the logger - it's certainly been suspect in the past, but we'll need to come up with a different repro if there is :)

daveaglick commented 2 years ago

FYI - I've found some other gotchas in a couple places unrelated to this, so it might be another day or two before a new release goes out.

slang25 commented 2 years ago

Thanks @daveaglick, getting my head around this now. What you are saying makes total sense. When I switch to Windows I cannot get my repro working at all, so yeah it must be something particular to my setup here on macOS.

The fix makes total sense, and looks a lot like how I've been working around the issue. I'll update here and take out my workaround, I'll report back soon. 🤞

slang25 commented 2 years ago

The latest release has been working great 😄 Many thanks @daveaglick

daveaglick commented 2 years ago

Yay!!