Open duncanawoods opened 4 years ago
Ok looks like the same thread safety issue as #134. It's fixed by disabling parallel builds.
I have identified one issue so far. AnalyzerResults
is updated from multiple threads and has no concurrency support. I have updated it from Dictionary
to ConcurrentDictionary
but this has not solved all issues.
@daveaglick
The cause is that AnalyzerResult.ProcessCscCommandLine
is never called because EventProcessor.MessageRaised
is never called.
I believe there is a race condition between the AnonymousPipeLoggerServer
reading from the buffer and it's CancellationTokenSource
being triggered when the build process exits. I think that the buffer reading is cancelled before it has completed.
It appears that it isn't actually necessary to trigger CancellationTokenSource
at all. The buffer reading will terminate on it's own when the pipe is closed so the Process.Exited
event can just be removed.
private IAnalyzerResults BuildTargets(BuildEnvironment buildEnvironment, string targetFramework, string[] targetsToBuild, AnalyzerResults results)
{
using (CancellationTokenSource cancellation = new CancellationTokenSource())
{
using (AnonymousPipeLoggerServer pipeLogger = new AnonymousPipeLoggerServer(cancellation.Token))
{
using (EventProcessor eventProcessor = new EventProcessor(Manager, this, BuildLoggers, pipeLogger, results != null))
{
// Run MSBuild
int exitCode;
string fileName = GetCommand(buildEnvironment, targetFramework, targetsToBuild, pipeLogger.GetClientHandle(), out string arguments);
using (ProcessRunner processRunner = new ProcessRunner(fileName, arguments, Path.GetDirectoryName(ProjectFile.Path), GetEffectiveEnvironmentVariables(buildEnvironment), Manager.LoggerFactory))
{
processRunner.Exited = () => cancellation.Cancel(); // <--- RACE CONDITION
processRunner.Start();
pipeLogger.ReadAll(); // <----
processRunner.WaitForExit();
exitCode = processRunner.ExitCode;
}
// Collect the results
results?.Add(eventProcessor.Results, exitCode == 0 && eventProcessor.OverallSuccess);
}
}
}
return results;
}
When I comment out the cancellation, I now see reliable results.
When I comment out the cancellation, I now see reliable results.
Awesome! Thanks for tracking this down to a root cause - when I initially tried to reproduce I couldn't seem to get the same results so I put it on hold while working on other stuff and intended to come back later. Glad you beat me to it :)
I'll try my best to apply this as a fix and get out a new version this weekend or early next week.
Thanks @daveaglick
i) also see #141 for a fix for that one.
ii) Changing AnalyzerResults
to use a ConcurrentDictionary
might fix the missing projects seen in #134.
I initially tried to reproduce I couldn't seem to get the same results
iii) To help replicate this, I generated a liltte script to create big solutions with lots of tiny projects and I was able to see lots of failures. I only need a dozen or so projects:
var sln = $"bigtest";
Console.WriteLine($"dotnet new sln -n {sln}");
for (int i = 0; i < 50; i++)
{
Console.WriteLine($"dotnet new console -n Proj{i}");
Console.WriteLine($"dotnet sln ./{sln}.sln add ./Proj{i}/Proj{i}.csproj");
}
I'm going to add a test solution using your script to the test suite and it will be awesome :)
This test is now consistently passing for me with a 50 project solution:
Hopefully that's an indication this is resolved. Thanks again for the fantastic work narrowing down the multiple concurrency issues here @duncanawoods.
Buildalyzer 3.0.1 is now rolling out to NuGet and should (hopefully) resolve this issue. When you get a chance can you please verify that this problem is resolved? Thanks!
Hi @daveaglick
I have had a chance to try it out. When noodling around, I was seeing it around 5% of the time. I have also seen some cases where a typically 15s operation took 10 minutes to complete.
I created a batch test with better data collection to share some hard stats with you but this didn't even appear (!) and I saw some different errors.
Result | Number | % |
---|---|---|
Success | 385 | 96.25% |
Could not find build environment | 15 | 3.75% |
Missing classes | 0 | 0 |
I think that while we have improved things, there are still gremlins in the concurrency code and given their rarity we are going to have a terrible time replicating let alone fixing them.
In this sort of situation my approach would be to take a clean-slate approach the concurrency design. Right now pretty much every form of parallelism is used: sub-processes, threads, async/await, TPL loops and concurrent data-structures. One concern is that peppering code with concurrent data-structures is insufficient for operations that touch multiple data-structures. What needs to happen is ensure that entire operations are isolated from each other instead of just fine-grained access to specific data. To make concurrency comprehensible and checkable for errors, I would aim to have the majority of code unaware of concurrency concerns and push all the orchestration up to the top level with crystal clear guarantees about who is accessing what, when things are started/terminated.
I am often and unpredictably finding no syntax trees in a project. The solution has 10 projects and on around half the executions, 2 or 3 of the projects (different each time) will return no classes.
An example sequence of analyses might find this many total classes depending on which projects failed to produce any syntax trees: