phmonte / Buildalyzer

A utility to perform design-time builds of .NET projects without having to think too hard about it.
MIT License
611 stars 94 forks source link

Number of syntax trees in project sometimes zero #138

Open duncanawoods opened 4 years ago

duncanawoods commented 4 years ago

I am often and unpredictably finding no syntax trees in a project. The solution has 10 projects and on around half the executions, 2 or 3 of the projects (different each time) will return no classes.

public static async Task<string> ParseInfo(string solutionFile)
{
    var manager = new AnalyzerManager(solutionFile);
    var workspace = manager.GetWorkspace();

    foreach (var project in workspace
        .CurrentSolution
        .Projects)
    {
        var compilation = await project.GetCompilationAsync();
        var count = 0;

        foreach (var tree in compilation.SyntaxTrees)
        {
            count +=
                tree.GetRoot()
                    .DescendantNodes()
                    .OfType<ClassDeclarationSyntax>()
                    .Count();
        }

        Console.WriteLine($"Project: {project.Name} Class Count: {count}");
    }
}

An example sequence of analyses might find this many total classes depending on which projects failed to produce any syntax trees:

Run Class count
0 854
1 854
2 854
3 854
4 727
5 831
6 693
7 854
8 854
9 853
10 840
11 854
12 854
13 854
14 854
15 831
16 854
17 854
18 854
19 854
duncanawoods commented 4 years ago

Ok looks like the same thread safety issue as #134. It's fixed by disabling parallel builds.

I have identified one issue so far. AnalyzerResults is updated from multiple threads and has no concurrency support. I have updated it from Dictionary to ConcurrentDictionary but this has not solved all issues.

duncanawoods commented 4 years ago

@daveaglick

The cause is that AnalyzerResult.ProcessCscCommandLine is never called because EventProcessor.MessageRaised is never called.

I believe there is a race condition between the AnonymousPipeLoggerServer reading from the buffer and it's CancellationTokenSource being triggered when the build process exits. I think that the buffer reading is cancelled before it has completed.

It appears that it isn't actually necessary to trigger CancellationTokenSource at all. The buffer reading will terminate on it's own when the pipe is closed so the Process.Exited event can just be removed.

private IAnalyzerResults BuildTargets(BuildEnvironment buildEnvironment, string targetFramework, string[] targetsToBuild, AnalyzerResults results)
{
    using (CancellationTokenSource cancellation = new CancellationTokenSource())
    {
        using (AnonymousPipeLoggerServer pipeLogger = new AnonymousPipeLoggerServer(cancellation.Token))
        {
            using (EventProcessor eventProcessor = new EventProcessor(Manager, this, BuildLoggers, pipeLogger, results != null))
            {
                // Run MSBuild
                int exitCode;
                string fileName = GetCommand(buildEnvironment, targetFramework, targetsToBuild, pipeLogger.GetClientHandle(), out string arguments);
                using (ProcessRunner processRunner = new ProcessRunner(fileName, arguments, Path.GetDirectoryName(ProjectFile.Path), GetEffectiveEnvironmentVariables(buildEnvironment), Manager.LoggerFactory))
                {
                    processRunner.Exited = () => cancellation.Cancel(); // <--- RACE CONDITION
                    processRunner.Start();
                    pipeLogger.ReadAll();                               // <----
                    processRunner.WaitForExit();
                    exitCode = processRunner.ExitCode;
                }

                // Collect the results
                results?.Add(eventProcessor.Results, exitCode == 0 && eventProcessor.OverallSuccess);
            }
        }
    }
    return results;
} 

When I comment out the cancellation, I now see reliable results.

daveaglick commented 4 years ago

When I comment out the cancellation, I now see reliable results.

Awesome! Thanks for tracking this down to a root cause - when I initially tried to reproduce I couldn't seem to get the same results so I put it on hold while working on other stuff and intended to come back later. Glad you beat me to it :)

I'll try my best to apply this as a fix and get out a new version this weekend or early next week.

duncanawoods commented 4 years ago

Thanks @daveaglick

i) also see #141 for a fix for that one.

ii) Changing AnalyzerResults to use a ConcurrentDictionary might fix the missing projects seen in #134.

I initially tried to reproduce I couldn't seem to get the same results

iii) To help replicate this, I generated a liltte script to create big solutions with lots of tiny projects and I was able to see lots of failures. I only need a dozen or so projects:

var sln = $"bigtest";

Console.WriteLine($"dotnet new sln -n {sln}");

for (int i = 0; i < 50; i++)
{    
    Console.WriteLine($"dotnet new console -n Proj{i}");
    Console.WriteLine($"dotnet sln ./{sln}.sln add ./Proj{i}/Proj{i}.csproj");
}
daveaglick commented 4 years ago

I'm going to add a test solution using your script to the test suite and it will be awesome :)

daveaglick commented 4 years ago

This test is now consistently passing for me with a 50 project solution: image

Hopefully that's an indication this is resolved. Thanks again for the fantastic work narrowing down the multiple concurrency issues here @duncanawoods.

daveaglick commented 4 years ago

Buildalyzer 3.0.1 is now rolling out to NuGet and should (hopefully) resolve this issue. When you get a chance can you please verify that this problem is resolved? Thanks!

duncanawoods commented 4 years ago

Hi @daveaglick

I have had a chance to try it out. When noodling around, I was seeing it around 5% of the time. I have also seen some cases where a typically 15s operation took 10 minutes to complete.

I created a batch test with better data collection to share some hard stats with you but this didn't even appear (!) and I saw some different errors.

Test setup

Result Number %
Success 385 96.25%
Could not find build environment 15 3.75%
Missing classes 0 0

Observations

Conclusions

I think that while we have improved things, there are still gremlins in the concurrency code and given their rarity we are going to have a terrible time replicating let alone fixing them.

In this sort of situation my approach would be to take a clean-slate approach the concurrency design. Right now pretty much every form of parallelism is used: sub-processes, threads, async/await, TPL loops and concurrent data-structures. One concern is that peppering code with concurrent data-structures is insufficient for operations that touch multiple data-structures. What needs to happen is ensure that entire operations are isolated from each other instead of just fine-grained access to specific data. To make concurrency comprehensible and checkable for errors, I would aim to have the majority of code unaware of concurrency concerns and push all the orchestration up to the top level with crystal clear guarantees about who is accessing what, when things are started/terminated.