reaqtive / reaqtor

Reaqtor is a framework for reliable, stateful, distributed, and scalable event processing based on Rx.
https://reaqtive.net/
MIT License
627 stars 40 forks source link

Nuqleon.Linq.Expressions.Optimizers causes IDE0001 diagnostic to run incredibly slowly #138

Open idg10 opened 1 year ago

idg10 commented 1 year ago

Since switching to the .NET 6.0 version of dotnet format, ran into problems with the IDE0001 analyzer running extremely slowly (causing a build to time out) in these projects:

We determined that IDE0001 was the one taking the time by attaching a debugger and seeing what it was doing. In files like PureMemberCatalog.cs it was taking half a second or more to process each line of this form:

(Double value) => global::System.Convert.ToInt32(value)

The cumulative effect of this was to cause the dotnet format check to take well over an hour to complete, causing it to fail due to timeout on the build server.

We tried suppressing the relevant rule but this had no effect on analysis time for these projects. Perhaps IDE0001 isn't the only analyzer having a problem with this project. Or perhaps it still does the analysis and the settings would just suppress any warnings (and there are no diagnostics reported in this case).

We attempted to disable analysis completely for these projects but that also didn't help, for reasons we don't understand. (Possibly dotnet format just runs it regardless.)

We realised that the slow performance was caused by the fact that the PureMemberCatalog class has a nested System class, and large numbers of lines of code like the one shown above, i.e.:

(Double value) => global::System.Convert.ToInt32(value)

We think it is something to do with the combination of:

Although it might be only that last part, because experiments to move it all out into a separate namespace made no difference. It might just be that collection initializers with very large numbers of lambdas being processed as expression trees hit this problem.

This means we really have no option but to disable this build step completely.

That's highly undesirable, so we'd like to reinstate it at some point, but this means understanding why the analyzers have such problems with this code. Ideally the analyzer performance would be fixed...but if we could change something that will work around it, that'd also be an option.

We should start by following the suggestions at https://www.meziantou.net/understanding-the-impact-of-roslyn-analyzers-on-the-build-time.htm and it is probably also worth attaching a profiler to the relevant dotnet format command. We can run it from the command line thus:

dotnet format -v diag --no-restore All.sln
idg10 commented 1 year ago

I've done a bit of experimentation to try to isolate the problem. I've been able to reproduce the issue with a project with just a couple of files. If we create a new C# console app, and add a reference to the Nuqleon.Linq.Expressions.Optimizers NuGet package:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Nuqleon.Linq.Expressions.Optimizers" Version="1.0.0-beta.24" />
  </ItemGroup>

</Project>

and then add this file:

using System.Linq.Expressions;

namespace TryToMakeIde0001GoSlowly;

internal class LotsOfExpressionTreesInInitializers
{
    public static MemberTable DateTime { get; } = new MemberTable
    {
        (long ticks) => new global::System.DateTime(ticks),
        (long ticks, DateTimeKind kind) => new global::System.DateTime(ticks, kind),
        (int year, int month, int day) => new global::System.DateTime(year, month, day),
        (int year, int month, int day, int hour, int minute, int second) => new global::System.DateTime(year, month, day, hour, minute, second),
        (int year, int month, int day, int hour, int minute, int second, DateTimeKind kind) => new global::System.DateTime(year, month, day, hour, minute, second, kind),
        (int year, int month, int day, int hour, int minute, int second, int millisecond) => new global::System.DateTime(year, month, day, hour, minute, second, millisecond),
        (int year, int month, int day, int hour, int minute, int second, int millisecond, DateTimeKind kind) => new global::System.DateTime(year, month, day, hour, minute, second, millisecond, kind),

        () => global::System.DateTime.MaxValue,
        () => global::System.DateTime.MinValue,

        (global::System.DateTime dt) => dt.Date,
        (global::System.DateTime dt) => dt.Day,
        (global::System.DateTime dt) => dt.DayOfWeek,
        (global::System.DateTime dt) => dt.DayOfYear,
        (global::System.DateTime dt) => dt.Hour,
        (global::System.DateTime dt) => dt.Kind,
        (global::System.DateTime dt) => dt.Millisecond,
        (global::System.DateTime dt) => dt.Minute,
        (global::System.DateTime dt) => dt.Month,
        (global::System.DateTime dt) => dt.Second,
        (global::System.DateTime dt) => dt.Ticks,
        (global::System.DateTime dt) => dt.TimeOfDay,
        (global::System.DateTime dt) => dt.Year,

        (global::System.DateTime t1, global::System.DateTime t2) => global::System.DateTime.Compare(t1, t2),
        (global::System.DateTime t1, global::System.DateTime t2) => global::System.DateTime.Equals(t1, t2),

        (double d) => global::System.DateTime.FromOADate(d),
        (long fileTime) => global::System.DateTime.FromFileTimeUtc(fileTime),
        (int year) => global::System.DateTime.IsLeapYear(year),
        (int year, int month) => global::System.DateTime.DaysInMonth(year, month),
        (global::System.DateTime value, global::System.DateTimeKind kind) => global::System.DateTime.SpecifyKind(value, kind),

        (global::System.DateTime dt) => dt.ToOADate(),

        (global::System.DateTime dt, global::System.TimeSpan value) => dt.Add(value),
        (global::System.DateTime dt, double value) => dt.AddDays(value),
        (global::System.DateTime dt, double value) => dt.AddHours(value),
        (global::System.DateTime dt, double value) => dt.AddMilliseconds(value),
        (global::System.DateTime dt, double value) => dt.AddMinutes(value),
        (global::System.DateTime dt, int months) => dt.AddMonths(months),
        (global::System.DateTime dt, double value) => dt.AddSeconds(value),
        (global::System.DateTime dt, long value) => dt.AddTicks(value),
        (global::System.DateTime dt, int value) => dt.AddYears(value),
        (global::System.DateTime dt, global::System.DateTime value) => dt.Subtract(value),
        (global::System.DateTime dt, global::System.TimeSpan value) => dt.Subtract(value),

        (global::System.DateTime dt, global::System.DateTime value) => dt.CompareTo(value),
        (global::System.DateTime dt, global::System.DateTime value) => dt.Equals(value),

        (global::System.DateTime dt, global::System.TimeSpan ts) => dt + ts,
        (global::System.DateTime dt, global::System.TimeSpan ts) => dt - ts,
        (global::System.DateTime dt1, global::System.DateTime dt2) => dt1 - dt2,
        (global::System.DateTime dt1, global::System.DateTime dt2) => dt1 == dt2,
        (global::System.DateTime dt1, global::System.DateTime dt2) => dt1 != dt2,
        (global::System.DateTime dt1, global::System.DateTime dt2) => dt1 < dt2,
        (global::System.DateTime dt1, global::System.DateTime dt2) => dt1 <= dt2,
        (global::System.DateTime dt1, global::System.DateTime dt2) => dt1 > dt2,
        (global::System.DateTime dt1, global::System.DateTime dt2) => dt1 >= dt2,
    }.ToReadOnly();
}

it takes about 1 minute and 30 seconds before it greys out all those global::System prefixes to tell you you can simplify them, and then another 10 seconds before the indicator at Visual Studio's bottom right indicates that all background tasks are complete.

If you copy text from the list of lambdas in this file to the clipboard, VS goes unresponsive for over a minute too.

I was wondering if this might be something specific to property initializers, so I tried putting the same collection initializer expression inside an ordinary method:

using System.Linq.Expressions;

namespace TryToMakeIde0001GoSlowly;

internal class LotsOfExpressionTreesInInitializers
{
    public static void UseInitializer()
    {
        var x = new MemberTable
        {
                (long ticks) => new global::System.DateTime(ticks),
                (long ticks, DateTimeKind kind) => new global::System.DateTime(ticks, kind),
                (int year, int month, int day) => new global::System.DateTime(year, month, day),
                (int year, int month, int day, int hour, int minute, int second) => new global::System.DateTime(year, month, day, hour, minute, second),
                (int year, int month, int day, int hour, int minute, int second, DateTimeKind kind) => new global::System.DateTime(year, month, day, hour, minute, second, kind),
                (int year, int month, int day, int hour, int minute, int second, int millisecond) => new global::System.DateTime(year, month, day, hour, minute, second, millisecond),
                (int year, int month, int day, int hour, int minute, int second, int millisecond, DateTimeKind kind) => new global::System.DateTime(year, month, day, hour, minute, second, millisecond, kind),

                () => global::System.DateTime.MaxValue,
                () => global::System.DateTime.MinValue,

                (global::System.DateTime dt) => dt.Date,
                (global::System.DateTime dt) => dt.Day,
                (global::System.DateTime dt) => dt.DayOfWeek,
                (global::System.DateTime dt) => dt.DayOfYear,
                (global::System.DateTime dt) => dt.Hour,
                (global::System.DateTime dt) => dt.Kind,
                (global::System.DateTime dt) => dt.Millisecond,
                (global::System.DateTime dt) => dt.Minute,
                (global::System.DateTime dt) => dt.Month,
                (global::System.DateTime dt) => dt.Second,
                (global::System.DateTime dt) => dt.Ticks,
                (global::System.DateTime dt) => dt.TimeOfDay,
                (global::System.DateTime dt) => dt.Year,

                (global::System.DateTime t1, global::System.DateTime t2) => global::System.DateTime.Compare(t1, t2),
                (global::System.DateTime t1, global::System.DateTime t2) => global::System.DateTime.Equals(t1, t2),

                (double d) => global::System.DateTime.FromOADate(d),
                (long fileTime) => global::System.DateTime.FromFileTimeUtc(fileTime),
                (int year) => global::System.DateTime.IsLeapYear(year),
                (int year, int month) => global::System.DateTime.DaysInMonth(year, month),
                (global::System.DateTime value, global::System.DateTimeKind kind) => global::System.DateTime.SpecifyKind(value, kind),

                (global::System.DateTime dt) => dt.ToOADate(),

                (global::System.DateTime dt, global::System.TimeSpan value) => dt.Add(value),
                (global::System.DateTime dt, double value) => dt.AddDays(value),
                (global::System.DateTime dt, double value) => dt.AddHours(value),
                (global::System.DateTime dt, double value) => dt.AddMilliseconds(value),
                (global::System.DateTime dt, double value) => dt.AddMinutes(value),
                (global::System.DateTime dt, int months) => dt.AddMonths(months),
                (global::System.DateTime dt, double value) => dt.AddSeconds(value),
                (global::System.DateTime dt, long value) => dt.AddTicks(value),
                (global::System.DateTime dt, int value) => dt.AddYears(value),
                (global::System.DateTime dt, global::System.DateTime value) => dt.Subtract(value),
                (global::System.DateTime dt, global::System.TimeSpan value) => dt.Subtract(value),

                (global::System.DateTime dt, global::System.DateTime value) => dt.CompareTo(value),
                (global::System.DateTime dt, global::System.DateTime value) => dt.Equals(value),

                (global::System.DateTime dt, global::System.TimeSpan ts) => dt + ts,
                (global::System.DateTime dt, global::System.TimeSpan ts) => dt - ts,
                (global::System.DateTime dt1, global::System.DateTime dt2) => dt1 - dt2,
                (global::System.DateTime dt1, global::System.DateTime dt2) => dt1 == dt2,
                (global::System.DateTime dt1, global::System.DateTime dt2) => dt1 != dt2,
                (global::System.DateTime dt1, global::System.DateTime dt2) => dt1 < dt2,
                (global::System.DateTime dt1, global::System.DateTime dt2) => dt1 <= dt2,
                (global::System.DateTime dt1, global::System.DateTime dt2) => dt1 > dt2,
                (global::System.DateTime dt1, global::System.DateTime dt2) => dt1 >= dt2,
        }.ToReadOnly();

        Console.WriteLine(x);
    }
}

It's less bad, but still pretty bad. IT takes "only" 23 seconds for the global::System prefixes to go grey, and then a couple more seconds for it to report that all background tasks are complete. So that's about 3x faster than the first example. But still very slow.

KathleenDollard commented 1 year ago

I know you are investigating this prior to exploring it as a Roslyn issue, and obviously it should move to that repo if it is a Roslyn issue. Looking over it, I had some questions, which you could answer now or if you explore it as a Roslyn issue.

KathleenDollard commented 1 year ago

@chethusk

CyrusNajmabadi commented 1 year ago

is there a full repro for this? Ideally something selfcontained at a git repo, so we could pull down and trace.

idg10 commented 1 year ago

We are working on isolating it properly to make a standalone repro, but we're not quite there yet (and I'm on vacation at the moment...)

idg10 commented 1 year ago

To answer @KathleenDollard 's questions:

The MemberTable.cs type is here: https://github.com/reaqtive/reaqtor/blob/4fca291cf3721255810f95b17d9c6c78a8114391/Nuqleon/Core/LINQ/Nuqleon.Linq.Expressions.Optimizers/System/Linq/Expressions/MemberTable.cs#L23

I'm not quite sure what a "LINQ" to it would mean so I'm hoping a link will suffice.

I am using Visual Studio 2022 version 17.7.6, although note that we see the hang on the build agent, so we see perf problems around this code both inside VS and outside.

We are currently using .NET SDK 7.0.403. We do not have a global.json.

We know the .NET 5 era version of the dotnet format tool did not run slowly enough to cause problems. Due to an oversight we were still using the 5.0 version of this tool in our dotnet format build step after upgrading our targets to 6.0. (We were installing it with dotnet tool install because that's what used to be required. In fact, for a long time I wasn't aware that this step was even in the build, and it was only when something caused me to look at it that I discovered a) that we were using this tool, and b) that this tool was now built into the .NET SDK, and that c) we were continuing to use the old version.) So we don't know at exactly which version the problem started. Somewhere after v5.

We did not add any new analyzers. The trigger for the performance issues causing us a problem was fixing our build process so that we use the copy of dotnet format that is now in the SDK, and not an old from the .NET 5 era. However, we have been seeing a problem where Visual Studio takes an age to finish running analyzers in the background after project load for a very long time. (I don't actually know how long it takes, because I've never sat and timed it. It may well be hours - I'm used to seeing the little background progress animation at the bottom left never stopping when I have this solution open.)

You started with "Since switching to the .NET 6.0 version of dotnet format..." can you further clarify what that change was?

That was probably a rather inaccurate description of the change... Here's an attempt to describe it better.

For ages, we had this in our pipeline:

https://github.com/reaqtive/reaqtor/blob/51fb8bd9929dec35a62a6fc0eb10f8dbf67eb39d/azure-pipelines.yml#L85-L105

Note that although this installs the 7.0 SDK, the step that runs install -g dotnet-format ended up installing a much older version:

Tool 'dotnet-format' (version '5.1.250801') was successfully installed.

You can see that happening in this build:

https://dev.azure.com/dotnet/Reaqtor/_build/results?buildId=99591&view=logs&j=a349d4ce-56e8-5f62-9b96-7410601bdf48&t=b2009e2e-76c4-53f3-8ba1-52cbffcf9059

When we realised that we just needed to use the version in the SDK, we made this change:

https://github.com/reaqtive/reaqtor/pull/137/commits/e9012e6dd9dd58296f1c65489ce548d5efef5e18 quickly followed by this: https://github.com/reaqtive/reaqtor/pull/137/commits/09a45e93bdf0167e984ba9b7916f8c154a25fcb7

And that was when we started seeing these performance problems.

So actually I think that first sentence is completely wrong—although we target .NET 6.0, we're using the .NET 7.0 SDK, so we went straight from version '5.1.250801' of the dotnet-format tool to whatever version is built into the current .NET 7.0 SDK. We never actually used the 6.0 version.

idg10 commented 1 year ago

@CyrusNajmabadi but if you don't want to wait for me to come back to work on Monday, the first two steps described in https://github.com/reaqtive/reaqtor/issues/138#issuecomment-1775389319 describe how to build a project from scratch that repros the problem. The resulting project just has a couple of files in it, and doesn't require you to download all of the Reaqtor repo.

(Those steps do include a dependency on Reaqtor, but via NuGet. I have a project on my computer that does away completely with the dependency on Reaqtor, and that's what I was aiming to share next week, because I thought it would be more helpful to demonstrate the issue in a no-dependencies project.)

CyrusNajmabadi commented 1 year ago

i'm going on vacation for the next week. So i can only help after that. If anyone wants to just profile what's going with perfview, it would likely show where the costs are.

idg10 commented 1 year ago

Repro now available here: https://github.com/idg10/repro-slow-ide0001

jaredpar commented 12 months ago

Thanks for getting that into a self contained project. Unfortunately I'm not able to reproduce the problem. I cloned the project, ran dotnet format and it completed in about 5 seconds.

> $d = [DateTime]::UtcNow; dotnet format; [DateTime]::UtcNow - $d;

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 5
Milliseconds      : 428
Ticks             : 54286000
TotalDays         : 6.28310185185185E-05
TotalHours        : 0.00150794444444444
TotalMinutes      : 0.0904766666666667
TotalSeconds      : 5.4286
TotalMilliseconds : 5428.6

Was concerned possibly IDE001 wasn't running so I created an editorconfig file, enabled it there but got the same results.

The compiler is sending ~10% of it's time in method group infererence / overload resolution. That's not too surprising given the nature of the code. To hit the particular timeout's you're referring to it would need to have a lot higher concentration.

idg10 commented 11 months ago

Hi @jaredpar

Thanks for looking into this. I have discovered that since I upgraded Visual Studio 2022 to 17.8, the repro in the form I supplied to you no longer works for me either.

However...

The problem continues to occur on the Reaqtor solution itself. So a colleague and I looked into this a little more. He still had an older VS install than me, and was still able to repro the problem. So we then dug a bit deeper to find out what exactly was different—what DLLs were different across our two machines.

In the version of the repro we were using (which is slightly different from the one I uploaded—our working one includes a BenchmarkDotNet project) we discovered one slightly weird difference:

My machine didn't have any .NET 7.0 SDKs on it this morning. (It would have done back before .NET 8.0 shipped. But I think when I updated to VS 17.8.0, the VS installer removed any .NET 7.0.x SDKs that were present.)

After I reinstalled .NET SDK 7.0.203 (which is rather old, but is the version that ended up getting used on my colleague's machine) I was once again able to repro the problem.

So as far as I can tell, on the latest VS 2022, then unless you've explicitly installed a .NET 7.0 SDK, the .NET 7.0 project in my repo actually ends up getting processed by a .NET 6.0 SDK, which doesn't appear to have this perf problem.

I then tried removing the .NET 7.0.203 SDK, and then upgrading the repro project to use .NET 8.0.

With the project on .NET 8.0, once again it takes about 40 seconds on my machine for it to process me typing Ctrl+C with the cursor on line 59.

And if I delete and then paste back in line 59, it stays like this for about a half a minute:

image

before finally resolving to this:

image

Note that the global::System text is not greyed out on the lined that calls AddDays in the first one. But once IDE0001 has run, it fades that out to indicate that it is redundant.

However, running dotnet format at the command line continues to run quickly. I can only repro the slowness either in Visual Studio, or in a benchmark we wrote that drives the Roslyn API directly (which we did to make it easier to reproduce more predictably).

TLDR:

I've pushed an update to the repo so it's now on .NET 8.0.

idg10 commented 11 months ago

OK, I've made another change, and with this one I'm able to repro slowness easily with dotnet format. Running this:

#$d = [DateTime]::UtcNow; dotnet format --no-restore --verify-no-changes --verbosity diagnostic .\MakeIde0001GoSlow.sln; $da = [DateTime]::UtcNow; $dt = $da - $d

I see this reported time:

Days              : 0
Hours             : 0
Minutes           : 13
Seconds           : 5
Milliseconds      : 381
Ticks             : 7853817943
TotalDays         : 0.00909006706365741
TotalHours        : 0.218161609527778
TotalMinutes      : 13.0896965716667
TotalSeconds      : 785.3817943
TotalMilliseconds : 785381.7943

To do this, I've made Repro.cs a lot more similar to the original file that causes the problem, so this is not the pared down example we originally hoped to create. But it does seem to be able to produce much slower results. (And it doesn't require any NuGet refs, so it's self-contained. It's just a bit more complex now.)

A couple of thing:

That last point is frustrating because I was hoping to get a more detailed report from a profiler by running the analysis in the context of the Roslyn project itself. (Since PDB files don't seem to be available for Microsoft.CodeAnalysis.CSharp.Features, you can't get particularly useful profiler results if you run against the version up on NuGet.)

It's also a bit odd because it implies that these analyzers run slowly only when run in particular contexts. (Although I guess it could be that I've just misdiagnosed which analyzer_ is really the slow one.)

jaredpar commented 11 months ago

@CyrusNajmabadi, @sharwell think analyzers / IDE will need to take a look at this. I can see the slow down with the latest repro. Mine is a bit faster: 10 seconds to ~13 but guessing that is just machine difference. Looking at a binlog from a build the compiler is executing how I would expect here. I grabbed a trace of the format command and nothing really stood out. Can see a few extra CodeStyle calls which I suspect are at the root of the issue (loosely suspect). Don't see anything that is' a compiler issue at the moment.

sharwell commented 10 months ago

This is a known slow scenario for IDE0001. You should be able to work around the problem by adding this:

using System_DateTime = global::System.DateTime;

And then replacing all the uses of global::System.DateTime with System_DateTime.

There is also a slight possibility things are improved by adding explicit return types to all of the lambdas. We've been working on reducing the binding work associated with fully-specified lambda signatures.

You would definitely see an improvement if the collection initializer was broken up into a long series of Add calls.