gui-cs / Terminal.Gui

Cross Platform Terminal UI toolkit for .NET
MIT License
9.74k stars 695 forks source link

Track down and optimize instances of expensive/sub-optimal string manipulation #3214

Open dodexahedron opened 10 months ago

dodexahedron commented 10 months ago

There's a fair amount of things like string concatenation in loops and various other things related to relatively simple string operations that are costly, sprinkled around both old and new code.

These are an easy target for optimization, even if just done at a basic level like not calling operator+(string,string) and instead using spans, string.Format, string interpolation, or even just StringBuilder (not an exhaustive list).

I'd like to make a pass over the solution and do at least that kind of basic optimization of impacted code, at some point.

The intent of this specific issue is just to do the very basics like that, unless additional optimizations can also be made with minimal additional effort, for this pass. More advanced and in-depth optimizations can be handled later.

This is also the kind of thing that ReSharper can help track down and fix, too.

dodexahedron commented 10 months ago

This is also relevant to #2725

tig commented 10 months ago

We need good perf tools/instrumentation first. Identify the worst offenders with the ability to measure before/after.

dodexahedron commented 10 months ago

There are some decent free ones, and there are some EXCELLENT ones in the JetBrains tools.

I have the all products pack and make use of DotCover, DotMemory, and DotTrace quite a lot, especially when I'm doing performance testing/analysis beyond what is obvious as a human reading existing code.

I know that at least some and believe that maybe even all of those are also available for free, as command-line tools, like the ReSharper formatter, too.

They are also cross-platform, and I fire them up for projects when I'm booted into Linux and doing that kind of work, as well.

dodexahedron commented 10 months ago

The most basic and simple thing to do, though, if you are trying to improve something that looks like an obvious pain point, is to just write a test method and time execution of the relevant code with a StopWatch, which is an accurate timer designed mostly for that kind of situation, and to perform those runs thousands of times in a loop, like I've shown in a few places here. Plus, that's free.

Side note: Benchmarking stuff might be a good candidate for a separate project in the solution, along the lines of the other discussion we were having about the unit tests. They're unit test projects, and the tests themselves aren't concerned with accuracy, since the main UnitTests project is responsible for that. They just time things, and you can do stuff like put limits for expected execution times, for the result of the test (NUnit has good ones for this, but there are also specific benchmark libraries out there too), so that, if a change results in something blowing up and taking a lot longer than expected, it'll be flagged as a failure or warning or however desired, so we have early warning of problematic changes.

dodexahedron commented 10 months ago

(Just keeping these thoughts separate in multiple comments)

ReSharper itself can also point out a wide range of likely sub-optimal code, too.

Yes, it's always running various analyzers for you in real-time, but there's also the "Find Code Issues" tool, which you can use on a solution, project, folder, namespace, code file, individual type, or even specific type member.

It runs ALL of the analyzers that aren't disabled and reports them to you in a tree in a tool pane, which you can then use to navigate directly to the lines in question and either let it do its suggested refactoring (which is what ends up being the majority of the time) or manually fix or otherwise refactor the code.

It'll point out performance issues, code quality issues, language feature usage opportunities, and a ton of other stuff. And, any ReSharper plugins you have installed that have inspections are included in the tree, as well. So things like the Heap Allocations plugin I have are great because they'll show you in that nice tree where there is a heap allocation that you can then act on (or not), as appropriate.

dodexahedron commented 10 months ago

As for things like what I opened this specific issue for, this is a relatively human-friendly problem to see, recognize, and fix. For example, any time you see two strings being added together, fix it. It should never happen, because there is always a better way (in terms of performance). How you address it is like 3 parts knowing the best option for that situation and 1 part choosing the option that you are most comfortable with or that results in nice code, so long as it's better than what it was before. If you know how to wield Spans, great. If not, no big deal. StringBuilder is still a viable option and better than string concatenation in a percentage of cases nearly indistinguishable from 100%. I found a bunch of string concatenation and a bunch of unnecessary ToString calls in various types that are super low-hanging fruit, just while working on Color, as I went to usages of the Color type.

However, ReSharper and VS both have analyzers that will point these out, too. Most often, the current version will suggest using string interpolation, which is also quite often a good choice. Just pay attention to boxing of value types, in certain cases, which can undo half your gains.

The Find Code Issues thing in my previous comment, by the way, also includes Visual Studio inspections. ReSharper, in general, tries to augment VS, in most of its functionality, rather than outright replacing it, so you get the best of both worlds in the majority of situations.

dodexahedron commented 10 months ago

That's all the commentary I had to add right this moment.

Now to go write a wall of text elsewhere! 😅

dodexahedron commented 10 months ago

Oops. I lied. One more just occurred to me.

This issue would be a great candidate for a "Performance" project, which would also include other similar issues that are aimed at other specific performance points.

The Color one is another example, along with like half the issues or comments on other issues I've barfed up here. 😆

OK, NOW I'm pretty sure I'm done on this topic for now.