tompazourek / NaturalSort.Extension

🔀 Extension method for StringComparison that adds support for natural sorting (e.g. "abc1", "abc2", "abc10" instead of "abc1", "abc10", "abc2").
MIT License
161 stars 12 forks source link

Sorting very slow #2

Closed danielnierlin closed 5 years ago

danielnierlin commented 5 years ago

The natural sort extension is very slow (~17 sec) sorting this list: naturalSort.txt

Regular StringComparer.OrdinalIgnoreCase is somewhere at 9ms.

Here's a little test project: NaturalSortExtensionsTest.zip

What's the reason for the bad performance? And how can it be fixed?

tompazourek commented 5 years ago

Hey @danielnierlin,

Thanks for reaching out. You're right, 17 seconds just looks really bad...

I never actually tried the sorting on any lists of substantial size, so performance hasn't really been a problem for me.

From the top of my head, I'm not sure why it performs so bad. At the moment, the algorithm works in a way that it tokenizes the string and then compares the sequence of tokens in a particular way. It might be that the way the IComparer works, it does the tokenization too many times even for the same string.

If you are willing to investigate or have some ideas, the source code isn't very complicated: https://github.com/tompazourek/NaturalSort.Extension/blob/master/src/NaturalSort.Extension/StringComparerNaturalSortExtension.cs

tompazourek commented 5 years ago

I have now completely rewritten the algorithm to make it perform as well as I was able to. I've released this as part of version 2.1.0.

I'm not sure if I can think of a way to make it perform even better (the algorithm is just much more complex than ordinal comparison). I'm closing the issue, but if anybody's got some more ideas, we can reopen this, or start a new issue or PR.

When doing some performance testing, I also noticed that the sorting performs much better in .NET Core apps. I'm not sure if it's an option for you to switch to .NET Core, but just know that you'll get another very significant performance improvement if you do.