louthy / language-ext

C# functional language extensions - a base class library for functional programming

Performance improvements #599

Closed: louthy closed this issue 5 years ago

louthy commented 5 years ago

I thought I'd post this here to get feedback on what's going to be a series of performance improvements to the library. I will update this thread with details of the releases as-and-when.

Release notes 3.2.0-beta

Included in this release are performance improvements for Option<A>, OptionUnsafe<A>, and Seq<A>. Seq<A> has also become a struct, both to prevent the use of null and to facilitate the new performance improvements. ISeq<A> will eventually be deprecated, and so there are now [Obsolete] warnings - but I won't kill it for quite a long time; I just want to discourage its usage.

I'm doing this as a beta release because the changes to Seq<A> are enormous - well, a complete re-write. It is now entirely lock-free and internally has three possible states: Lazy, Strict, and Empty - which allows for improved performance on the happy paths.
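
To illustrate the shape of that design, here is a minimal, hypothetical sketch of a three-state, struct-based sequence. The MiniSeq name, the SeqState enum, and the fields are invented for illustration; this is not the actual Seq<A> internals.

using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch only - not the real Seq<A> internals.
public enum SeqState { Empty, Strict, Lazy }

public readonly struct MiniSeq<A>
{
    readonly SeqState state;
    readonly A[] items;                  // used when Strict
    readonly IEnumerable<A> lazySource;  // used when Lazy

    MiniSeq(SeqState state, A[] items, IEnumerable<A> lazySource)
    {
        this.state = state;
        this.items = items;
        this.lazySource = lazySource;
    }

    // A default struct is the Empty sequence, so null never appears.
    public static MiniSeq<A> Empty => default(MiniSeq<A>);

    public static MiniSeq<A> FromArray(A[] xs) =>
        new MiniSeq<A>(SeqState.Strict, xs, null);

    public static MiniSeq<A> FromEnumerable(IEnumerable<A> xs) =>
        new MiniSeq<A>(SeqState.Lazy, null, xs);

    // Each operation can branch once on the state and take the fastest
    // path for strict data, rather than always paying the lazy cost.
    public IEnumerable<A> AsEnumerable() =>
        state == SeqState.Strict ? items
      : state == SeqState.Lazy   ? lazySource
      : Enumerable.Empty<A>();
}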

I will keep language-ext in beta for a while as I go through the library optimising what I believe are the most used features. I am doing this now because I feel the library is settling down - as in there's less churn of features and interfaces - and so now is a good time to start squeezing the most performance out of it.

Mostly this will involve trying to reduce memory allocations where possible as well as inlining operations that make unnecessary calls (which were there to keep everything super generic under the hood). This will make long term maintenance a bit harder, which is why it needed to wait until now.
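
As a rough, hypothetical illustration of what "inlining operations that make unnecessary calls" can mean in practice (this is not language-ext's actual code): a fold routed through a generic, type-class-style instance versus the same operation written directly against the concrete collection.

using System;

// Hypothetical illustration only - not language-ext internals.

// The "super generic" style: operations are written once against a
// type-class-like instance, so concrete types pay for the indirection.
public interface Foldable<F, A>
{
    S Fold<S>(F fa, S state, Func<S, A, S> f);
}

public struct FoldArray<A> : Foldable<A[], A>
{
    public S Fold<S>(A[] fa, S state, Func<S, A, S> f)
    {
        foreach (var x in fa) state = f(state, x);
        return state;
    }
}

public static class Sums
{
    // Generic route: works for any Foldable instance, but every element
    // goes through a delegate call plus the generic Fold indirection.
    public static int Sum<F, FoldF>(F fa) where FoldF : struct, Foldable<F, int> =>
        default(FoldF).Fold(fa, 0, (s, x) => s + x);

    // "Inlined" route: the same operation written directly against the
    // concrete structure, with no delegate calls inside the loop.
    public static int SumInlined(int[] xs)
    {
        var total = 0;
        for (var i = 0; i < xs.Length; i++) total += xs[i];
        return total;
    }
}

// Usage: Sums.Sum<int[], FoldArray<int>>(xs) vs Sums.SumInlined(xs)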

Performance timings

Below are some tests I've used to tweak the performance. The timings are all relative (old Seq vs. new Seq), so my machine's spec shouldn't matter.

Streaming a lazy sequence

This performance test wraps a lazy enumerable in a Seq, streams the items one-by-one via a foreach loop, and saves each value in a results array.

var results = new int[count];   // 'count' is the test size, set by the harness
var j = 0;

var seq = Seq(Range(0, count)); // lazily wraps the enumerable range
foreach (var item in seq)
{
    results[j] = item;
    j++;
}
Old Seq<A>: 138 ns / op
New Seq<A>:  54 ns / op

So, a performance improvement of about 2.55x. Evaluating a raw lazy IEnumerable<T> is about 29 ns / op, which, whilst faster than Seq<A>, doesn't give you the memoisation that Seq has (or any of the other cool features, for that matter).
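
To make that memoisation trade-off concrete, here is a small usage sketch. It assumes the Prelude's Seq(IEnumerable) constructor and the memoisation behaviour described above; the Source generator is just for illustration.

using System;
using System.Collections.Generic;
using LanguageExt;
using static LanguageExt.Prelude;

static class MemoisationDemo
{
    // A lazy source with a visible side effect, so re-evaluation shows up.
    static IEnumerable<int> Source()
    {
        for (var i = 0; i < 3; i++)
        {
            Console.WriteLine($"evaluating {i}");
            yield return i;
        }
    }

    static void Main()
    {
        // Raw IEnumerable<int>: the generator runs again on every enumeration.
        var raw = Source();
        foreach (var _ in raw) { }   // prints "evaluating 0..2"
        foreach (var _ in raw) { }   // prints "evaluating 0..2" again

        // Seq wraps the same kind of source and memoises items as they are
        // consumed, so a second pass doesn't re-run the generator.
        var seq = Seq(Source());
        foreach (var _ in seq) { }   // prints "evaluating 0..2"
        foreach (var _ in seq) { }   // no further "evaluating" output
    }
}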

Streaming a strict sequence

A lot of the time I find I'm working with strict sequences - i.e. non-lazy. This is the happy path for sequences and a lot of optimisations can happen if the code knows the sequence isn't lazy.

var results = new int[count];   // 'count' is the test size, set by the harness
var j = 0;

var seq = Seq(Range(0, count));
seq = seq.Strict();             // evaluate fully up-front so the Seq is non-lazy

foreach (var item in seq)
{
    results[j] = item;
    j++;
}
Old Seq<A>: 16.1 ns / op
New Seq<A>: 10.9 ns / op

A performance improvement of ~32.3%. This isn't quite as significant as the lazy-stream improvements, but interestingly it is now faster than the BCL's List<T>, which measures around 12.5 ns / op. Also of interest: System.Collections.Immutable.ImmutableList<T> comes in at 105.6 ns / op.

Adding an item to the end of a Seq

Essentially calling this, many times:

seq = seq.Add(value);
Old Seq<A>: 49.5 ns / op
New Seq<A>: 33.2 ns / op

Around a 33% improvement. This isn't quite as fast as the 10.9 ns / op of the BCL's List<T>.Add; I will work on this some more. But this is an immutable, thread-safe data structure - which the BCL List<T> definitely isn't.

Adding an item to the beginning of a Seq

Essentially calling this, many times:

seq = value.Cons(seq);
Old Seq<A>: 40 ns / op
New Seq<A>: 20 ns / op

Twice as fast. This isn't far off List<T>.Add - which is why I think I can squeeze a bit more speed out of Seq<A>.Add. Note: the BCL's List<T>.Insert(0, x), which is the equivalent of Seq<A>.Cons, has terrible performance at 10,310 ns / op. So that's one to look out for!
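
For reference, a rough sketch of the loops behind these two measurements plus the List<T>.Insert(0, x) comparison. The iteration count and the use of Seq<int>.Empty as the starting point are assumptions about the harness shape, not the actual benchmark code.

using System.Collections.Generic;
using LanguageExt;

static class AppendPrependSketch
{
    static void Main()
    {
        const int count = 100_000;   // assumed iteration count

        // Adding to the end of an immutable Seq: seq = seq.Add(value)
        var appended = Seq<int>.Empty;
        for (var i = 0; i < count; i++) appended = appended.Add(i);

        // Adding to the front of an immutable Seq: seq = value.Cons(seq)
        var prepended = Seq<int>.Empty;
        for (var i = 0; i < count; i++) prepended = i.Cons(prepended);

        // The BCL equivalent of Cons is List<T>.Insert(0, x), which has to
        // shift every existing element on each call - hence the poor timings.
        var list = new List<int>();
        for (var i = 0; i < count; i++) list.Insert(0, i);
    }
}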

Other Seq<A> functions

Most Seq<A> functions either evaluate a lazy stream, evaluate a strict stream, or Add or Cons items, so nearly all Seq<A>-related functionality will benefit from these changes.

Option<A> and OptionUnsafe<A>

I haven't done any performance timings for these types, as most of the improvements are about reducing unnecessary allocations. I have removed the support for lazy options - if anyone misses them then I'll probably create an OptionLazy<A> type. I'd rather not, but because Option<A> is probably the most used type in lang-ext, I felt it should always be on the happy path. This has removed a lot of unnecessary branching and has allowed the internals of Option<A> and OptionUnsafe<A> to be simplified.
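
For anyone who was relying on lazy options, one possible workaround is to push the laziness outside the Option. This is a plain-C# sketch, not a library feature; ExpensiveLookup and the Lazy<T> wrapper are illustrative assumptions.

using System;
using LanguageExt;
using static LanguageExt.Prelude;

static class LazyOptionWorkaround
{
    // Option<A> is strict now; if the computation is expensive, defer the
    // whole Option behind Lazy<T> (or a Func) at the call site instead.
    static readonly Lazy<Option<int>> deferred =
        new Lazy<Option<int>>(() => ExpensiveLookup(42));

    static Option<int> ExpensiveLookup(int key) =>
        key > 0 ? Some(key * 10) : None;

    static void Main()
    {
        // Nothing runs until .Value is touched; after that it's an ordinary,
        // eagerly-evaluated Option<int>.
        var message = deferred.Value.Match(
            Some: x  => $"found {x}",
            None: () => "not found");

        Console.WriteLine(message);
    }
}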

Feedback please

Because these are some big changes, if you have an app with lots of unit tests it'd be great if you could verify that they all still pass with these updates. I have obviously made sure all of the existing unit tests pass, and have built a number of test harnesses to check that everything still works and to verify the thread-safety of Seq, but changes like these have a habit of kicking you when you least expect it. So, any help will be gratefully received.

StanJav commented 5 years ago

Hi Paul, I have an interesting one. This line:

var vals = Range(0, 1000000).Map(i => Success<string, int>(i)).Sequence().Freeze();

It takes 2s with v3.1.17 and minutes (I did not wait for it) with v3.1.18 and higher (including v3.2.0-beta).

louthy commented 5 years ago

@StanJav Hmm, that's not touching any optimised code; the only change that might affect it is in MEnumerable - I'll take a look.

StanJav commented 5 years ago

Thanks. I did wait for the result in the end - 70 mins. I did not dig further.

louthy commented 5 years ago

@StanJav That issue came in with v3.1.18, not with the latest performance changes. I will keep digging, but I think it's (ironically) the ConcatFast function that's causing the problem (and the fix to protect against enumerable concat stack overflows).

StanJav commented 5 years ago

"and the fix to protect against enumerable concat stack overflows"

Hmmm... I am actually looking into a stack overflow around validation .Sequence(), but I cannot repro it anymore. I will take a look at what that change was. Thanks.

louthy commented 5 years ago

@StanJav That's now fixed in 3.2.1-beta

louthy commented 5 years ago

Release notes 3.2.2-beta

HashMap<K, V> and HashMap<EqK, K, V> are now about 4 times faster (testing with maps of 1,000,000 items). The underlying implementation has switched from a Map<int, Lst<(K, V)>> to a Compressed Hash-Array Mapped Prefix-tree (CHAMP).

This still has some more performance to squeeze out of it as it's a relatively naive implementation at the moment, so I won't post up the figures just yet.
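
For anyone curious what a CHAMP/HAMT-style structure looks like, below is a very rough, hypothetical sketch of the bitmap-indexed node idea. Collision handling and child nodes are omitted, and this is not the library's actual TrieMap code.

using System;

// Heavily simplified, hypothetical sketch of a bitmap-indexed trie node.
// A real CHAMP node consumes 5 bits of the hash per level and recurses
// into child nodes; only a single level is shown here.
sealed class BitmapNode<K, V>
{
    readonly uint bitmap;                     // which of the 32 slots are occupied
    readonly (K Key, V Value)[] entries;      // dense array: one entry per set bit

    BitmapNode(uint bitmap, (K Key, V Value)[] entries)
    {
        this.bitmap = bitmap;
        this.entries = entries;
    }

    public static BitmapNode<K, V> Empty { get; } =
        new BitmapNode<K, V>(0, Array.Empty<(K Key, V Value)>());

    // Number of set bits below 'bit' = position of this slot's entry
    // in the dense array.
    static int Index(uint bitmap, int bit)
    {
        var below = bitmap & ((1u << bit) - 1);
        var count = 0;
        while (below != 0) { count++; below &= below - 1; }
        return count;
    }

    public bool TryFind(K key, out V value)
    {
        var bit = key.GetHashCode() & 0x1F;           // bottom 5 bits of the hash
        if ((bitmap & (1u << bit)) == 0)
        {
            value = default(V);
            return false;                             // slot empty: key not present
        }
        var entry = entries[Index(bitmap, bit)];
        if (Equals(entry.Key, key)) { value = entry.Value; return true; }
        value = default(V);
        return false;                                 // a real trie would recurse here
    }

    // Copy-on-write insert: the old node is untouched, so readers never lock.
    public BitmapNode<K, V> Add(K key, V value)
    {
        var bit = key.GetHashCode() & 0x1F;
        var idx = Index(bitmap, bit);
        if ((bitmap & (1u << bit)) != 0)
        {
            var replaced = ((K Key, V Value)[])entries.Clone();
            replaced[idx] = (key, value);             // overwrite (collisions omitted)
            return new BitmapNode<K, V>(bitmap, replaced);
        }
        var grown = new (K Key, V Value)[entries.Length + 1];
        Array.Copy(entries, 0, grown, 0, idx);
        grown[idx] = (key, value);
        Array.Copy(entries, idx, grown, idx + 1, entries.Length - idx);
        return new BitmapNode<K, V>(bitmap | (1u << bit), grown);
    }
}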

louthy commented 5 years ago

Release notes 3.2.3-beta

Some further optimisation of the HashMap now brings significant performance gains over Microsoft's System.Collections.Immutable.ImmutableDictionary<K, V>, which is the closest type to the lang-ext HashMap.

10,000-item collection (time per operation):
Type                 Add            Random access   Enumeration
HashMap (new)        200 ns / op    1 ns / op       1 ns / op
HashMap (old)        600 ns / op    700 ns / op     300 ns / op
ImmutableDictionary  700 ns / op    1 ns / op       100 ns / op

100,000-item collection (time per operation):
Type                 Add            Random access   Enumeration
HashMap (new)        500 ns / op    80 ns / op      100 ns / op
HashMap (old)        1100 ns / op   590 ns / op     290 ns / op
ImmutableDictionary  1160 ns / op   60 ns / op      180 ns / op

1,000,000-item collection (time per operation):
Type                 Add            Random access   Enumeration
HashMap (new)        651 ns / op    72 ns / op      63 ns / op
HashMap (old)        1397 ns / op   606 ns / op     286 ns / op
ImmutableDictionary  1402 ns / op   93 ns / op      204 ns / op
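
For context, here is a rough sketch of the kind of per-operation measurement behind figures like these. The harness, warm-up, and key distribution are assumptions, not the actual benchmark used for the tables above.

using System;
using System.Diagnostics;
using LanguageExt;
using static LanguageExt.Prelude;

static class HashMapTimings
{
    static void Main()
    {
        const int n = 1_000_000;
        double NsPerOp(Stopwatch s) => s.Elapsed.TotalMilliseconds * 1_000_000 / n;

        // n Add operations on the immutable HashMap.
        var sw = Stopwatch.StartNew();
        var map = HashMap<int, int>();
        for (var i = 0; i < n; i++) map = map.Add(i, i);
        sw.Stop();
        Console.WriteLine($"Add:       {NsPerOp(sw):F0} ns / op");

        // n random-access lookups.
        var rnd = new Random(1234);
        sw.Restart();
        for (var i = 0; i < n; i++) { var v = map[rnd.Next(n)]; }
        sw.Stop();
        Console.WriteLine($"Lookup:    {NsPerOp(sw):F0} ns / op");

        // One full enumeration of the n-item collection.
        sw.Restart();
        foreach (var kv in map) { }
        sw.Stop();
        Console.WriteLine($"Enumerate: {NsPerOp(sw):F0} ns / op");
    }
}
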
louthy commented 5 years ago

Release notes 3.2.4-beta

HashSet has been switched to use the same TrieMap that HashMap uses, which should give HashSet a similar performance boost.

There have been some more tweaks to TrieMap performance too.
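
For completeness, a quick usage sketch of the immutable HashSet, assuming the usual Prelude constructor; this is just illustrative, not new API.

using LanguageExt;
using static LanguageExt.Prelude;

static class HashSetSketch
{
    static void Main()
    {
        var a = HashSet(1, 2, 3);          // immutable set backed by the trie
        var b = a.Add(4);                  // returns a new set; 'a' is untouched

        var inB = b.Contains(4);           // true
        var inA = a.Contains(4);           // false - structural sharing, no mutation
    }
}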

More to come!