Closed: louthy closed this issue 5 years ago
Hi Paul, I have an interesting one. This line:
```csharp
var vals = Range(0, 1000000).Map(i => Success<string, int>(i)).Sequence().Freeze();
```
It takes 2s with v3.1.17 and minutes (I did not wait for it) with v3.1.18 and higher (including the v3.2.0 beta).
@StanJav Hmm, that's not touching any optimised code; the only change that might affect it is in `MEnumerable` - I'll take a look.
Thanks. I did wait for the result in the end - 70 minutes. I did not dig further.
@StanJav That issue came in with v3.1.18 - not the latest performance changes. I will keep digging, but I think it's (ironically) the `ConcatFast` function that's causing the problem (and the fix to protect against enumerable concat stack overflows).
Hmmm... I am actually looking for a stack overflow around validation `.Sequence()`, but I cannot repro it any more. I will take a look at what that change was. Thanks.
@StanJav That's now fixed in 3.2.1-beta
3.2.2-beta
`HashMap<K, V>` and `HashMap<EqK, K, V>` are now about 4 times faster (testing with maps of 1,000,000 items). The underlying implementation has switched from being a `Map<int, Lst<(K, V)>>` to a Compressed Hash-Array Mapped Prefix-tree (CHAMP).
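The core trick of a CHAMP/HAMT-style node is compressing a sparse 32-slot array with a bitmap and a popcount. The following is an illustrative sketch of that idea only, not the library's actual implementation (names and layout are my own):

```csharp
// Illustrative sketch of bitmap + popcount slot compression used by
// HAMT/CHAMP-style trie nodes. Requires .NET Core 3.0+ for BitOperations.
using System;
using System.Numerics;

class CompressedNode
{
    uint bitmap;                                    // bit i set => slot i occupied
    object[] entries = Array.Empty<object>();       // dense array of occupied slots only

    // Map a sparse slot index (0..31) to a dense array index by counting
    // how many occupied slots precede it.
    int DenseIndex(int slot) => BitOperations.PopCount(bitmap & ((1u << slot) - 1));

    public object Get(int slot) =>
        (bitmap & (1u << slot)) == 0 ? null : entries[DenseIndex(slot)];

    public void Insert(int slot, object value)      // assumes slot not yet occupied
    {
        int ix = DenseIndex(slot);
        var next = new object[entries.Length + 1];
        Array.Copy(entries, 0, next, 0, ix);
        next[ix] = value;
        Array.Copy(entries, ix, next, ix + 1, entries.Length - ix);
        entries = next;
        bitmap |= 1u << slot;
    }
}
```

This is why a CHAMP node wastes no space on empty slots while keeping O(1) slot lookup: the bitmap tells you whether a slot exists, and a popcount of the lower bits tells you where it lives in the dense array.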
This still has some more performance to squeeze out of it as it's a relatively naive implementation at the moment, so I won't post up the figures just yet.
3.2.3-beta
Some further optimisation of the `HashMap` now brings significant performance gains over Microsoft's `System.Collections.Immutable.ImmutableDictionary<K, V>`, which is the closest type to the lang-ext `HashMap`.
Type | 10,000 Add operations | 10,000 random access operations | Enumeration of a 10,000 item collection
---|---|---|---
`HashMap` (new) | 200 ns / op | 1 ns / op | 1 ns / op
`HashMap` (old) | 600 ns / op | 700 ns / op | 300 ns / op
`ImmutableDictionary` | 700 ns / op | 1 ns / op | 100 ns / op
Type | 100,000 Add operations | 100,000 random access operations | Enumeration of a 100,000 item collection
---|---|---|---
`HashMap` (new) | 500 ns / op | 80 ns / op | 100 ns / op
`HashMap` (old) | 1100 ns / op | 590 ns / op | 290 ns / op
`ImmutableDictionary` | 1160 ns / op | 60 ns / op | 180 ns / op
Type | 1,000,000 Add operations | 1,000,000 random access operations | Enumeration of a 1,000,000 item collection
---|---|---|---
`HashMap` (new) | 651 ns / op | 72 ns / op | 63 ns / op
`HashMap` (old) | 1397 ns / op | 606 ns / op | 286 ns / op
`ImmutableDictionary` | 1402 ns / op | 93 ns / op | 204 ns / op
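For reference, a sketch of the kinds of operations being timed above (my own illustration, not the author's benchmark harness; I'm assuming the `Prelude.HashMap()` constructor function and the `Values` property from the library's public API):

```csharp
using System;
using System.Collections.Immutable;
using LanguageExt;
using static LanguageExt.Prelude;

class Bench
{
    static void Main()
    {
        const int n = 10000;

        // lang-ext HashMap: Add returns a new immutable map each time
        var hm = HashMap<int, int>();
        for (int i = 0; i < n; i++) hm = hm.Add(i, i);

        // BCL ImmutableDictionary for comparison
        var im = ImmutableDictionary<int, int>.Empty;
        for (int i = 0; i < n; i++) im = im.Add(i, i);

        // random access
        var rnd = new Random(42);
        long sum = 0;
        for (int i = 0; i < n; i++) sum += hm[rnd.Next(n)];

        // enumeration
        foreach (var v in hm.Values) sum += v;

        Console.WriteLine(sum);
    }
}
```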
3.2.4-beta
`HashSet` has been switched to use the same `TrieMap` that `HashMap` uses. This will give a similar performance boost to `HashSet`.

There have been some more tweaks to `TrieMap` performance too.
More to come!
I thought I'd post this here to get feedback on what's going to be a series of performance improvements to the library. I will update this thread with details of the releases as-and-when.
Release notes
3.2.0-beta
Included in this release are performance improvements for `Option<A>`, `OptionUnsafe<A>`, and `Seq<A>`. `Seq<A>` has now also become a `struct`, both to stop the usage of `null` and to facilitate the new performance improvements. `ISeq<A>` will eventually be deprecated, and so there are now `[Obsolete]` warnings - but I won't kill it for quite a long time; I just want to discourage its usage.

I'm doing this as a beta release because the changes to `Seq<A>` are enormous - well, a complete re-write. It is now entirely lock-free and internally has three possible states: `Lazy`, `Strict`, and `Empty` - which allows for improved performance on the happy paths.

I will keep language-ext in beta for a while as I go through the library optimising what I believe are the most used features. I am doing this now because I feel the library is settling down - as in, there's less churn of features and interfaces - and so now is a good time to start squeezing the most performance out of it.
Mostly this will involve trying to reduce memory allocations where possible as well as inlining operations that make unnecessary calls (which were there to keep everything super generic under the hood). This will make long term maintenance a bit harder, which is why it needed to wait until now.
Performance timings
Below are some tests I've used to tweak the performance. The timings are all relative, old `Seq` versus new, so my machine's spec shouldn't matter.

Streaming a lazy sequence

This performance test wraps a lazy enumerable in a `Seq` and then streams the items in one-by-one via a `foreach` loop, saving each value in a `results` array.

Old `Seq<A>` | New `Seq<A>`
---|---
138 ns / op | 54 ns / op
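A minimal sketch of this kind of test might look like the following (my reconstruction, not the author's actual benchmark; I'm assuming the `toSeq` constructor function from `Prelude`):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using LanguageExt;
using static LanguageExt.Prelude;

class LazyStreamBench
{
    // A lazy source: items are produced only as they are consumed.
    static IEnumerable<int> LazyInts(int n)
    {
        for (int i = 0; i < n; i++) yield return i;
    }

    static void Main()
    {
        const int count = 1000000;
        var seq = toSeq(LazyInts(count));   // wrap the lazy enumerable in a Seq
        var results = new int[count];

        var sw = Stopwatch.StartNew();
        int ix = 0;
        foreach (var x in seq)              // stream items in one-by-one
        {
            results[ix++] = x;
        }
        sw.Stop();

        Console.WriteLine($"{sw.Elapsed.TotalMilliseconds * 1_000_000.0 / count:F1} ns / op");
    }
}
```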
So a performance improvement of 2.55 times. Evaluating an `IEnumerable<T>` lazy sequence is about 29 ns / op, which, whilst faster than `Seq<A>`, doesn't have the memoisation that `Seq` has (or any of the other cool features, for that matter).

Streaming a strict sequence
A lot of the time I find I'm working with strict sequences - i.e. non-lazy. This is the happy path for sequences and a lot of optimisations can happen if the code knows the sequence isn't lazy.
Old `Seq<A>` | New `Seq<A>`
---|---
16.1 ns / op | 10.9 ns / op
A performance improvement of ~32.3%. This isn't quite as significant as the lazy-stream improvements, but interestingly this is now faster than `List<T>` from the BCL, which measures around 12.5 ns / op. Also of interest: `System.Collections.Immutable.ImmutableList<T>` is 105.6 ns / op.

Adding an item to the end of a `Seq`

Essentially calling this, many times:
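The snippet itself doesn't appear above; a plausible sketch of the operation being measured (assuming `Seq<int>.Empty` and the `Add` method from the library's public API) is:

```csharp
using LanguageExt;

// Hedged sketch: repeatedly appending to an immutable Seq.
// Each Add returns a new Seq; the original is untouched.
var seq = Seq<int>.Empty;
for (int i = 0; i < 1_000_000; i++)
{
    seq = seq.Add(i);   // append to the end
}
```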
Old `Seq<A>` | New `Seq<A>`
---|---
49.5 ns / op | 33.2 ns / op
Around a 33% improvement. This isn't quite as fast as the 10.9 ns / op of the BCL's `List<T>`; I will work on this some more. But this is an immutable, thread-safe data structure, which the BCL `List<T>` definitely isn't.

Adding an item to the beginning of a `Seq`

Essentially calling this, many times:
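Again the snippet itself is missing above; presumably something along these lines (assuming the `Cons` extension method, which prepends a head onto a sequence):

```csharp
using LanguageExt;

// Hedged sketch: repeatedly prepending to an immutable Seq.
// Cons returns a new Seq with the item at the front.
var seq = Seq<int>.Empty;
for (int i = 0; i < 1_000_000; i++)
{
    seq = i.Cons(seq);   // prepend to the front
}
```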
Old `Seq<A>` | New `Seq<A>`
---|---
40 ns / op | 20 ns / op
Twice as fast. This isn't far off `List<T>.Add`, and so that's why I think I can make `Seq<A>.Add` gain a bit more speed. Note: the BCL's `List<T>.Insert(0, x)`, which is the equivalent of `Seq<A>.Cons`, has terrible performance at 10310 ns / op. So that's one to look out for!

Other `Seq<A>` functions

Most `Seq<A>` functions will either evaluate a lazy stream, evaluate a strict stream, or `Add` or `Cons` items. And so nearly all `Seq<A>`-related functionality will gain from these changes.
related functionality will gain due to these changes.Option<A>
andOptionUnsafe<A>
I haven't done any performance timings for these types, as most improvements are to reduce unnecessary allocations. I have removed the support for lazy options - if anyone misses them then I'll probably create an
OptionLazy<A>
type. I'd rather not, but I felt that becauseOption<A>
is probably the most used type in lang-ext then it should always be on the happy path. This has reduced a lot of unnecessary branching and has allowed the internals ofOption<A>
andOptionUsafe<A>
to be simplified.Feedback please
Because these are some big changes, if you have an app with lots of unit tests it'd be great if you could verify that they all still run with these updates. I have obviously made sure all of the existing unit tests pass, and I have built a number of test harnesses to check that everything still works and to verify the thread-safety of `Seq`, but changes like these have a habit of kicking you when you least expect it. So, any help will be gratefully received.