Closed: dsyme closed this pull request 9 years ago
(I removed the bit about avoiding tailcalls since it doesn't make things any faster)
The performance on the streamTest() in StreamsTest.fs after these changes is:
// Real: 00:00:02.384, CPU: 00:00:02.375, GC gen0: 0, gen1: 0, gen2: 0
// Real: 00:00:02.490, CPU: 00:00:02.484, GC gen0: 0, gen1: 0, gen2: 0
// Real: 00:00:02.362, CPU: 00:00:02.343, GC gen0: 0, gen1: 0, gen2: 0
The performance of 0.3.0 is comparable (actually a bit slower):
// Real: 00:00:03.352, CPU: 00:00:03.343, GC gen0: 0, gen1: 0, gen2: 0
// Real: 00:00:02.631, CPU: 00:00:02.640, GC gen0: 0, gen1: 0, gen2: 0
// Real: 00:00:02.858, CPU: 00:00:02.843, GC gen0: 0, gen1: 0, gen2: 0
// Real: 00:00:03.156, CPU: 00:00:03.156, GC gen0: 0, gen1: 0, gen2: 0
So it is a little bit slower, by roughly 1 second?
Btw I'm doing research with @biboudis on multi-stage programming and stream fusion in MetaOcaml and Scala LMS and we are very positive that we can port some of our research ideas to F#.
@palladin - It's actually a bit faster in this configuration, with the tailcalls removed, see https://github.com/nessos/Streams/pull/31#issuecomment-111158738. It seems sensitive when the leaf functions are simply +/- operations. In any case it would be good to try to verify that I haven't horked perf.
@palladin - How do we get @biboudis to move his research to F#? Then we could all cooperate very easily :)
@dsyme It is very easy... MSR funding :)
@dsyme Is it possible to check the performance with the 64-bit JIT? (I've seen huge differences between 32-bit and 64-bit.)
And I'm pretty sure the `let inline` definitions are critical for performance under the 64-bit JIT.
Yes, I've been using 64-bit. The inlining is still carefully orchestrated to boil things down to `mapCont` and `iter` in a way that preserves the existing inlining of function calls. But you should verify perf.
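For readers following along, here is a minimal, hypothetical sketch (not the library's actual definitions) of the push-based shape being discussed: a stream is a function over a consumer, `map` merely wraps that consumer, and `inline` lets the composed pipeline boil down to a single loop over the source.

```fsharp
// Hypothetical push-based stream: a function that feeds elements to a consumer.
type Stream<'T> = ('T -> unit) -> unit

let inline ofArray (source: 'T[]) : Stream<'T> =
    fun k -> for i = 0 to source.Length - 1 do k source.[i]

// map only composes continuations; with inlining, no intermediate
// collection is ever materialized.
let inline map (f: 'T -> 'R) (stream: Stream<'T>) : Stream<'R> =
    fun k -> stream (fun x -> k (f x))

let inline iter (f: 'T -> unit) (stream: Stream<'T>) : unit =
    stream f

// Usage: two maps fuse into one traversal of the array.
let mutable total = 0
[| 1; 2; 3; 4 |]
|> ofArray
|> map (fun x -> x + 1)
|> map (fun x -> x * 2)
|> iter (fun x -> total <- total + x)
// total = (2 + 3 + 4 + 5) * 2 = 28
```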
Research through osmosis of ideas may lead us to F# very quickly :-)
The diff between the "fix iterator()" ( #30 ) and "hid representation" ( #31 ) PRs is here: https://github.com/dsyme/Streams/compare/fix-iterator...dsyme:hide-reprs
@biboudis But I want to osmote too :) . BTW the reason I'm doing these PRs is to finally deeply understand what you guys have been doing :)
@dsyme Based on my performance assessment everything looks ok!
@palladin BTW at some point I also hope to look at hiding the representation details for ParStream. I understand that might be a bit harder :) Do you have a feeling for whether the inlining is less crucial there? And do you have a feeling for whether existing consumers of ParStream take advantage of the representation details, e.g. ParIterator, Collector, SourceType and the abstract members on ParStream?
Based on my experience, inlining is equally important in ParStream, because ParStream and PSeq are designed for CPU-bound multicore (throughput) scenarios.
One important fact is also that CloudFlow is using the ParIterator internally.
I've looked into what it would take to internalize the representations of streams while keeping performance where code is well inlined and at least as much fusing occurs as today.
The aim is to put this library on a sustainable binary-compatible basis while keeping performance.
The PR is an extension of #30, which should be merged first; once that is merged, the diff below will shrink.
The formulation I've used below (`nocurry()`) seems to work.
It would be very cool if we could somehow find a way to fully fuse map --> map --> map --> map chains. However, I haven't been able to do that, and I don't think it's possible with F# 4.0's optimizer, even when we fully leak all representations.
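For illustration (using plain `Array.map` rather than the library's combinators), "fully fused" would mean the optimizer collapses the per-element closures of a map chain into one arithmetic expression, equivalent to what one would write by hand:

```fsharp
// A chain of three maps: each stage introduces its own closure, and
// without full fusion some closure-call overhead per element remains.
let unfused (xs: int[]) =
    xs
    |> Array.map (fun x -> x + 1)
    |> Array.map (fun x -> x * 2)
    |> Array.map (fun x -> x - 3)

// The hand-fused equivalent the optimizer would ideally produce:
// one closure, one pass, no intermediate work per stage.
let fused (xs: int[]) =
    xs |> Array.map (fun x -> (x + 1) * 2 - 3)
```

Both produce the same results; the open question in the thread is whether the compiler can be made to perform this rewrite automatically.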