taoensso / nippy

The fastest serialization library for Clojure
https://www.taoensso.com/nippy
Eclipse Public License 1.0

repeatedly-into as a macro #9

Closed: mpenet closed this issue 11 years ago

mpenet commented 11 years ago

While we're going to extremes anyway, I modified repeatedly-into to be a macro (sacrilege, I know) to avoid the small overhead from the fn creation/calls.
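Roughly the idea (just an illustrative sketch, not the actual dev-branch code; the exact loop shape and names here are my own):

```clojure
;; fn version: pays for an fn object plus a call per element.
(defn repeatedly-into
  [coll n f]
  (loop [coll coll, n (long n)]
    (if (pos? n)
      (recur (conj coll (f)) (dec n))
      coll)))

;; macro version: the body is inlined at the call site, so there's no fn
;; creation and no per-element fn call.
(defmacro repeatedly-into*
  [coll n & body]
  `(loop [coll# ~coll, n# (long ~n)]
     (if (pos? n#)
       (recur (conj coll# (do ~@body)) (dec n#))
       coll#)))
```

Call sites change from `(repeatedly-into [] n #(read-thing in))` to `(repeatedly-into* [] n (read-thing in))` (read-thing being a made-up placeholder).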

The benchmark is against master this time (the code is against dev):

macro {:fast {:thaw 3534}} {:fast {:thaw 3472}} {:fast {:thaw 3428}} {:fast {:thaw 3408}} {:fast {:thaw 3431}} {:fast {:thaw 3416}}

fn {:fast {:thaw 4148}} {:fast {:thaw 4048}} {:fast {:thaw 4039}} {:fast {:thaw 4063}} {:fast {:thaw 4059}} {:fast {:thaw 4054}}
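(For context, a rough sketch of how numbers in that shape could be produced. Nippy's actual benchmark code differs, `bench-data` below is just a placeholder, and I'm assuming the 2.x `freeze`/`thaw` API.)

```clojure
(require '[taoensso.nippy :as nippy])

;; Crude timing harness: freeze once, thaw `laps` times, return elapsed ms.
(defn bench-thaw-ms
  [data laps]
  (let [frozen (nippy/freeze data)
        t0     (System/nanoTime)]
    (dotimes [_ laps] (nippy/thaw frozen))
    (long (/ (- (System/nanoTime) t0) 1e6))))

(comment
  {:fast {:thaw (bench-thaw-ms bench-data 10000)}})
```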

I would normally not even try to apply such optimisations (it's a bit ugly to read/use), but it's here if you want it.

ptaoussanis commented 11 years ago

Oh, that's great - not ugly at all. (It's pretty well encapsulated, and this is a very hot path in a perf-sensitive application, so if we can't get fancy here, where can we? ^^)

Wouldn't have thought of that, didn't think it'd make such a significant difference.

Actually, now you've got me curious about the fn overhead generally... there are more places this technique could be applied (e.g. coll-thaw itself)...

ptaoussanis commented 11 years ago

Unrelated question: have you been noticing any JVM core dumps with the 2.x branch? The integration test for this PR died with one, and I've seen one on my end as well - both apparently related to Snappy. I think I have an idea of what might be causing it, but any further anecdotes would be useful.

mpenet commented 11 years ago

I didn't go that far, but yes, some other fns could be inlined. I have noticed the core dumps; I didn't pay too much attention to them, though.

ptaoussanis commented 11 years ago

Think you might enjoy the results: https://raw.github.com/ptaoussanis/nippy/master/benchmarks/chart.png

ptaoussanis commented 11 years ago

To expand on that a little: the repeatedly-into macro made a larger difference because it's so hot in the synthetic benchmark (lots of very large collections).

Swapping everything else too had a nice impact on more realistic benchmarks (i.e. smaller, more frequent amounts of data being de/serialized). In particular, this change helped equalize the thaw/freeze times.

As a side effect, the code is actually shorter too, since a lot of the type hinting could be removed. Overall I think this is a positive change.
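(To illustrate the type-hint point with a made-up example, not the actual code: a standalone helper fn needs its own hint to avoid reflection, while a macro expansion inherits whatever hint is already on the caller's local.)

```clojure
(import '[java.io DataInput DataInputStream])

;; As a fn, the parameter needs its own hint or the interop call reflects.
(defn read-utf [^DataInput in]
  (.readUTF in))

;; As a macro, the call is inlined into the caller's scope, where `in` is
;; typically already hinted, so no hint is needed here.
(defmacro read-utf* [in]
  `(.readUTF ~in))

;; e.g. (let [^DataInputStream in (open-input ...)] (read-utf* in))
;; `open-input` is a placeholder.
```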

mpenet commented 11 years ago

Impressive! I was actually working on inlining some other fns, but you beat me to it :) Very positive indeed.

Short of dropping down to Java for some tasks, I would be surprised to find other optimisations, but as we've seen, tiny changes can make a huge difference, so who knows.

ptaoussanis commented 11 years ago

Agreed. In any event Nippy (even 1.x) has never been a bottleneck for me, so this was mostly for fun/curiosity - genuinely surprised how much low-hanging fruit there still was :-)

Just need to resolve the core dump issue, then I'll start migrating some projects over to 2.x for testing.

Cheers!