taoensso / nippy

The fastest serialization library for Clojure
https://www.taoensso.com/nippy
Eclipse Public License 1.0

java.nio.ByteBuffer based thaw'ing #8

Closed mpenet closed 11 years ago

mpenet commented 11 years ago

Just sending a PR for the record and to start a discussion.

So it does appear to be more performant; I'm seeing roughly a 15-20% increase.

The big problem I see is backward compatibility, since I had to drop read/writeUTF: it isn't supported by ByteBuffers by default. There could be a way to work around that by creating ByteArrayInputStream instances just for string decoding where those calls used to be, but I didn't go that route since this was an experiment at this point and I went for the quick kill. Not to mention I suspect this would add some overhead; it's still worth a try, though.
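For illustration (in plain Java, since the point is about the java.nio API), here's a minimal sketch of the kind of length-prefixed UTF-8 encoding a ByteBuffer supports natively in place of DataOutput.writeUTF. The class and method names are hypothetical, not Nippy's actual wire format:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BufferStrings {
    // Hypothetical replacement for DataOutput.writeUTF: a 4-byte length
    // prefix followed by the string's UTF-8 bytes, written directly to
    // the buffer with no intermediate stream.
    public static void putString(ByteBuffer buf, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        buf.putInt(bytes.length);
        buf.put(bytes);
    }

    // Hypothetical replacement for DataInput.readUTF.
    public static String getString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

A side benefit: a 4-byte length prefix also lifts writeUTF's 64KB-per-string limit, at the cost of two extra bytes per string.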

ptaoussanis commented 11 years ago

Hey Max, thanks a lot for this!

Was about to tweet you - I'm not seeing the perf. increase you seem to be:

 ;;; 13 June 2013: Clojure 1.5.1, Nippy mpenet/2.0.0-alpha3 (nio)
  {:reader    {:freeze 23491, :thaw 26074, :round 49720, :data-size 22956}}
  {:defaults  {:freeze 4080,  :thaw 2432,  :round 5961,  :data-size 12403}}
  {:encrypted {:freeze 5725,  :thaw 3791,  :round 9310,  :data-size 12421}}
  {:fast      {:freeze 3479,  :thaw 1981,  :round 5462,  :data-size 13343}}

  ;;; 13 June 2013: Clojure 1.5.1, Nippy 2.0.0-alpha1
  {:reader    {:freeze 23124, :thaw 26469, :round 47674, :data-size 22923}}
  {:defaults  {:freeze 4007,  :thaw 2520,  :round 6038,  :data-size 12387}}
  {:encrypted {:freeze 5560,  :thaw 3867,  :round 9157,  :data-size 12405}}
  {:fast      {:freeze 3429,  :thaw 2078,  :round 5577,  :data-size 13237}}

Will continue fiddling on my end. Otherwise might help to see pre/post benchmark numbers from your hardware when you get a chance (no urgency).

Just run lein start-bench, then evaluate the appropriate block in benchmarks.clj.

Will be interesting to see how far we can take this. Cheers!

mpenet commented 11 years ago

Yes, my master branch was a bit outdated, to say the least; I'll check again later.

ptaoussanis commented 11 years ago

> The big problem I see is backward compatibility since I had to drop read/writeUTF since it's not supported by bytebuffers by default.

BTW I've pulled in your changes on this and deprecated the old keyword id regardless of what happens on the ByteBuffer side. It's nicer to have them both using the same mechanism, and it will help untie our hands if we do ever decide to change the stream format for whatever reason.

mpenet commented 11 years ago

Great decision, it definitely makes things easier.

ptaoussanis commented 11 years ago

> not to mention I suspect this would add some overhead

Actually, I think that'd be a fine solution now that the old type id is marked as deprecated - the overhead would only apply to legacy data.
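A sketch of what that fallback could look like (hypothetical names, not Nippy's actual code): wrap the buffer's bytes in a DataInputStream only when a deprecated legacy type id is encountered, so the stream overhead is paid for legacy data alone.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

public class LegacyUtf {
    // Hypothetical fallback: when a deprecated legacy type id is hit,
    // wrap the buffer's remaining bytes in a DataInputStream so the old
    // modified-UTF-8 format can still be decoded with readUTF. (A real
    // implementation would also advance the buffer's position by the
    // number of bytes readUTF actually consumed.)
    public static String readLegacyUtf(ByteBuffer buf) throws IOException {
        byte[] rest = new byte[buf.remaining()];
        buf.get(rest);
        return new DataInputStream(new ByteArrayInputStream(rest)).readUTF();
    }
}
```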

mpenet commented 11 years ago

BTW, due to an unexpected event (a short circuit in my flat yesterday), I couldn't run the benchmarks! I'll be busy again this evening too, but hopefully tomorrow :)

Edit: actually, I did run some of them yesterday after your message, but there was too much variance between runs to draw any conclusions.

ptaoussanis commented 11 years ago

No stress, take your time. This wouldn't be a breaking change anyway so we could do it any time, no particular need to get it in for 2.0.0.

Thanks again for your help.

mpenet commented 11 years ago

I modified a few things and ran some new tests during the lunch break. It's a close call, and it probably also depends on the shape of the data. I'm still not convinced.

(do ; Roundtrip times
  (println
   {:fast
    {;; :freeze    (bench (freeze data {:compressor nil}))
     :thaw      (let [frozen (freeze data {:compressor nil})]
                  (bench (thaw frozen)))
     ;; :round     (bench (roundtrip-fast data))
     ;; :data-size (count (freeze data {:compressor nil}))
     }}))

nio

{:fast {:thaw 3965}}
{:fast {:thaw 3832}}
{:fast {:thaw 3826}}
{:fast {:thaw 3823}}
{:fast {:thaw 3837}}
{:fast {:thaw 3792}}
{:fast {:thaw 3811}}

master (4008aea007d95510344ef8abe9752a56846e3b75):

{:fast {:thaw 4102}}
{:fast {:thaw 3985}}
{:fast {:thaw 3985}}
{:fast {:thaw 3985}}
{:fast {:thaw 3986}}
{:fast {:thaw 3990}}
{:fast {:thaw 4008}}

ptaoussanis commented 11 years ago

Have you played with the idea of allocating and reusing a single long-lived ByteBuffer for multiple thaw ops? (This'd be relevant for freeze ops also).

You could allocate an initial size (1MB, say) and permanently enlarge it whenever you come across a larger byte array. For initial testing, just set up a large initial size (100MB, whatever) and see whether the idea is sound in principle. We can work out an enlargement strategy later.
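A minimal Java sketch of that idea (hypothetical names, not actual Nippy code): one long-lived buffer, permanently enlarged whenever an input outgrows it.

```java
import java.nio.ByteBuffer;

public class ReusableBuffer {
    // One long-lived heap buffer, reused across thaw calls and permanently
    // enlarged whenever an input outgrows it (sketch only; single-threaded).
    private ByteBuffer buf = ByteBuffer.allocate(1 << 20); // 1MB initial size

    public ByteBuffer wrap(byte[] ba) {
        if (ba.length > buf.capacity()) {
            // Grow to the next power of two that fits the input.
            buf = ByteBuffer.allocate(Integer.highestOneBit(ba.length - 1) << 1);
        }
        buf.clear();  // reset position/limit, keep capacity
        buf.put(ba);
        buf.flip();   // ready for reading (thaw)
        return buf;
    }
}
```

The point of the sketch is just to measure the upper bound: no allocation per op in the steady state, at the cost of being unsafe under concurrent use.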

mpenet commented 11 years ago

I haven't, but I've never seen other libs doing that (e.g. Cassandra, which uses ByteBuffers extensively), so I suspect it's not worth the trouble. I could be wrong (I often am).

ptaoussanis commented 11 years ago

> I haven't, but I've never seen other libs doing that (e.g. Cassandra, which uses ByteBuffers extensively), so I suspect it's not worth the trouble. I could be wrong (I often am).

This is the way I'd normally use them since the allocation overhead is relatively very high (especially on the direct buffers, I believe).

It'd definitely be worth trying, I'd say.

mpenet commented 11 years ago

Yeah, allocation is costly, but the big issue would be concurrent access to the same ByteBuffer instance... In the end this gets hairy fast.

Netty 4 has buffer allocators (pooled and unpooled, supporting tons of options); that's another possible alternative.
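One common way to sidestep the concurrency issue without full pooling is a buffer per thread; a minimal Java sketch (hypothetical names, not a proposal for Nippy's actual design):

```java
import java.nio.ByteBuffer;

public class ThreadLocalBuffers {
    // One pre-allocated buffer per thread: no locking, no sharing, at the
    // cost of one buffer's worth of memory per active thread.
    private static final ThreadLocal<ByteBuffer> BUF =
            ThreadLocal.withInitial(() -> ByteBuffer.allocate(1 << 20));

    public static ByteBuffer local() {
        ByteBuffer b = BUF.get();
        b.clear(); // reuse the same instance on every call from this thread
        return b;
    }
}
```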

ptaoussanis commented 11 years ago

I'd still approach it from this angle though. We could potentially pool buffers, etc. - as you say, there are ways of dealing with the concurrency implications. But before opening any of those doors, I'd first want confirmation that there is, in principle, a significant (i.e. worthwhile) performance benefit to be had.

The single-threaded, pre-allocated case can serve as a kind of upper limit on expected returns. If even in this case the performance gain is small (I suspect it might be), then we save time by not heading further in that direction.

Binary-search if you will: let's figure out first if we're in the right hemisphere :-)

mpenet commented 11 years ago

nio recycled buffer https://github.com/mpenet/nippy/tree/nio2

{:fast {:thaw 3838}}
{:fast {:thaw 3719}}
{:fast {:thaw 3710}}
{:fast {:thaw 3710}}
{:fast {:thaw 3708}}

nio classic https://github.com/mpenet/nippy/tree/nio

{:fast {:thaw 3908}}
{:fast {:thaw 3793}}
{:fast {:thaw 3787}}
{:fast {:thaw 3780}}
{:fast {:thaw 3791}}

netty wrapped ByteBuf https://github.com/mpenet/nippy/tree/netty

{:fast {:thaw 3966}}
{:fast {:thaw 3881}}
{:fast {:thaw 3864}}
{:fast {:thaw 3858}}
{:fast {:thaw 3856}}

And Netty's pooled direct ByteBuf gave me numbers in the 5k range; the (commented-out) code is in this branch: https://github.com/mpenet/nippy/blob/netty/src/taoensso/nippy.clj

It doesn't seem to make a lot of difference :( Unless I am doing something wrong or missing the obvious.

ptaoussanis commented 11 years ago

Yeah, looks like there's not a lot to gain from the swap. (Ditto on the missing-something-obvious, btw: I have very little experience with nio.)

In any case, I appreciate the attempt - I definitely think it was worth a shot! If nothing else, these results are quite interesting...

mpenet commented 11 years ago

You're welcome. I think you can probably close this PR; it was worth experimenting with, but for now I don't think it needs further exploration.