vert-x3 / vertx-proton

Apache License 2.0
26 stars 26 forks source link

improve efficiency of delivery tag generation #78

Closed franz1981 closed 6 years ago

franz1981 commented 6 years ago

It is an alternative (and more performant) version of https://github.com/vert-x3/vertx-proton/pull/76 that address the encoding of the generated tag using a binary conversion ie a copy.

franz1981 commented 6 years ago

@gemmellr Here we don't need to cover negative values as long that the value ranges is [Integer.MIN_VALUE, Integer.MAX_VALUE]. IN order to guarantee to work on platform that doesn't support Unsafe I have opted to create an alternative version that do not follow the same exact behaviour of the common path ie it will just use Little Endian encoding. The common path instead will use the most performant byte ordering of the given platform (that's how Unsafe handle things).

gemmellr commented 6 years ago

How much performance does that actually give versus just writing the value out ourselves? I'm not really a fan of using other projects internal classes, particularly for small differences. Is the 'slow path' actually faster than writing it out ourselves for example? Having paths with different encoding also feels weird.

franz1981 commented 6 years ago

That's fair TBH, in term of percentage using quiver I have seen ~5% of send cost occupied by generateTag, with a fair amout of garbage created: with the binary version....a negligible cost. With the tools I have used the cost was so little that it wasn't present on the graphs. Re the different encoding path I have to think better how to handle it while leaving the same encoding regardless Unsafe is present or not (AFAIK IBM J9, Zing etc etc have Unsafe and only recently JDK 11 is starting to remove some of its methods). The cleanest way to handle it could be to have an unpooled heap ByteBuf as a member and using it to hold the tag value: if a new tag is being generated it could be used to copy out a fresh byte[] holding the binary encoded tag: I'm not sure that it would be perfomant enough. Re the perf improvements IMO the most noticeable thing is not on throughput level, but more on garbage produced that would easily became several times less than the original version.

gemmellr commented 6 years ago

I'm not asking about comparison to the original version, its obviously quite naive and not that fast. I'm wondering about something simpler like manually writing the bytes contents etc, i.e how much difference is all the unsafe stuff really making versus an improved basic impl. If as you say the difference is really in the GC, that seems like it would still give most of the win without the other hassles.

franz1981 commented 6 years ago

I like the idea!
I have manually written a little endian encoding : the original method were delivering ~35 M ops/sec, the unsafe one ~170 M ops/sec and this one 140 M ops/sec, but without using any unsafe classes :+1: