Open GoogleCodeExporter opened 9 years ago
Interesting start. Definitely worth evaluating or spiking an implementation.
LastSubscriber isn't widely used, so the BatchSubscriber is more interesting at
least to me.
Original comment by mike.ret...@gmail.com
on 25 Aug 2010 at 9:39
Well, how do you think, what should be done for the implementation to be
included as alternative or replacement? May be some perfomance testing?
Original comment by chere...@gmail.com
on 27 Aug 2010 at 5:42
For performance critical apps, this implementation should probably not copy the
values into a list. A better implementation would be to pass an iterator to the
callback so the consumer can read all the values from the queue. This reduces
allocations and is thread safe.
Original comment by mike.ret...@gmail.com
on 2 Sep 2010 at 12:23
I added an the implementation and a perf test.
http://code.google.com/p/jetlang/source/detail?r=484
The perf test shows that the performance is nearly the same as the existing
implementation at least on my dual core macbook pro. The code needs a test
case that demonstrates a performance difference. This is a first draft. Let me
know if you have any suggestions.
Original comment by mike.ret...@gmail.com
on 2 Sep 2010 at 1:43
Well, I've make some cleanup (remove MemoryChannel, etc) in your test to see
the raw performance of BatchSubscriber itself (in attach), and got even worst
result -- lock-based implementation definitly outperforms my lock-free. I've
done my tests on rather old machines -- my old MBP (Core Duo), and 3 years old
P4 desktop, and in both cases BatchSubscriber win the race. Results is quite
unstable, so I repeat each case several times. In case of 2 publisher threads
performance is more or less equals, but the more publishers -- the more
BatchSubscriber outperforms. Quite surprizing for me, but true.
I can suggest several explanations:
First of all, as far as I know, old CPUs have rather expensive CAS
implementation. Romors is what current Quads and Core 2 Duo is better in this
-- but I haven't any Quads close to me to check.
Second, is CuncurrentLinkedQueue -- it uses 1 CAS per dequeue, but 2 CAS (which
should both succeed) for enqueue. With addition of my 1 CAS per enqueue it
makes enqueue rather expensive - possible, more expensive then one simple
short-time lock.
I still think BatchSubscriber can be implemented in more efficient way using
CAS, but seems this try is failed :) I'll take some time to think about it.
As for using iterator over CLQ instead of newly allocated buffer -- I do not
sure it is the right way. CLQ with it linked-list nature, with bad
cache-locality it not a best structure to single-thread iterate. And from
client point of view, have a plain-old array-backed List in own exclusive
access is (simpler, clearer?) then have an iterator over some multithread
structure with undefined size.
Original comment by chere...@gmail.com
on 5 Sep 2010 at 7:42
Attachments:
Original issue reported on code.google.com by
chere...@gmail.com
on 22 Aug 2010 at 9:12Attachments: