imace / jetlang

Automatically exported from code.google.com/p/jetlang

Lock-free implementation #8

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
It's not a bug, but a suggestion -- most of the JetLang code is lock-based,
although most of it could be reworked in a lock-free manner, which is usually
faster and more scalable. I've attached an example -- my lock-free version of
BatchSubscriber. If you think this is a promising direction, I can try to rework
LastSubscriber and look further from there.

Original issue reported on code.google.com by chere...@gmail.com on 22 Aug 2010 at 9:12

Attachments:

GoogleCodeExporter commented 9 years ago
Interesting start. Definitely worth evaluating or spiking an implementation.
LastSubscriber isn't widely used, so the BatchSubscriber is more interesting,
at least to me.

Original comment by mike.ret...@gmail.com on 25 Aug 2010 at 9:39

GoogleCodeExporter commented 9 years ago
Well, what do you think should be done for the implementation to be
included as an alternative or a replacement? Maybe some performance testing?

Original comment by chere...@gmail.com on 27 Aug 2010 at 5:42

GoogleCodeExporter commented 9 years ago
For performance-critical apps, this implementation should probably not copy the
values into a list. A better implementation would be to pass an iterator to the
callback so the consumer can read all the values from the queue. This reduces
allocations and is thread safe.
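
To make the contrast concrete, here is a minimal sketch of the two callback
shapes being described. The interface names below are hypothetical, for
illustration only -- they are not jetlang's actual API:

```java
import java.util.Iterator;
import java.util.List;

// Copying style: the subscriber drains the queue into a fresh List and
// hands exclusive ownership of that List to the consumer.
interface ListBatchCallback<T> {
    void onBatch(List<T> batch);
}

// Iterator style: the subscriber hands the consumer an Iterator that reads
// straight from the underlying queue, avoiding the copy and the per-batch
// List allocation.
interface IteratorBatchCallback<T> {
    void onBatch(Iterator<T> batch);
}
```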

Original comment by mike.ret...@gmail.com on 2 Sep 2010 at 12:23

GoogleCodeExporter commented 9 years ago
I added the implementation and a perf test.

http://code.google.com/p/jetlang/source/detail?r=484

The perf test shows that the performance is nearly the same as the existing
implementation, at least on my dual-core MacBook Pro. The code needs a test
case that demonstrates a performance difference. This is a first draft. Let me
know if you have any suggestions.

Original comment by mike.ret...@gmail.com on 2 Sep 2010 at 1:43

GoogleCodeExporter commented 9 years ago
Well, I've done some cleanup (removed MemoryChannel, etc.) in your test to see
the raw performance of BatchSubscriber itself (attached), and got an even worse
result -- the lock-based implementation definitely outperforms my lock-free one.
I ran my tests on rather old machines -- my old MBP (Core Duo) and a 3-year-old
P4 desktop -- and in both cases BatchSubscriber wins the race. The results are
quite unstable, so I repeated each case several times. With 2 publisher threads
the performance is more or less equal, but the more publishers there are, the
more BatchSubscriber outperforms. Quite surprising to me, but true.

I can suggest several explanations: 

First of all, as far as I know, old CPUs have a rather expensive CAS
implementation. Rumor has it that current quad-cores and Core 2 Duos are better
at this -- but I don't have any quad-core machines at hand to check.

Second is ConcurrentLinkedQueue -- it uses 1 CAS per dequeue, but 2 CASes (which
must both succeed) per enqueue. With the addition of my 1 CAS per enqueue, that
makes enqueue rather expensive -- possibly more expensive than one simple,
short-lived lock.
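
For reference, here is a rough sketch of the kind of enqueue path being
described -- my reconstruction of the general pattern, not the attached code,
with made-up class and method names. Every publish pays ConcurrentLinkedQueue's
own CAS work plus one extra CAS on a "flush scheduled" flag:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

interface BatchCallback<T> {
    void onBatch(List<T> batch);
}

// Illustrative lock-free batching subscriber, not the attached implementation.
class LockFreeBatcher<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<T>();
    private final AtomicBoolean flushScheduled = new AtomicBoolean(false);
    private final Executor consumer;            // assumed single-threaded, e.g. a fiber
    private final BatchCallback<T> callback;

    LockFreeBatcher(Executor consumer, BatchCallback<T> callback) {
        this.consumer = consumer;
        this.callback = callback;
    }

    void publish(T msg) {
        queue.offer(msg);                                 // CLQ's own CASes
        if (flushScheduled.compareAndSet(false, true)) {  // the extra CAS per enqueue
            consumer.execute(new Runnable() {
                public void run() { flush(); }
            });
        }
    }

    private void flush() {
        // Clear the flag before draining so a publish that races with the
        // drain is either picked up here or schedules a new flush.
        flushScheduled.set(false);
        List<T> batch = new ArrayList<T>();
        for (T msg; (msg = queue.poll()) != null; ) {
            batch.add(msg);
        }
        if (!batch.isEmpty()) {
            callback.onBatch(batch);
        }
    }
}
```

The consumer executor is assumed to run tasks one at a time, so flushes never
run concurrently; the only coordination on the publish side is the queue's
CASes plus the single compare-and-set on the flag.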

I still think BatchSubscriber can be implemented in a more efficient way using
CAS, but it seems this attempt has failed :) I'll take some time to think about it.

As for using an iterator over the CLQ instead of a newly allocated buffer -- I'm
not sure that's the right way. With its linked-list nature and poor cache
locality, CLQ is not the best structure for single-threaded iteration. And from
the client's point of view, having a plain old array-backed List under its own
exclusive access is (simpler? clearer?) than having an iterator over some
multi-threaded structure of undefined size.

Original comment by chere...@gmail.com on 5 Sep 2010 at 7:42

Attachments: