sunzhaoping / python-croaring

A cffi python wrapper for C Roaring Bitmaps.
5 stars 2 forks source link

Benchmarking #1

Open lemire opened 7 years ago

lemire commented 7 years ago

It would be interesting to benchmark against https://github.com/Ezibenroc/PyRoaringBitMap

sunzhaoping commented 7 years ago

I think PyRoaringBitMap is faster than my cffi binding because it using Cython. I recently working for a project of game analysis which using pypy and dask.distributed for distributed computing. I want to make croaring to work with them properly. I think cffi is the best way and wrote a simple wrapper for it , I will try to use PyRoaringBitMap . thank you for your attention.

sunzhaoping commented 7 years ago

It will be great If there have a distributed service of croaring just like memcached

lemire commented 7 years ago

It would be useful to have hard numbers so that users can make a choice based on real data.

lemire commented 7 years ago

Please see https://github.com/Ezibenroc/PyRoaringBitMap/issues/22#issuecomment-326619594

Ezibenroc commented 7 years ago

@sunzhaoping I think some performance issues can easily be fixed. For instance:

Ping me if you fix this, I would be glad to re-run my benchmark!

sunzhaoping commented 7 years ago

Good! thank you for your suggestion.

sunzhaoping commented 7 years ago

@Ezibenroc I have optimized the code according to your suggestion.

Ezibenroc commented 7 years ago

@sunzhaoping the new benchmark results are much better:

operation pyroaring python-croaring
range constructor 1.59e-04 1.81e-04
ordered list constructor 3.24e-02 6.03e-02
list constructor 1.23e-01 1.28e-01
ordered array constructor 3.55e-03 4.57e-03
array constructor 1.14e-01 1.15e-01
element addition 8.95e-07 3.86e-05
element removal 2.27e-06 1.62e-06
membership test 9.69e-08 1.86e-06
union 1.88e-04 2.14e-04
intersection 8.85e-04 9.33e-04
difference 1.81e-04 2.14e-04
symmetric diference 1.85e-04 2.13e-04
equality test 8.75e-05 8.05e-05
subset test 1.02e-04 8.34e-05
conversion to list 4.08e-02 2.77e-01
pickle dump & load 4.99e-04 6.90e-04
"naive" conversion to array 4.39e-02 2.91e-01
"optimized" conversion to array 1.32e-03 3.38e-02
selection 9.12e-07 4.42e-05
slice 4.81e-02 3.15e-01

There is no more absurdly high times for python-croaring. What is interesting however is that it is always a bit slower than pyroaring, it seems that using cffi causes a small overhead.

sunzhaoping commented 7 years ago

I think many Type Casting in cffi is using pure python code , All pyroaring code is compiled to c code. So pyroaring can do better. If using pypy, the effect will be better.

lemire commented 7 years ago

Yes, casting is likely the issue. Going from an integer to an uint32_t seems to involve a lot of work.