Open lemire opened 7 years ago
I think PyRoaringBitMap is faster than my cffi binding because it using Cython. I recently working for a project of game analysis which using pypy and dask.distributed for distributed computing. I want to make croaring to work with them properly. I think cffi is the best way and wrote a simple wrapper for it , I will try to use PyRoaringBitMap . thank you for your attention.
It will be great If there have a distributed service of croaring just like memcached
It would be useful to have hard numbers so that users can make a choice based on real data.
@sunzhaoping I think some performance issues can easily be fixed. For instance:
range
__le__
and roaring_bitmap_is_strict_subset to implement the method __lt__
.Ping me if you fix this, I would be glad to re-run my benchmark!
Good! thank you for your suggestion.
@Ezibenroc I have optimized the code according to your suggestion.
@sunzhaoping the new benchmark results are much better:
operation | pyroaring | python-croaring |
---|---|---|
range constructor | 1.59e-04 | 1.81e-04 |
ordered list constructor | 3.24e-02 | 6.03e-02 |
list constructor | 1.23e-01 | 1.28e-01 |
ordered array constructor | 3.55e-03 | 4.57e-03 |
array constructor | 1.14e-01 | 1.15e-01 |
element addition | 8.95e-07 | 3.86e-05 |
element removal | 2.27e-06 | 1.62e-06 |
membership test | 9.69e-08 | 1.86e-06 |
union | 1.88e-04 | 2.14e-04 |
intersection | 8.85e-04 | 9.33e-04 |
difference | 1.81e-04 | 2.14e-04 |
symmetric diference | 1.85e-04 | 2.13e-04 |
equality test | 8.75e-05 | 8.05e-05 |
subset test | 1.02e-04 | 8.34e-05 |
conversion to list | 4.08e-02 | 2.77e-01 |
pickle dump & load | 4.99e-04 | 6.90e-04 |
"naive" conversion to array | 4.39e-02 | 2.91e-01 |
"optimized" conversion to array | 1.32e-03 | 3.38e-02 |
selection | 9.12e-07 | 4.42e-05 |
slice | 4.81e-02 | 3.15e-01 |
There is no more absurdly high times for python-croaring.
What is interesting however is that it is always a bit slower than pyroaring, it seems that using cffi
causes a small overhead.
I think many Type Casting in cffi is using pure python code , All pyroaring code is compiled to c code. So pyroaring can do better. If using pypy, the effect will be better.
Yes, casting is likely the issue. Going from an integer to an uint32_t
seems to involve a lot of work.
It would be interesting to benchmark against https://github.com/Ezibenroc/PyRoaringBitMap