lemire / SIMDCompressionAndIntersection

A C++ library to compress and intersect sorted lists of integers using SIMD instructions
Apache License 2.0
419 stars 58 forks source link

Is this will be ported to java #1

Closed betepahos closed 10 years ago

betepahos commented 10 years ago

Sorry for asking but do you plan to port this code to java? I saw your implementation of roaring bitmaps for java (and this question may be obsolete), and the readme point out that some functionality is missing in fastPForDelta (thus in java impl. too). Otherwise its very impressive stuff albeit i iterate trough the academic papers again because its somewhat complicated for me.

lemire commented 10 years ago

Thanks for your interest.

Can you explain what you mean by "the readme point out that some functionality is missing in fastPForDelta"...? Which functionality is of interest to you?

As for Java, we do have a Java port, yes... it is at

https://github.com/lemire/JavaFastPFOR

I will consider this issue closed as I think it is more of a question rather than an issue, please reopen if you disagree.

betepahos commented 10 years ago

"The FastPFOR C++ Library available at https://github.com/lemire/FastPFor implements some of the same compression schemes except that it is not optimized for the compression of sorted lists of integers." - And the question regards the intersecting functionality of compressed data used in this project

lemire commented 10 years ago

If you want to compress sorted lists of integers, use SIMDCompressionAndIntersection. Otherwise use FastPFor. There is no missing functionality... just two different libraries with different focuses.

It is not possible to port SIMD intersections in Java because there is no support for SIMD instructions in the Java language.

betepahos commented 10 years ago

yes i see.. sorry for the hustle then :)

betepahos commented 10 years ago

Is it appropariate to use the code from JNI. Will it make sense performance wise?

lemire commented 10 years ago

Oracle's JVM uses SIMD instructions internally but the Java language does not allow its users to call SIMD instructions.

You can certainly wrap a C++ library for use in Java. It would be a great project. Please let me know if you get good results.

betepahos commented 10 years ago

Hm... i did poke around and it seems lacking java/scala/JVM "calling" SIMD instructions directly is a big deal in the community and there is certain trend to make that happen :
1: http://people.apache.org/~xli/papers/npc10_java_vectorization.pdf 2: https://bugs.openjdk.java.net/browse/JDK-7192383 -this still dont make them callable by users

Supporting 512 bit vectors by AVX would broad the impact considerably

IF* i manage to get JNI working with the code will be happy to report the results. Thanks.

lemire commented 10 years ago

I am not sure I would call it a trend. There were proposals 5 years ago to add support for SIMD instructions at the language level, with prototypes, but they have not been been adopted. I do not think it is coming. C# does support SIMD instructions as of recently, but the support is too limited to do much good for the kind of work I do.

AVX does not support 512-bit vectors. Maybe you are thinking of AVX-512. Or maybe you are thinking of 256-bit vectors. Unfortunately, AVX instructions often have higher latency in current processors which hurts us.

searchivarius commented 10 years ago

Unfortunately, 256-bit extensions are only a tad faster than 128-bit extensions in many cases. Hopefully, this will change and, in particular, 512-bit extensions will be much faster.

There was some academic effort to provide a uniform access to SIMD operations from Java, but I can't find the code. The paper is here: http://queue.acm.org/detail.cfm?id=1945954

betepahos commented 10 years ago

Thanks for information and insight provided.

betepahos commented 10 years ago

just for reference if someone search too : https://github.com/tpbvieira/jpf-enabledness/tree/master/delta-i/delta-experiments/src/jSIMD

searchivarius commented 10 years ago

BTW, did you try it? If it works, hmmm, we can create codecs for Solr/Lucene.

betepahos commented 10 years ago

No just find it because jSIMD code was gone as you said this is the closest thing i find.

betepahos commented 10 years ago

@searchivarius did you check the library. It is used to blend the images. Quick scan NativeLibrary shows that it export most of operations (if not all), but i would say its little complicated for me :)

searchivarius commented 10 years ago

I had a quick look. It's quite simple, IMHO. But I think it lacks a lot of operations (I didn't find _mm_shuffle_epi8 and popcnt) and focuses mostly on operations with arrays, IMHO.

Besides, it is not exactly a thin wrapper. Look at this function: https://github.com/tpbvieira/jpf-enabledness/blob/master/delta-i/delta-experiments/src/native/NativeLibrary.c#L777

Are we going to do a check every time we need to add two 128-bit numbers? If so, this will not be very efficient.

It will be much easier to wrap the whole thing using JNI, but we don't have time and energy for this project.

betepahos commented 10 years ago

Yes on the second thought if we want to get the speed of the simd we might as well just expose only the simd functionality wrapped by JNI. Got it .