lessthanoptimal / ejml

A fast and easy to use linear algebra library written in Java for dense, sparse, real, and complex matrices.
https://ejml.org

Adding SIMD / New Vector API that's in incubation #142

Open eix128 opened 3 years ago

eix128 commented 3 years ago

Hi, Java 16 has released the Vector API.

You can look at the links for details: https://metebalci.com/blog/what-is-new-in-java-16/ https://openjdk.java.net/jeps/338

Java 16's SIMD API is backed by JIT intrinsics, which makes it much faster than going through JNI. The JIT compiler converts these method calls directly to ARM NEON, AVX-512, etc.

It would be good to adapt EJML to Java 16's new Vector API.
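To make the request concrete, here is a minimal sketch of a dot product written against the incubating Vector API. It is not EJML code: the class name `DotSketch` is hypothetical, and it assumes JDK 16 or newer with the incubator module enabled (`--add-modules jdk.incubator.vector` for both `javac` and `java`). Whether it actually compiles down to NEON or AVX instructions depends on the JVM and the hardware.

```java
import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

/** Hypothetical example class; not part of EJML. */
final class DotSketch {
    // Widest vector shape the current platform supports for double lanes.
    private static final VectorSpecies<Double> SPECIES = DoubleVector.SPECIES_PREFERRED;

    private DotSketch() {}

    /** Returns the sum of a[i] * b[i]; both arrays must have the same length. */
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        DoubleVector acc = DoubleVector.zero(SPECIES);
        int i = 0;
        // Largest multiple of the lane count that fits in the array.
        int upper = SPECIES.loopBound(a.length);
        for (; i < upper; i += SPECIES.length()) {
            DoubleVector va = DoubleVector.fromArray(SPECIES, a, i);
            DoubleVector vb = DoubleVector.fromArray(SPECIES, b, i);
            acc = va.fma(vb, acc); // lane-wise acc = va * vb + acc
        }
        // Collapse the per-lane partial sums into one scalar.
        double sum = acc.reduceLanes(VectorOperators.ADD);
        // Scalar tail for the elements that did not fill a whole vector.
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

Because the lanes accumulate partial sums independently, the floating-point result can differ in the last bits from a plain sequential loop.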

Also check out TornadoVM for FPGA solutions for very large matrices.

lessthanoptimal commented 3 years ago

I've been looking into this; it's definitely in the "plan", and early benchmarks look good. I'll need to do some redesigning so that you can swap out algorithms easily. EJML will always be stuck on ancient JDKs, so this will need to go into a separate module that has a different build path.
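One way the "swap out algorithms" idea could look is sketched below. This is only an illustration under assumptions, not EJML's actual design: `DotKernel`, `ScalarDotKernel`, and `Kernels` are hypothetical names. The core module would ship the scalar baseline that compiles on old JDKs, and a separately built module targeting a newer JDK could register a faster implementation at startup.

```java
import java.util.Objects;

/** Hypothetical kernel interface; not part of EJML. */
interface DotKernel {
    double dot(double[] a, double[] b);
}

/** Baseline implementation that needs nothing beyond an old JDK. */
final class ScalarDotKernel implements DotKernel {
    @Override
    public double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}

/** Hypothetical registry holding the currently active kernel. */
final class Kernels {
    // volatile so a swap made on one thread is visible to all others.
    private static volatile DotKernel activeDot = new ScalarDotKernel();

    private Kernels() {}

    static DotKernel dot() {
        return activeDot;
    }

    /** Called by an optional module to install its own implementation. */
    static void setDot(DotKernel kernel) {
        activeDot = Objects.requireNonNull(kernel);
    }
}
```

Callers would always go through `Kernels.dot()`, so the core library never references classes that only exist on newer JDKs.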

lessthanoptimal commented 1 year ago

Posting an update, but not much of one. This is still very much something I would like to add, but I can't prioritize it at the moment. If anyone wants to give it a shot, go here and we can work out integration details.

https://github.com/lessthanoptimal/VectorPerformance

ennerf commented 1 year ago

@lessthanoptimal I recently did some tests with Aparapi and think that could be useful for large matrices as well. It converts bytecode to OpenCL and runs algorithms on the GPU. It's fairly easy to work with and backwards compatible with old versions.

There are some limitations, like only being able to use static methods and primitive/array types, but EJML is set up that way anyway. Here is a small sample I was working with: Mandelbrot GPU.
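To illustrate the shape of code those limitations imply, here is a rough plain-Java sketch of a Mandelbrot kernel: a static method that touches only primitives and a flat primitive array, with one invocation per output index. It is not the linked sample and uses no Aparapi calls. The class name `MandelbrotKernelSketch` and the chosen region of the complex plane are assumptions, and a parallel stream stands in for the GPU dispatch.

```java
import java.util.stream.IntStream;

/** Hypothetical example class; plain Java, no Aparapi dependency. */
final class MandelbrotKernelSketch {
    private MandelbrotKernelSketch() {}

    /** Escape-time iteration count for the pixel at flat index gid. */
    static int iterations(int gid, int width, int height, int maxIter) {
        int px = gid % width;
        int py = gid / width;
        // Map the pixel to the region [-2, 1] x [-1.5, 1.5] (assumed view).
        float cr = -2.0f + 3.0f * px / width;
        float ci = -1.5f + 3.0f * py / height;
        float zr = 0f;
        float zi = 0f;
        int n = 0;
        // Iterate z = z^2 + c until |z| exceeds 2 or the budget runs out.
        while (n < maxIter && zr * zr + zi * zi <= 4.0f) {
            float t = zr * zr - zi * zi + cr;
            zi = 2.0f * zr * zi + ci;
            zr = t;
            n++;
        }
        return n;
    }

    /** Fills a flat int array, one independent work item per index. */
    static int[] render(int width, int height, int maxIter) {
        int[] out = new int[width * height];
        IntStream.range(0, out.length)
                 .parallel()
                 .forEach(gid -> out[gid] = iterations(gid, width, height, maxIter));
        return out;
    }
}
```

Each index writes only its own array slot, so the work items are independent, which is the property a GPU backend relies on.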

lessthanoptimal commented 1 year ago

@ennerf How much of a speed up were you seeing?

ennerf commented 1 year ago

I think my GTX 2060 was about 20-50% faster than the parallel version on 12/24 threads, but I didn't do any real benchmarks and the dataset wasn't very large. The benchmarks in this blog post suggest that it scales well for larger problems.

lessthanoptimal commented 1 year ago

@ennerf Those articles are interesting. I also had no idea there was an active community writing renderers for JavaFX. When I last tried it years ago, JavaFX's 3D performance was really bad and I didn't see any way to get a custom solution running in that framework.

ennerf commented 1 year ago

@lessthanoptimal going a bit off-topic here, I think that the JavaFX 3D performance is actually pretty good as long as you stay within the supported parts (e.g. dynamic CubeWorld or rendering robots). It's also cool that the same code can run on Android and iOS.

Where things get tricky is when you need to render long lines or large dynamic objects like point clouds.