Closed luhenry closed 3 years ago
I've noticed in the past that f2j blas was pretty suboptimal, especially with regard to making HotSpot happy. Looks like you did most of what I was able to figure out (and a bunch of stuff I didn't, I'm sure).
One thing I had noticed previously was that extracting methods for the various branches (e.g. incx == 1 vs != 1) would frequently make the inliner happier, especially in "real use" since it wouldn't hit method size limits.
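A minimal sketch of that method-extraction idea, assuming a hypothetical ddot-style routine (the class and method names here are illustrative, not the actual f2j or Breeze code). The entry point only dispatches, so each branch lives in its own small method that the inliner can consider separately:

```java
// Hypothetical example: names are illustrative, not taken from f2j or Breeze.
final class DotDispatch {
    // Small entry point: checks the increments and dispatches to a dedicated method.
    public static double ddot(int n, double[] x, int incx, double[] y, int incy) {
        if (n <= 0) {
            return 0.0;
        }
        if (incx == 1 && incy == 1) {
            return ddotUnit(n, x, y);
        }
        return ddotStrided(n, x, incx, y, incy);
    }

    // Unit-stride case: a plain counted loop over contiguous elements.
    static double ddotUnit(int n, double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    // General case: follows the BLAS convention that a negative increment
    // walks the array starting from the far end.
    static double ddotStrided(int n, double[] x, int incx, double[] y, int incy) {
        int ix = incx < 0 ? (n - 1) * -incx : 0;
        int iy = incy < 0 ? (n - 1) * -incy : 0;
        double sum = 0.0;
        for (int i = 0; i < n; i++, ix += incx, iy += incy) {
            sum += x[ix] * y[iy];
        }
        return sum;
    }
}
```

Whether this split actually helps depends on the JIT's inlining heuristics for the JDK in use, so it is worth measuring rather than assuming.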
Yes, a bunch of the speedup comes from clearly outlining loop unrolling that HotSpot can easily vectorize. A big limitation of HotSpot nowadays is that it's unable to vectorize for (...) { sum += a[i] * b[i] }. It will vectorize the a[i] * b[i] but it won't vectorize the sum += ... reduction unless it is rewritten with separate accumulators: for (...) { sum0 += a[i+0] * b[i+0]; sum1 += a[i+1] * b[i+1]; } sum += sum0 + sum1; This pattern is what allows such a huge boost with JDK16 when using the Vector API.
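Spelled out as a complete method, the two-accumulator pattern might look like the sketch below. The class name and the handling of the odd tail element are assumptions for illustration, not the code from this PR:

```java
// Hypothetical example: a dot product unrolled by two with independent accumulators.
final class UnrolledDot {
    public static double dot(double[] a, double[] b) {
        int n = Math.min(a.length, b.length);
        int bound = n & ~1; // largest even number <= n
        double sum0 = 0.0;
        double sum1 = 0.0;
        int i = 0;
        // Each accumulator carries its own dependency chain,
        // so the two updates do not depend on each other.
        for (; i < bound; i += 2) {
            sum0 += a[i + 0] * b[i + 0];
            sum1 += a[i + 1] * b[i + 1];
        }
        double sum = sum0 + sum1;
        // Tail: at most one leftover element when n is odd.
        for (; i < n; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

Because the additions are reassociated, the result can differ from a strictly sequential sum in the last bits for general floating-point inputs. Whether the JIT vectorizes the loop still depends on the JDK version and flags.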
I'll need to update docs etc too. Looks like we can basically just delete the natives project?
Short answer: yes. However, removing the whole project would break any project depending on it. Keeping it would avoid breaking them, even though it would be empty and thus dead weight. I'll leave that to you to decide, just let me know.
Yeah I'll leave it as an empty jar and put a note that it's there for historical purposes. To support Scala 3 I'm gonna have to break binary compatibility in a pretty major way, which may be the time to excise it.
@dlwh you mentioned a future release for scala 3, but are you planning an earlier release with this change? Having a non-major release would allow Spark to be consistent on which BLAS library it's using, once https://github.com/apache/spark/pull/32415 is merged. Thank you!
Yeah I'll cut a smaller release too
Following https://github.com/apache/spark/pull/32415, I would like to update Breeze to take advantage of dev.ludovic.netlib in place of com.github.fommil.netlib. This package provides a faster pure-Java fallback on Java 8, 11, and 16+, and provides a JNI-based wrapper that doesn't rely on any GPL or LGPL libraries.
The performance numbers on Spark are the following:
JDK8, JDK11, JDK16: (benchmark results not preserved)