jblas-project / jblas

Linear Algebra for Java
http://jblas.org
BSD 3-Clause "New" or "Revised" License

Build native libraries for Mac OS X and Windows on aarch64 machines #135

Open jiaminglu opened 1 year ago

jiaminglu commented 1 year ago

The build script has been modified to support building on Windows and Mac aarch64 platforms.

The binaries were generated in the following environments.

Windows

Build environment: msys2
Toolchain: mingw-w64-clang-aarch64-toolchain 14.0.0
Fortran compiler: mingw-w64-clang-aarch64-flang 14.0.4
BLAS library: mingw-w64-clang-aarch64-openblas64 0.3.20-3

Mac OS X

C compiler: Apple clang version 13.1.6
Fortran compiler: MacPorts gcc12 12.1.0_6+stdlib_flag
BLAS library

On Mac, a dependency on libquadmath has been added. It is required when compiling with gfortran from MacPorts' gcc12 package.

AlbanSeurat commented 1 year ago

Any chance of having this PR merged?

mikiobraun commented 1 year ago

Hey there, sorry, I only just saw this. I'll have a look over the next few days. Thanks for the PR!

AlbanSeurat commented 1 year ago

If it helps, I have been using a snapshot version of the branch on my computer for a month without any problems.

AlbanSeurat commented 1 year ago

Any news on this PR?

mikiobraun commented 1 year ago

@AlbanSeurat you said you're using this on your machine, is it for MacOS or Windows?

AlbanSeurat commented 1 year ago

macOS Ventura 13 on an M2 Max, using a snapshot to be sure I have the ARM build :)

mikiobraun commented 1 year ago

Hey Alban, alright, thanks for the info. I'm slowly getting back to looking into this.

I'll have to make sure it also runs on all the other systems, and I often find out that I need to bump versions and so on, so it takes a bit of time, but thanks for your patience!

Looking forward to seeing how well this performs on Apple Silicon!

Given how old the project is, this has been quite a ride: every new processor generation has led to a big bump in performance for jblas :)

mikiobraun commented 1 year ago

OK, the config part for Windows looks OK, but I've had a hard time compiling the dynamic libraries for Mac OS. I need to look into this more.

One last thing, as a measure of precaution, I'm always a bit hesitant to merge binaries from external sources. @jiaminglu can I have a word with you to check a few things?

mikiobraun commented 1 year ago

Meanwhile, @AlbanSeurat, I just pushed aarch64 versions to main. If you want, can you check it out, build the jar, and see whether it works for you?

AlbanSeurat commented 1 year ago

[ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0 s <<< FAILURE! - in org.jblas.TestBlasFloat
[ERROR] org.jblas.TestBlasFloat.testSYEV Time elapsed: 0 s <<< FAILURE!
java.lang.AssertionError
    at org.junit.Assert.fail(Assert.java:87)
    at org.junit.Assert.assertTrue(Assert.java:42)
    at org.junit.Assert.assertTrue(Assert.java:53)
    at org.jblas.TestBlasFloat.testSYEV(TestBlasFloat.java:192)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:316)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:240)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:214)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:155)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:385)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
    at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:507)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:495)

[ERROR] Failures:
[ERROR] TestBlasDouble.testSYEV:192
[ERROR] TestBlasFloat.testSYEV:192

The values at the failing assert are
B.data = [0.482044, 0.707107, 0.517333; -0.731620, 0.000000, 0.681713; 0.482044, -0.707107, 0.517333]
rather than the ones the test expects:
assertTrue(arraysEqual(B.data, -0.48204393949466345, 0.731619628490741, -0.482043939494664, -0.7071067811865474, 1.3877787807814457E-16, 0.707106781186547, 0.5173332005549852, 0.6817130768931094, 0.5173332005549856));

For some reason, this doesn't pass.

Running the same tests from jiaminglu:main works
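One possible reading of the numbers quoted above: if the printed matrix is read row by row (rows separated by semicolons) and the expected values are taken as column-major, the first two columns are the expected eigenvectors multiplied by -1 and the third matches. Eigenvectors are only determined up to sign, so a different LAPACK or OpenBLAS build can legitimately return flipped columns. The sketch below tests that reading. The class and method names are hypothetical and not part of jblas or its test suite, and the storage-order interpretation is an assumption.

```java
// Hypothetical helper, not part of jblas: compares eigenvector matrices
// column by column, accepting a column or its negation.
final class EigenvectorSignCheck {

    /**
     * Both arrays hold an n-by-n matrix in column-major order (assumed).
     * A column passes if it equals the expected column or its negation
     * within the given tolerance.
     */
    static boolean columnsEqualUpToSign(double[] actual, double[] expected, int n, double tol) {
        for (int col = 0; col < n; col++) {
            boolean same = true;
            boolean flipped = true;
            for (int row = 0; row < n; row++) {
                double a = actual[col * n + row];
                double e = expected[col * n + row];
                if (Math.abs(a - e) > tol) same = false;
                if (Math.abs(a + e) > tol) flipped = false;
            }
            if (!same && !flipped) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Expected values copied from the assertion in the comment above.
        double[] expected = {
            -0.48204393949466345, 0.731619628490741, -0.482043939494664,
            -0.7071067811865474, 1.3877787807814457E-16, 0.707106781186547,
            0.5173332005549852, 0.6817130768931094, 0.5173332005549856
        };
        // Reported B.data, transcribed column by column from the printed rows.
        double[] actual = {
            0.482044, -0.731620, 0.482044,
            0.707107, 0.000000, -0.707107,
            0.517333, 0.681713, 0.517333
        };
        System.out.println("equal up to sign: "
            + columnsEqualUpToSign(actual, expected, 3, 1e-5));
    }
}
```

If this reading holds, the library result may be numerically fine and the test simply stricter than the mathematics requires. It would not by itself explain why the binaries from jiaminglu:main happen to return the signs the test expects.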

mikiobraun commented 1 year ago

Thank you! Can you run mvn package -Dmaven.test.skip and then post what java -jar target/jblas… prints? There should be a message saying what linking error or the like happened.

AlbanSeurat commented 1 year ago

Here is the result of java -jar target/jblas-1.2.6-SNAPSHOT.jar:

-- org.jblas INFO jblas version is 1.2.4
Simple benchmark for jblas

Running sanity benchmarks.

checking vector addition... ok
-- org.jblas CONFIG BLAS native library not found in path. Copying native library from the archive. Consider installing the library somewhere in the path (for Windows: PATH, for Linux: LD_LIBRARY_PATH).
-- org.jblas CONFIG ArchFlavor native library not found in path. Copying native library libjblas_arch_flavor from the archive. Consider installing the library somewhere in the path (for Windows: PATH, for Linux: LD_LIBRARY_PATH).
-- org.jblas CONFIG Replaced .dylib with .jnilib
-- org.jblas CONFIG Loading libjblas_arch_flavor.jnilib from /lib/static/Mac OS X/aarch64/, copying to libjblas_arch_flavor.dylib.
-- org.jblas CONFIG Replaced .dylib with .jnilib
-- org.jblas CONFIG Loading libjblas.jnilib from /lib/static/Mac OS X/aarch64/, copying to libjblas.dylib.
checking matrix multiplication... ok
checking existence of dsyev...... ok
[-0.210656, -0.640445, 0.656727; -0.509085, -0.116445, 0.154634; -0.807515, 0.407556, -0.077317; 0.210656, 0.640445, 0.734044]
[17.233688; 1.414214; 0.000000]
[-0.470605, 0.782218, -0.408248; -0.571449, 0.082339, 0.816497; -0.672293, -0.617540, -0.408248]
[17.233688; 1.414214; 0.000000]
checking existence of dgesvd...... ok
Checking complex return values... (z = -21.0 + 88.0i)
Check whether we're catching XERBLA errors. If you see something like "** On entry to DGEMM parameter number 4 had an illegal value", it didn't work!
checking XERBLA... ok
Sanity checks passed.

Each benchmark will take about 5 seconds...

Running benchmark "Java matrix multiplication, double precision".
n = 10 : 4.408 GFLOPS (11019129 iterations in 5.0 seconds)
n = 100 : 6.522 GFLOPS (16304 iterations in 5.0 seconds)
n = 1000 : 6.580 GFLOPS (17 iterations in 5.2 seconds)

Running benchmark "Java matrix multiplication, single precision".
n = 10 : 4.056 GFLOPS (10140989 iterations in 5.0 seconds)
n = 100 : 6.676 GFLOPS (16690 iterations in 5.0 seconds)
n = 1000 : 6.843 GFLOPS (18 iterations in 5.3 seconds)

Running benchmark "native matrix multiplication, double precision".
n = 10 : 2.215 GFLOPS (5537755 iterations in 5.0 seconds)
n = 100 : 1.726 GFLOPS (4316 iterations in 5.0 seconds)
n = 1000 : 128.587 GFLOPS (322 iterations in 5.0 seconds)

Running benchmark "native matrix multiplication, single precision".
n = 10 : 2.037 GFLOPS (5092037 iterations in 5.0 seconds)
n = 100 : 2.252 GFLOPS (5631 iterations in 5.0 seconds)
n = 1000 : 230.175 GFLOPS (576 iterations in 5.0 seconds)
-- org.jblas INFO Deleting /var/folders/j3/032h1nq15lz5gxb33sl7ptkc0000gn/T/jblas15481512115687871286/libjblas.dylib
-- org.jblas INFO Deleting /var/folders/j3/032h1nq15lz5gxb33sl7ptkc0000gn/T/jblas15481512115687871286/libjblas_arch_flavor.dylib
-- org.jblas INFO Deleting /var/folders/j3/032h1nq15lz5gxb33sl7ptkc0000gn/T/jblas15481512115687871286
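The GFLOPS figures in this output appear consistent with counting 2·n³ floating-point operations per n-by-n matrix product. That count is an assumption inferred from the numbers, not something confirmed from the jblas benchmark source. A minimal sketch, with hypothetical names, that recomputes the native double-precision rows:

```java
// Hypothetical helper for checking the benchmark arithmetic; not jblas code.
final class GflopsCheck {

    /** GFLOPS for repeated n-by-n matrix products, assuming 2*n^3 flops each. */
    static double gflops(int n, long iterations, double seconds) {
        double flopsPerProduct = 2.0 * n * n * n;
        return flopsPerProduct * iterations / seconds / 1e9;
    }

    public static void main(String[] args) {
        // Iteration counts and times from the native double-precision run above.
        System.out.printf("n = 10   : %.3f GFLOPS%n", gflops(10, 5537755L, 5.0));
        System.out.printf("n = 100  : %.3f GFLOPS%n", gflops(100, 4316L, 5.0));
        System.out.printf("n = 1000 : %.3f GFLOPS%n", gflops(1000, 322L, 5.0));
    }
}
```

For n = 10 and n = 100 this reproduces the reported 2.215 and 1.726. For n = 1000 it gives 128.8 rather than 128.587, presumably because the elapsed time shown in the log is rounded to one decimal.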

mikiobraun commented 1 year ago

Interesting, that seems to work 🤔

OK, I'll have to look into it further. Thanks for checking, Alban!

AlbanSeurat commented 1 year ago

If it helps, here are the md5sums from the jiaminglu:main repository:

md5sum src/main/resources/lib/static/Mac\ OS\ X/aarch64/libjblas_arch_flavor.jnilib
21fa854850a0fbbc0c7777e774767fd0 src/main/resources/lib/static/Mac OS X/aarch64/libjblas_arch_flavor.jnilib
md5sum src/main/resources/lib/static/Mac\ OS\ X/aarch64/libjblas.jnilib
f8f2438e94e2a930107b5c50844c8807 src/main/resources/lib/static/Mac OS X/aarch64/libjblas.jnilib

And the md5sums from your build:

md5sum src/main/resources/lib/static/Mac\ OS\ X/aarch64/libjblas_arch_flavor.jnilib
cb23f397e63dd5bfda0f42b7ac527877 src/main/resources/lib/static/Mac OS X/aarch64/libjblas_arch_flavor.jnilib
md5sum src/main/resources/lib/static/Mac\ OS\ X/aarch64/libjblas.jnilib
863e5097e9ed10486619a0f8fe03984f src/main/resources/lib/static/Mac OS X/aarch64/libjblas.jnilib

mikiobraun commented 1 year ago

Yeah, file sizes are also massively different. Weird.

jiaminglu commented 1 year ago

Is it going well? It seems you’ve successfully built your binary and the size is different. But I believe if it passes your test it will be ok.


AlbanSeurat commented 1 year ago

@jiaminglu would it be possible to share the BLAS library you are using on Mac aarch64 and the command-line arguments you pass to your configure script?

I tried to build the current project and it failed with Java 8 (Rosetta makes the build think it is running on x86 rather than aarch64).

mikiobraun commented 1 year ago

It turns out that it depends on whether you have an Intel or an ARM version of Java running. What I did was install openjdk with brew and then use that, which should give you the right architecture. All of this should be documented somewhere... 😅
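A quick way to see which architecture the running JVM reports is to print the standard os.name and os.arch system properties. An x86_64 JDK running under Rosetta 2 reports an x86 architecture even on Apple Silicon hardware. The sketch below also assembles a resource path in the shape seen in the benchmark log (/lib/static/Mac OS X/aarch64/). That path layout is copied from the log, the helper name is hypothetical, and the actual lookup logic in jblas may differ.

```java
// Hypothetical diagnostic, not part of jblas.
final class JvmArchCheck {

    /** Builds a path shaped like the one in the log: /lib/static/<os.name>/<os.arch>/ */
    static String nativeLibDir(String osName, String osArch) {
        return "/lib/static/" + osName + "/" + osArch + "/";
    }

    public static void main(String[] args) {
        // These properties describe the running JVM, not the underlying hardware.
        String osName = System.getProperty("os.name");
        String osArch = System.getProperty("os.arch");
        System.out.println("os.name = " + osName);
        System.out.println("os.arch = " + osArch);
        System.out.println("assumed resource dir = " + nativeLibDir(osName, osArch));
    }
}
```

Running this with each installed JDK before invoking the configure script shows which one would yield an aarch64 build.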

For reproducing configurations, everything ends up being written to configure.out, so if you copy that you should be able to run the same build. I'd also be interested in your configure.out, @jiaminglu. For some reason, I had to manually add more static libs besides libquadmath.

Not a good situation. jblas does this so that the resulting shared lib has no external dependencies, but it ends up having low-level dependencies on gcc and so on. Without it, people would need to install gfortran to run jblas, which is also not great.

I've been wanting to overhaul the configure scripts for years now, very brittle and error prone...

AlbanSeurat commented 1 year ago

I used temurin8 from brew, but apparently they do not provide an aarch64 build, only an x86 one, and therefore Rosetta 2 kicks in. With temurin11, the native build failed because javah is no longer shipped with the JDK (since Java 10).

I am ready to change the scripts and document them if needed, but for the same reason as you, I wanted to make the work easier and start from @jiaminglu's build.

mikiobraun commented 1 year ago

Yeah, that javah bug is another thing I wanted to fix for years :)

javac now supports a -h flag where it will output header files, but I think it will then also compile everything already.
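To illustrate the -h behaviour, here is a minimal sketch that drives the system compiler through the standard javax.tools API and checks that both a header and a class file come out of a single run. The NativeDemo class and its ddot method are made up for the demonstration and have nothing to do with the jblas sources or build scripts. It needs to run on a JDK, not a bare JRE.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo, not part of the jblas build.
final class JavacHeaderDemo {

    /** Compiles a throwaway class with one native method using "javac -h". */
    static Path compileWithHeaders() {
        try {
            Path work = Files.createTempDirectory("javac-h-demo");
            Path include = Files.createDirectories(work.resolve("include"));
            Path classes = Files.createDirectories(work.resolve("classes"));
            Path source = work.resolve("NativeDemo.java");
            String code =
                "public class NativeDemo {\n"
              + "    public static native double ddot(int n, double[] x, double[] y);\n"
              + "}\n";
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));

            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            if (javac == null) {
                throw new IllegalStateException("No system Java compiler; a JDK is required");
            }
            // Equivalent to: javac -h include -d classes NativeDemo.java
            int status = javac.run(null, null, null,
                "-h", include.toString(),
                "-d", classes.toString(),
                source.toString());
            if (status != 0) {
                throw new IllegalStateException("javac exited with status " + status);
            }
            return work;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path work = compileWithHeaders();
        System.out.println("header written: "
            + Files.exists(work.resolve("include").resolve("NativeDemo.h")));
        System.out.println("class written:  "
            + Files.exists(work.resolve("classes").resolve("NativeDemo.class")));
    }
}
```

Since the header is a by-product of normal compilation, a build that previously ran javah as a separate step would likely need that step folded into the compile step.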

The whole build process with all those generated files is very complex. I was young and foolish, I guess :)

jiaminglu commented 1 year ago

I am using MacPorts as the package manager because it supports my old Mac and has better support for multiple architectures.

Here is my configure.out:

BUILD_TYPE=openblas
CC=gcc
CCC=c99
CFLAGS=-fPIC
F77=gfortran
FOUND_JAVA=true
FOUND_NM=true
INCDIRS=-Iinclude -I/Library/Java/JavaVirtualMachines/openjdk17-graalvm/Contents/Home//include -I/Library/Java/JavaVirtualMachines/openjdk17-graalvm/Contents/Home//include/darwin
JAVA_HOME=/Library/Java/JavaVirtualMachines/openjdk17-graalvm/Contents/Home/
LAPACK_HOME=./lapack-lite-3.1.1
LD=gcc
LDFLAGS=-shared
LIB=lib
LINKAGE_TYPE=static
LOADLIBES=/opt/local/lib/libopenblas.a /opt/local/lib/gcc12/gcc/arm64-apple-darwin21/12.2.0/../../../libgfortran.a /opt/local/lib/gcc12/gcc/arm64-apple-darwin21/12.2.0/../../../libquadmath.a
MAKE=make
NM=nm
OS_ARCH=aarch64
OS_ARCH_WITH_FLAVOR=aarch64
OS_NAME=Mac\ OS\ X
RUBY=ruby
SO=jnilib

And here are the toolchain versions; everything is arm64:

gcc12 @12.2.0_0+stdlib_flag
OpenBLAS @0.3.21_1+gcc12+lapack
openjdk17-graalvm @22.2.0_0

But since it has been a long time, I am not sure whether the packages were updated after the binary in this PR was compiled.

jiaminglu commented 1 year ago

For the dependencies I believe it will be OK if it links statically.

But what libraries have you added other than libquadmath? That’s really strange. Actually I did not expect to see libquadmath either, since it’s not needed in x86 builds. I just don’t want to spend time on this minor issue.

Also, I don't recommend using x86_64 versions of compilers under Rosetta 2 to build anything targeting ARM. It may break things when build scripts are not carefully written, for example by linking against static libraries of the wrong architecture.

AlbanSeurat commented 1 year ago

My new configure.out (after changing build.xml to use the javac task):

BUILD_TYPE=openblas
CC=aarch64-apple-darwin22-gcc-12
CCC=aarch64-apple-darwin22-gcc-12
CFLAGS=-fPIC
F77=gfortran
FOUND_JAVA=true
FOUND_NM=true
INCDIRS=-Iinclude -I/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/include/darwin
JAVA_HOME=/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home
LAPACK_HOME=./lapack-lite-3.1.1
LD=aarch64-apple-darwin22-gcc-12
LDFLAGS=-shared
LIB=lib
LINKAGE_TYPE=static
LOADLIBES=/opt/homebrew/Cellar/openblas/0.3.22/lib/libopenblas.a /opt/homebrew/Cellar/gcc/12.2.0/bin/../lib/gcc/current/gcc/aarch64-apple-darwin22/12/../../../libgfortran.a /opt/homebrew/Cellar/gcc/12.2.0/bin/../lib/gcc/current/gcc/aarch64-apple-darwin22/12/../../../libquadmath.a /opt/homebrew/Cellar/gcc/12.2.0/bin/../lib/gcc/current/gcc/aarch64-apple-darwin22/12/../../../libgomp.a
MAKE=make
NM=nm
OS_ARCH=aarch64
OS_ARCH_WITH_FLAVOR=aarch64
OS_NAME=Mac\ OS\ X
RUBY=ruby
SO=jnilib

For some reason my version of gfortran needs OpenMP for some computations (and therefore I need to link libgomp).

My configure command line:
./configure --build-type=openblas --download-lapack --lapack=./lapack-lite-3.1.1 --libpath=/opt/homebrew/Cellar/openblas/0.3.22/lib --static-libs

I still get the error when running the tests ...

(I am trying not to use clang, which is what the plain gcc wrapper invokes.)