intel-analytics / analytics-zoo

Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
https://analytics-zoo.readthedocs.io/
Apache License 2.0

MacOS libbigquant_avx512.dylib failed to be loaded #16

Closed: suizman closed this issue 2 years ago

suizman commented 2 years ago

I'm trying to run inference using BigDL and I'm getting the following error:

Issue:

/var/folders/sk/vx8y4czd0ll7fmcv4z7441fw0000gp/T/bigquant.native.307074527125639494/libbigquant_avx512.dylib failed to be loaded.
2021-11-17 11:24:21 INFO  SparkContext:54 - Invoking stop() from shutdown hook
2021-11-17 11:24:21 INFO  AbstractConnector:318 - Stopped Spark@60b485aa{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2021-11-17 11:24:21 INFO  SparkUI:54 - Stopped Spark web UI at http://192.168.0.103:4040
2021-11-17 11:24:22 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2021-11-17 11:24:22 INFO  MemoryStore:54 - MemoryStore cleared
2021-11-17 11:24:22 INFO  BlockManager:54 - BlockManager stopped
2021-11-17 11:24:22 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2021-11-17 11:24:22 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2021-11-17 11:24:22 INFO  SparkContext:54 - Successfully stopped SparkContext
2021-11-17 11:24:22 INFO  ShutdownHookManager:54 - Shutdown hook called
2021-11-17 11:24:22 INFO  ShutdownHookManager:54 - Deleting directory /private/var/folders/sk/vx8y4czd0ll7fmcv4z7441fw0000gp/T/spark-1a380e9f-9db0-433f-9d3d-cd6a2a13a1a2

System specs:

CPU: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
OS: MacOS Big Sur 11.6.1

It seems that the library is trying to load the AVX-512 build, but is it compiled only for Linux and not for macOS? Is there a way to use mkldnn/AVX-512 on macOS, or to force mklblas with a special flag before starting the process? I've tried -Dbigdl.engineType=mklblas, but it does not work.

Thanks

qiyuangong commented 2 years ago

Hi @suizman Thank you for creating this issue! :)

This error indicates that your application crashed while trying to load libbigquant_avx512.dylib. It seems something is wrong with our jar (this file should be packaged into our jar and loaded at runtime). I will try to reproduce this error. Can you share more details about your workload/example, your BigDL model and version, and your Spark configuration? Note that using bigquant is not recommended at the current stage.

As for whether it is compiled only for Linux: no. It is compiled for macOS (the .dylib is the macOS-only build, but it crashed while loading). The AVX-512 feature can be used on both Linux and macOS if the CPU supports it. Your CPU has AVX-512 ( https://www.intel.com/content/www/us/en/products/sku/196593/intel-core-i71068ng7-processor-8m-cache-up-to-4-10-ghz/specifications.html), so it should work for you.
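To double-check that the machine actually exposes AVX-512, one option on Intel Macs is `sysctl -n machdep.cpu.leaf7_features`, which lists extended CPU features such as `AVX512F`; on Linux the equivalent flag appears as `avx512f` on the `flags` line of `/proc/cpuinfo`. The sketch below parses such a feature string. Treating `AVX512F` as the sole indicator is an assumption: the native libraries may require additional AVX-512 subsets.

```scala
import scala.sys.process._
import scala.util.Try

object Avx512Check {

  /** True if the whitespace-separated feature list contains AVX512F (case-insensitive). */
  def hasAvx512F(features: String): Boolean =
    features.trim.split("\\s+").exists(_.equalsIgnoreCase("avx512f"))

  /** Reads the CPU feature list from the OS; None if it cannot be read here. */
  def readFeatures(): Option[String] = {
    val os = System.getProperty("os.name").toLowerCase
    if (os.contains("mac"))
      // Fails (and yields None) on machines without this sysctl key, e.g. Apple Silicon.
      Try(Seq("sysctl", "-n", "machdep.cpu.leaf7_features").!!).toOption
    else if (os.contains("linux"))
      Try {
        val src = scala.io.Source.fromFile("/proc/cpuinfo")
        try src.getLines().find(_.startsWith("flags")).getOrElse("")
        finally src.close()
      }.toOption
    else None
  }

  def main(args: Array[String]): Unit =
    readFeatures() match {
      case Some(f) => println(s"AVX-512F supported: ${hasAvx512F(f)}")
      case None    => println("Could not read CPU feature flags on this OS")
    }
}
```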

As for using mkldnn/AVX-512 on macOS: yes. BigDL tries to enable AVX-512 support through mklblas (used by default) or mkldnn (now called oneDNN; it must be enabled with a flag). These two libraries are different, but both support AVX-512. If you want to enable mkldnn, just add these Java options: -Dbigdl.engineType=mkldnn -Dbigdl.mklNumThreads=8.
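The same options can be passed on the command line (`java -Dbigdl.engineType=mkldnn -Dbigdl.mklNumThreads=8 ...`) or, as a sketch, set from code. The assumption here is that they must be set before the first BigDL class initializes `Engine`: the stack trace later in this thread shows `Engine` being initialized as soon as a layer such as `SoftMax` is constructed, so setting them afterwards would likely have no effect. The helper below only sets and reads back JVM system properties; it does not call any BigDL API, and `EngineFlags.configure` is a hypothetical name.

```scala
object EngineFlags {

  /** Sets BigDL engine-related JVM properties unless they were already given via -D.
    * Assumption: this must run before any BigDL layer or model is created. */
  def configure(engineType: String, mklNumThreads: Int): Map[String, String] = {
    val wanted = Map(
      "bigdl.engineType"    -> engineType,
      "bigdl.mklNumThreads" -> mklNumThreads.toString
    )
    wanted.foreach { case (k, v) =>
      // Respect values passed explicitly on the command line.
      if (System.getProperty(k) == null) System.setProperty(k, v)
    }
    // Return the effective values, whichever source they came from.
    wanted.keys.map(k => k -> System.getProperty(k)).toMap
  }

  def main(args: Array[String]): Unit = {
    val effective = configure("mkldnn", 8)
    effective.foreach { case (k, v) => println(s"$k=$v") }
    // Only after this point construct BigDL models/layers.
  }
}
```

Keeping command-line values authoritative means the same binary can still be switched back to `mklblas` at launch time without a rebuild.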

Last but not least, we are merging this repo (Analytics-Zoo) into BigDL (2.0 branch). Once we find the root cause, we encourage you to migrate to BigDL 2.0. :)

qiuxin2012 commented 2 years ago

Do you call any quantize() method in your application?

qiyuangong commented 2 years ago

The issue where libbigquant_avx512.dylib cannot be found is fixed by https://github.com/intel-analytics/BigDL-core/pull/140.
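For context on how a packaged native library ends up at a path like `/var/folders/.../T/bigquant.native.307074527125639494/libbigquant_avx512.dylib`: that temp path suggests the common JNI pattern of copying a library bundled in the jar into a temporary directory and then calling `System.load` on it. The sketch below illustrates that general pattern only; it is not BigDL's actual loader, and the object name, directory prefix, and resource names are hypothetical. A missing resource corresponds to the "cannot be found" case, while a failing `System.load` corresponds to "failed to be loaded".

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

object NativeExtractSketch {

  /** Copies classpath resource `name` into a fresh temp dir.
    * Returns None when the resource is not packaged in any jar on the classpath. */
  def extract(name: String, dirPrefix: String = "example.native"): Option[Path] =
    Option(getClass.getClassLoader.getResourceAsStream(name)).map { stream =>
      try {
        val dir = Files.createTempDirectory(dirPrefix)
        val target = dir.resolve(name.split('/').last)
        Files.copy(stream, target, StandardCopyOption.REPLACE_EXISTING)
        // deleteOnExit runs in reverse registration order: file first, then dir.
        dir.toFile.deleteOnExit()
        target.toFile.deleteOnExit()
        target
      } finally stream.close()
    }

  /** Extracts and loads the library. System.load throws UnsatisfiedLinkError
    * if the file exists but cannot be loaded (e.g. wrong platform or missing deps). */
  def extractAndLoad(name: String): Boolean =
    extract(name) match {
      case Some(path) => System.load(path.toAbsolutePath.toString); true
      case None       => false
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical resource path; a real jar may store it under a sub-directory.
    val found = extract("libbigquant_avx512.dylib")
    println(s"resource packaged: ${found.isDefined}")
  }
}
```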

suizman commented 2 years ago

Hi @qiyuangong, thanks for your response.

These are my current versions, with Scala 2.11.12:
  - analytics-zoo-bigdl_0.12.2-spark_2.4.3 -> 0.10.0
  - zoo-core-mkl-mac -> 0.10.0
  - spark-mllib -> 2.4.3
  - spark-sql -> 2.4.3

Which version of BigDL should I use? Should I compile the latest version of BigDL-core with the fix from intel-analytics/BigDL-core#140?

Yes @qiuxin2012, I'm using the quantize() method

Thanks

qiyuangong commented 2 years ago

Known solutions for this issue:

  1. Avoid using the quantize method. Model training and inference are still accelerated by AVX-512 without quantize (quantize is used, e.g., to accelerate int8 models). I think you can try this path first. :)
  2. Download the latest nightly build (NB) of BigDL 2.0 (version 0.14.0-SNAPSHOT) or Analytics-Zoo (master branch, bigdl==0.14.0.dev3). You don't have to compile BigDL-Core; we will provide the latest jar with that fix.
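Judging from this thread, calling quantize() is what pulls in the bigquant native library, which is why avoiding it sidesteps the crash. Conceptually, int8 quantization maps floating-point values to 8-bit integers through a scale factor. The sketch below shows simple symmetric per-tensor quantization as a generic illustration; it is an assumption for explanatory purposes and not necessarily the scheme bigquant implements.

```scala
object Int8QuantSketch {

  /** Symmetric per-tensor quantization: scale = max|x| / 127, q = round(x / scale). */
  def quantize(xs: Array[Float]): (Array[Byte], Float) = {
    val maxAbs = xs.foldLeft(0f)((m, x) => math.max(m, math.abs(x)))
    val scale = if (maxAbs == 0f) 1f else maxAbs / 127f
    val q = xs.map { x =>
      val r = math.round(x / scale)           // nearest integer (Int)
      math.max(-127, math.min(127, r)).toByte // clamp to the symmetric int8 range
    }
    (q, scale)
  }

  /** Approximate reconstruction; per-element error is at most scale / 2. */
  def dequantize(q: Array[Byte], scale: Float): Array[Float] =
    q.map(b => b * scale)

  def main(args: Array[String]): Unit = {
    val (q, s) = quantize(Array(1.0f, -0.5f, 0.25f, 0f))
    println(s"scale=$s q=${q.mkString(",")} back=${dequantize(q, s).mkString(",")}")
  }
}
```

Storing one byte per value plus a single scale is what makes int8 kernels cheaper, at the cost of a bounded rounding error.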

Have a nice day! Qiyuan

qiyuangong commented 2 years ago

Hi @suizman, we have updated the latest jars and versions for Analytics-Zoo and BigDL 2.0. You can build from source with:
  - Analytics-Zoo (master): bash zoo/make-dist.sh
  - BigDL (Branch-2.0): bash scala/make-dist.sh

We will update the download links for the jars later:
  - Analytics-Zoo: https://oss.sonatype.org/content/repositories/snapshots/com/intel/analytics/zoo/analytics-zoo-bigdl_0.14.0-SNAPSHOT-spark_2.4.6/
  - BigDL 2.0: https://oss.sonatype.org/content/repositories/snapshots/com/intel/analytics/bigdl/dist-spark-2.4.6-scala-2.11.8-all/0.13.1-SNAPSHOT/

Have a nice day! Qiyuan

suizman commented 2 years ago

Many thanks @qiyuangong, I'll try those versions.

suizman commented 2 years ago

Hi @qiyuangong

Now I'm getting the following error:

Exception in thread "main" java.lang.Error: Can't find the library libjdnn.dylib in the resource folder.
    at com.intel.analytics.bigdl.mkl.Loader.resource(Loader.java:89)
    at com.intel.analytics.bigdl.mkl.Loader.copyAll(Loader.java:80)
    at com.intel.analytics.bigdl.mkl.Loader.init(Loader.java:52)
    at com.intel.analytics.bigdl.mkl.MklDnn.<clinit>(MklDnn.java:27)
    at com.intel.analytics.bigdl.mkl.hardware.Affinity.<clinit>(Affinity.java:16)
    at com.intel.analytics.bigdl.utils.Engine$.setMklDnnEnvironments(Engine.scala:604)
    at com.intel.analytics.bigdl.utils.Engine$.<init>(Engine.scala:55)
    at com.intel.analytics.bigdl.utils.Engine$.<clinit>(Engine.scala)
    at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.<init>(AbstractModule.scala:1045)
    at com.intel.analytics.bigdl.nn.Container.<init>(Container.scala:42)
    at com.intel.analytics.bigdl.nn.keras.KerasLayer.<init>(KerasLayer.scala:168)
    at com.intel.analytics.bigdl.nn.keras.SoftMax.<init>(SoftMax.scala:32)
    at com.intel.analytics.bigdl.nn.keras.SoftMax$.apply$mFc$sp(SoftMax.scala:68)
    at SoftMaxTest$.<init>(SoftMaxTest.scala:17)
    at SoftMaxTest$.<clinit>(SoftMaxTest.scala)
    at Main$.main(Main.scala:11)
    at Main.main(Main.scala)

It seems that this library is also missing from the latest nightly build.

qiyuangong commented 2 years ago

Hi @suizman, unfortunately, according to https://github.com/intel-analytics/analytics-zoo-internal/issues/875, libjdnn is not supported on macOS. Please avoid using mkldnn and quantize on macOS.
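Following that advice, one way to keep the same application code portable is to decide at startup, based on the OS, which engine type to request and whether to call quantize(). The helper below is a hypothetical sketch: `chooseEngineType` and `allowQuantize` are illustrative names, not BigDL APIs, and the chosen engine type still has to be applied (e.g. as `-Dbigdl.engineType`) before BigDL initializes.

```scala
object PlatformGuard {

  private def isMac(osName: String): Boolean = osName.toLowerCase.contains("mac")

  /** Falls back to mklblas on macOS, where libjdnn (mkldnn) is not shipped. */
  def chooseEngineType(osName: String, requested: String): String =
    if (isMac(osName) && requested == "mkldnn") "mklblas" else requested

  /** The maintainers advise avoiding quantize() on macOS. */
  def allowQuantize(osName: String): Boolean = !isMac(osName)

  def main(args: Array[String]): Unit = {
    val os = System.getProperty("os.name")
    val engine = chooseEngineType(os, requested = "mkldnn")
    println(s"os=$os engineType=$engine quantize=${allowQuantize(os)}")
  }
}
```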

We added a warning for this issue (https://github.com/intel-analytics/BigDL/pull/3649).

suizman commented 2 years ago

Ok, thanks. We can close this issue.

qiyuangong commented 2 years ago

Thank you @suizman ! Issue closed. 👍

Have a nice day! Qiyuan