h2oai / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Flink and DataFlow

nccl2 needs libnccl_static.a but not found for ppc build #49

Closed: pseudotensor closed this issue 6 years ago

pseudotensor commented 6 years ago

http://mr-0xc1:8080/blue/organizations/jenkins/h2o4gpu-ppc64le-cuda9/detail/PR-665/3/pipeline


Scanning dependencies of target gpuxgboost
[ 95%] Linking CXX static library libgpuxgboost.a
[ 95%] Built target gpuxgboost
Scanning dependencies of target runxgboost
[ 97%] Building CXX object CMakeFiles/runxgboost.dir/src/cli_main.cc.o
[ 98%] Linking CXX executable ../xgboost
/usr/bin/ld: skipping incompatible /usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../libnccl_static.a when searching for -lnccl_static
/usr/bin/ld: skipping incompatible /lib/libnccl_static.a when searching for -lnccl_static
/usr/bin/ld: skipping incompatible /usr/lib/libnccl_static.a when searching for -lnccl_static
/usr/bin/ld: cannot find -lnccl_static
collect2: error: ld returned 1 exit status
make[4]: *** [../xgboost] Error 1
make[3]: *** [CMakeFiles/runxgboost.dir/all] Error 2
make[2]: *** [all] Error 2
make[1]: *** [libxgboostp2nccl] Error 2
make: *** [xgboost] Error 2
script returned exit code 2
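
The key lines in the log above are the `skipping incompatible` messages: ld did find files named `libnccl_static.a`, but rejected each one before reporting `cannot find -lnccl_static`. As a rough sketch (the function name `summarize_ld_errors` is made up for illustration and is not part of any build tooling here), a log like this could be scanned to separate "library present but rejected" from "library absent":

```python
import re

# GNU ld prints this when it finds a candidate file it cannot link against.
SKIP_RE = re.compile(r"skipping incompatible (\S+) when searching for -l([\w.+-]+)")
# GNU ld prints this when no usable candidate was found at all.
MISSING_RE = re.compile(r"cannot find -l([\w.+-]+)")


def summarize_ld_errors(log_text):
    """Map each library ld failed to find to the candidate paths it rejected.

    An empty list means no file with that name was seen on the search path;
    a non-empty list means files existed but were incompatible.
    """
    skipped = {}
    missing = []
    for line in log_text.splitlines():
        match = SKIP_RE.search(line)
        if match:
            skipped.setdefault(match.group(2), []).append(match.group(1))
            continue
        match = MISSING_RE.search(line)
        if match and match.group(1) not in missing:
            missing.append(match.group(1))
    return {name: skipped.get(name, []) for name in missing}


if __name__ == "__main__":
    sample = (
        "/usr/bin/ld: skipping incompatible /lib/libnccl_static.a "
        "when searching for -lnccl_static\n"
        "/usr/bin/ld: cannot find -lnccl_static\n"
    )
    for name, paths in summarize_ld_errors(sample).items():
        state = "found but incompatible" if paths else "not found"
        print(name, state, paths)
```

For this log, a non-empty list of rejected paths points at a wrong-architecture library rather than a missing one.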

pseudotensor commented 6 years ago

Non-ppc builds succeed, even though I don't have that library in /usr/local/cuda there either.

x86 does have: libnccl.so.2 -> libnccl.so.2.2.13

So maybe the ppc build is falling back to the static nccl library because it can't find the shared library either.

Maybe the CUDA installed for the ppc build is the wrong one?
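
One way to test the "wrong one" hypothesis is to look at which architecture the installed library was built for. The sketch below is an illustration only: the helper names `elf_machine` and `archive_machine` are invented here, and it reads just the ELF `e_machine` field of a shared library, or of the first ELF object inside a static `.a` archive. It assumes a well-formed file and covers only a handful of architectures.

```python
import struct

# A few e_machine values from the ELF specification.
EM_NAMES = {3: "x86", 21: "ppc64", 62: "x86_64", 183: "aarch64"}

AR_MAGIC = b"!<arch>\n"   # start of every ar (static library) archive
ELF_MAGIC = b"\x7fELF"    # start of every ELF object or shared library
AR_HEADER_SIZE = 60       # fixed size of each ar member header


def elf_machine(data):
    """Return an architecture name from an ELF header, or None if not ELF."""
    if len(data) < 20 or data[:4] != ELF_MAGIC:
        return None
    # Byte 5 (EI_DATA): 1 = little-endian, 2 = big-endian.
    little = data[5] == 1
    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from("<H" if little else ">H", data, 18)
    name = EM_NAMES.get(machine, "unknown (%d)" % machine)
    if machine == 21 and little:
        name = "ppc64le"
    return name


def archive_machine(data):
    """Return the architecture of a plain ELF file, or of the first ELF
    member found in an ar archive; None if no ELF content is found."""
    if data[:8] != AR_MAGIC:
        return elf_machine(data)
    pos = len(AR_MAGIC)
    while pos + AR_HEADER_SIZE <= len(data):
        header = data[pos:pos + AR_HEADER_SIZE]
        # Bytes 48..57 of the member header hold the size as decimal ASCII.
        size = int(header[48:58].decode("ascii").strip())
        start = pos + AR_HEADER_SIZE
        machine = elf_machine(data[start:start + size])
        if machine is not None:
            return machine
        # Member data is padded to an even length.
        pos = start + size + (size % 2)
    return None
```

Calling `archive_machine(open(path, "rb").read())` on the library the linker skipped would return something other than `ppc64le` if the file was built for a different architecture.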

pseudotensor commented 6 years ago

Ah, OK, the Docker build downloads nccl2 for x86. I'll fix that.
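
The thread does not show the actual fix. As a hedged sketch of the general idea, a download step could choose the package architecture from the build machine instead of hard-coding x86. The mapping and the function name `nccl_arch` below are hypothetical and are not taken from the h2o4gpu or xgboost build scripts:

```python
import platform

# Hypothetical mapping from platform.machine() values to the architecture
# label of the NCCL package to download.
SUPPORTED_ARCHES = {
    "x86_64": "x86_64",
    "AMD64": "x86_64",
    "ppc64le": "ppc64le",
}


def nccl_arch(machine=None):
    """Return the package architecture label for this machine.

    Raises ValueError for an unrecognized architecture, so the build stops
    here rather than silently downloading an x86 package.
    """
    machine = machine or platform.machine()
    try:
        return SUPPORTED_ARCHES[machine]
    except KeyError:
        raise ValueError("no NCCL package configured for %r" % machine)
```

Failing early on an unknown architecture means a mismatch shows up at download time instead of as a linker error at 98% of the build.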