rocmarchive / realcaffe2

The repo is obsolete. Use at your own risk.
https://github.com/pytorch/pytorch
Apache License 2.0

caffe2 utility binary loading shared libraries libcaffe2_hip.so #63

petrex closed this issue 6 years ago.

petrex commented 6 years ago

@ashishfarmer For the HIP binary tests we need libcaffe2_hip.so.
What about the other utility binaries? Is it necessary for them as well? Please share your insight. Thanks.

pyeh@rocm-miopen /usr/local/bin $ ./make_cifar_db
./make_cifar_db: error while loading shared libraries: ../../lib/libcaffe2_hip.so: cannot open shared object file: No such file or directory

ashishfarmer commented 6 years ago

A couple of things stand out at first glance. First, I do not see an obvious connection between this binary and the GPU/HIP functions, but it may be that Caffe2 needs HIP to initialize its registry during the init phase. Second, the library path used at link time is wrong: the library is at ../lib/libcaffe2_hip.so rather than ../../lib/libcaffe2_hip.so. This probably needs a fix in the CMake files.

These are my initial thoughts; I need to investigate more.
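The path mismatch described above can be made concrete. A dependency entry such as ../../lib/libcaffe2_hip.so contains a slash, so glibc's dynamic loader opens it as a pathname relative to the current working directory, not relative to the executable's directory. The Python sketch below uses hypothetical helper names (`resolve_needed`, `expand_origin`) for illustration only. It assumes the binary is run from /usr/local/bin, as in the log, and that the library was installed at /usr/local/lib (that is, ../lib, as the comment says). It also shows why an `$ORIGIN`-relative RPATH, which CMake can set through the `INSTALL_RPATH` target property, would not depend on the working directory. Whether that is the right fix for these particular CMake files is an assumption, not something confirmed in this thread.

```python
import os.path

def resolve_needed(needed, cwd):
    """Resolve a dependency entry that contains a slash the way glibc's ld.so
    does: as a pathname relative to the current working directory."""
    if "/" not in needed:
        # Bare sonames are searched for in RPATH/RUNPATH, LD_LIBRARY_PATH,
        # the ld.so cache and default directories instead; not modelled here.
        raise ValueError("not a pathname: " + needed)
    return os.path.normpath(os.path.join(cwd, needed))

def expand_origin(rpath_entry, executable):
    """Expand $ORIGIN in an RPATH entry to the directory holding the executable."""
    origin = os.path.dirname(os.path.abspath(executable))
    return os.path.normpath(rpath_entry.replace("$ORIGIN", origin))

# Running make_cifar_db from /usr/local/bin, as in the log:
print(resolve_needed("../../lib/libcaffe2_hip.so", "/usr/local/bin"))  # /usr/lib/libcaffe2_hip.so (wrong place)
print(resolve_needed("../lib/libcaffe2_hip.so", "/usr/local/bin"))     # /usr/local/lib/libcaffe2_hip.so
# Even the corrected relative entry breaks when run from another directory:
print(resolve_needed("../lib/libcaffe2_hip.so", "/home/user"))         # /home/lib/libcaffe2_hip.so
# An $ORIGIN-relative RPATH is independent of the working directory:
print(expand_origin("$ORIGIN/../lib", "/usr/local/bin/make_cifar_db"))  # /usr/local/lib
```

The last line is the point of the design choice: `$ORIGIN` ties the search path to where the binary lives, so the binary finds the library wherever it is launched from.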

djygithub commented 6 years ago

I'm having a similar issue on a fresh Ubuntu 16.04.4 LTS install. I installed ROCm and Caffe2 via https://github.com/ROCmSoftwarePlatform/rocm-caffe2 on a Ryzen/RX560 system. After copying the libcaffe2_hip.so file to work around the original issue, I tested the programs in /root/rocm-caffe2/build/bin and got the following error on some of the tests:

C++ exception with description "No device code available for function: _ZN6caffe24math12_GLOBAL__N_19SetKernelIfEEviTPS3" thrown in the test body.

Any ideas where to find the missing function? I am using the following Docker image:

rocm/caffe2 rocm1.8.0-develop-v1

Thanks

ashishfarmer commented 6 years ago

@djygithub, could you please rebuild and make sure there are no errors? From the error message, it looks like the HIP kernels were not linked correctly into libcaffe2_hip.so. In the make log, at the point where it links libcaffe2_hip.so, do you see any error messages that did not stop the make process?

Also, what version of ROCm is installed on your bare-metal host?
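One way to do the check suggested above is to scan a saved build log for error or warning lines that follow the step that links libcaffe2_hip.so. This is a minimal Python sketch, assuming the build output was captured to a file (for example with `make 2>&1 | tee build.log`) and that the CMake-generated Makefiles print a "Linking ..." line naming the library. The function name `lines_near_link_step` and the 20-line window are hypothetical choices made for illustration. The name match is case-insensitive in case the library's capitalization differs in the log.

```python
import re

# Lines that often signal a problem even when make keeps going.
SUSPICIOUS = re.compile(r"\b(error|undefined reference|warning)\b", re.IGNORECASE)

def lines_near_link_step(log_text, target="libcaffe2_hip.so", window=20):
    """Return (line_number, line) pairs for suspicious lines found within
    `window` lines after each line that links `target`."""
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        # Case-insensitive so libCaffe2_hip.so and libcaffe2_hip.so both match.
        if "linking" in line.lower() and target.lower() in line.lower():
            for j in range(i + 1, min(i + 1 + window, len(lines))):
                if SUSPICIOUS.search(lines[j]):
                    hits.append((j + 1, lines[j]))
    return hits

# Usage (assumes the build output was saved to build.log):
# with open("build.log") as f:
#     for number, text in lines_near_link_step(f.read()):
#         print(number, text)
```

An empty result only means nothing matched the patterns near the link step; it does not prove the kernels were linked correctly.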

djygithub commented 6 years ago

Hello, thanks for the quick response. I am having issues building Caffe2 and was trying the Docker image in the hope of avoiding the build. ROCm is current. The machine is at home, so I will check the exact details:

ii rocm-dkms 1.8.151 amd64 Radeon Open Compute (ROCm) Runtime software stack

Thanks again. I will try building later today.

djygithub commented 6 years ago

Hello,

"The docker that you are using has the kernels compiled for gfx900 target. That is why it did not work for the 560 card." The RX560 is a Polaris (gfx803) card, I believe; thanks for the tip. I tried again with a Frontier card and it works just fine. Thanks again.

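The diagnosis above comes down to a target mismatch: HIP kernels are compiled into code objects for specific GPU targets (gfx names), so a device whose target was not compiled in has no device code to run, which is consistent with the "No device code available for function" error reported earlier. A minimal Python sketch of that comparison follows. The function `missing_targets` is hypothetical, the compiled set {"gfx900"} comes from the quoted reply, and the device report is assumed to be text such as `rocminfo` output, which lists each GPU agent's gfx name.

```python
import re

# Matches GPU target names such as gfx803 or gfx900.
GFX_NAME = re.compile(r"\bgfx[0-9a-f]+\b")

def missing_targets(device_report, compiled_targets):
    """Return the gfx targets mentioned in device_report that have no compiled code."""
    found = set(GFX_NAME.findall(device_report))
    return found - set(compiled_targets)

# Hypothetical example: the Docker image only ships gfx900 kernels.
compiled = {"gfx900"}
print(missing_targets("Name: gfx803", compiled))  # {'gfx803'}: the RX560 has no code object
print(missing_targets("Name: gfx900", compiled))  # set(): a gfx900 card is covered
```

An empty result means every detected GPU has a matching code object; otherwise the image would need to be rebuilt for the listed targets.
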
petrex commented 6 years ago

https://github.com/pytorch/pytorch