CarrieX6 closed this issue 9 months ago
This is because you haven't compiled the CUDA C code successfully. Please follow the instructions given in ReadMe.txt to compile it. Since you are using CUDA 11.3, you need to change cuda-11.2 to cuda-11.3 in the commands. BTW, you need to make sure the paths are valid on your machine. Let me know if you have problems when doing this.
If you have compiled the code without any problem, make sure you have replaced the old non_max_suppression_op.so and crop_and_resize_op_gpu.so with the new ones you get. The old ones were compiled by me and are not suitable for your environment.
Thank you for your response! I have compiled the code without any problem and replaced non_max_suppression_op.so and crop_and_resize_op_gpu.so, but I still get the same problem. I also tried non_max_suppression_op_gpu.so, but it didn't work.
From the 3rd screenshot above, the problem is that several dynamic libraries, e.g., libcublas.so.11, are not loaded on your side. Please solve this problem first. Sorry, apart from code recompilation I am not sure what else could be the cause.
I'm also trying to fix the issue with the dynamic libraries. I've confirmed that these libraries exist but I still get errors, so I'm not sure if this is related to recompilation. Anyway, I will keep working on it. Thank you again for your prompt reply. I appreciate your help!
Let me know if you find any other causes of the problem. Good luck! BTW, you can also try installing CUDA 11.2 instead and see if the code runs normally.
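A file existing on disk does not guarantee that the dynamic loader can resolve it, so it may help to test the load directly. The following is a minimal sketch, not part of this repository: the helper name check_libraries is hypothetical, and the library names are simply the ones mentioned in this thread.

```python
import ctypes


def check_libraries(names):
    """Try to load each shared library by name.

    Returns a dict mapping each name to None on success,
    or to the loader's error message on failure.
    """
    results = {}
    for name in names:
        try:
            # ctypes.CDLL asks the system dynamic loader to resolve the name,
            # which on Linux takes LD_LIBRARY_PATH into account.
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    # Library names taken from the error messages discussed in this thread.
    report = check_libraries(["libcudart.so.11.0", "libcublas.so.11"])
    for name, err in report.items():
        print(f"{name}: {'OK' if err is None else 'FAILED: ' + err}")
```

If a library is reported as failed even though the file exists, the directory holding it is probably not on the loader's search path.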
Sure, I'll try it if I cannot handle this problem. Thank you!
The cause might be related to missing files/invalid paths. I googled a little bit and found the following highly voted solution. Hope it helps. You may need to change the command slightly to adapt to your machine.
First, find out where "libcudart.so.11.0" is. If your error stack reports a different missing library, replace "libcudart.so.11.0" with that library's name in the command below:
sudo find / -name 'libcudart.so.11.0'
Output on my system, showing where "libcudart.so.11.0" is located:
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudart.so.11.0
If the command returns nothing, please make sure you have installed CUDA and any other required components on your system.
Second, add the path to the environment file.
Edit /etc/profile using "sudo vim /etc/profile" and append the path to "LD_LIBRARY_PATH" by adding the line "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.1/targets/x86_64-linux/lib".
Then make the change take effect using "source /etc/profile".
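After the steps above, one way to confirm that the export took effect is to check which LD_LIBRARY_PATH entries actually contain the library. This is only a sketch: the helper name dirs_containing is hypothetical, and it assumes the usual colon-separated path format on Linux.

```python
import os


def dirs_containing(lib_name, search_path):
    """Return the entries of a search path (in order) that contain lib_name."""
    hits = []
    for entry in search_path.split(os.pathsep):
        # Skip empty entries produced by leading, trailing, or doubled separators.
        if entry and os.path.exists(os.path.join(entry, lib_name)):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print(dirs_containing("libcudart.so.11.0", current))
```

An empty list suggests that the directory found by the "find" command has not been added to LD_LIBRARY_PATH in the current shell.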
You can also check out: https://github.com/tensorflow/tensorflow/issues/45930 and https://stackoverflow.com/questions/70967651/could-not-load-dynamic-library-libcudart-so-11-0
Thanks a lot! I've solved this problem by updating LD_LIBRARY_PATH. I found that recompiling added redundant paths to LD_LIBRARY_PATH because I have two different CUDA versions (11.3 and 11.8) installed.
The following steps work for me:
So I guess the export leads to dynamic libraries being linked incorrectly when there are several CUDA versions in the environment. In summary, the C files must be recompiled without errors (make sure the recompilation is correct) and TensorFlow must be able to use the GPU (make sure the libraries are installed). Then it should work.
Glad to know that your problem was solved. It seems to be caused by the environment variables rather than by any bug in the code. I will close this issue for now.
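To spot this kind of mix-up, a small script can summarize which CUDA versions appear in LD_LIBRARY_PATH and which entries are repeated. This is a sketch under assumptions: the helper name summarize_cuda_paths is hypothetical, and it assumes CUDA directories follow the "cuda-X.Y" naming seen earlier in this thread (e.g. /usr/local/cuda-11.1).

```python
import os
import re


def summarize_cuda_paths(search_path):
    """Group path entries by CUDA version and list entries that occur more than once.

    Returns (versions, duplicates), where versions maps a version string
    such as "11.3" to the entries mentioning it, in their original order.
    """
    entries = [e for e in search_path.split(":") if e]

    versions = {}
    for entry in entries:
        # Assumes directory names of the form "cuda-11.3".
        match = re.search(r"cuda-(\d+\.\d+)", entry)
        if match:
            versions.setdefault(match.group(1), []).append(entry)

    seen, duplicates = set(), []
    for entry in entries:
        if entry in seen and entry not in duplicates:
            duplicates.append(entry)
        seen.add(entry)
    return versions, duplicates


if __name__ == "__main__":
    versions, duplicates = summarize_cuda_paths(os.environ.get("LD_LIBRARY_PATH", ""))
    print("CUDA versions on the path:", sorted(versions))
    print("Repeated entries:", duplicates)
```

The loader searches LD_LIBRARY_PATH from left to right, so if more than one version is listed, the entry that comes first is the one whose libraries get picked up.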
Hi, I'm trying to reproduce your work after recompiling the NMS and 3D Crop files with CUDA = 11.3, tf = 2.5.0, Python = 3.8. However, I'm still getting the error messages. Could you help with this problem? Thanks a lot!