Running compiled pyflex under cuda-11.1 failed, undefined symbol: cudaSetupArgument

julyfun commented 1 year ago

Environment

uname -a result

Linux julyfun-Lenovo-XiaoXinAir-14IIL-2020 5.19.0-46-generic #47~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jun 21 15:35:31 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

GPU

NVIDIA MX350, CUDA version 12.2 (I built pyflex inside Docker, so this shouldn't matter)

Error

Following #5, I changed the Dockerfile's FROM line to nvidia/cuda:11.1.1-devel-ubuntu18.04 and ran . ./prepare.sh && ./compile.sh successfully. But when I tried to run the demo (the python ... command in the README), it failed with:

Traceback (most recent call last):
  File "/home/julyfun/Documents/GitHub/julyfun/cloth-funnels-test/cloth_funnels/run_sim.py", line 1, in <module>
    from cloth_funnels.utils.utils import (
  File "/home/julyfun/Documents/GitHub/julyfun/cloth-funnels-test/cloth_funnels/utils/utils.py", line 3, in <module>
    from cloth_funnels.environment import SimEnv
  File "/home/julyfun/Documents/GitHub/julyfun/cloth-funnels-test/cloth_funnels/environment/__init__.py", line 1, in <module>
    from .simEnv import SimEnv
  File "/home/julyfun/Documents/GitHub/julyfun/cloth-funnels-test/cloth_funnels/environment/simEnv.py", line 9, in <module>
    from cloth_funnels.utils.env_utils import (
  File "/home/julyfun/Documents/GitHub/julyfun/cloth-funnels-test/cloth_funnels/utils/env_utils.py", line 8, in <module>
    import pyflex
ImportError: /home/julyfun/Documents/GitHub/julyfun/cloth-funnels-test/cloth_funnels/PyFlex/bindings/build/pyflex.cpython-39-x86_64-linux-gnu.so: undefined symbol: cudaSetupArgument

This could be because cudaSetupArgument was deprecated in CUDA versions newer than 9.2, so the CUDA 11.1 runtime no longer seems to provide the symbol that the compiled pyflex expects.
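One way to test this guess is to ask the CUDA runtime library directly whether it exports the symbol. The snippet below is only a minimal sketch: the helper name `exports_symbol` is made up for illustration, and the library name `libcudart.so` is an assumption (the actual file may carry a version suffix or live in a different directory).

```python
import ctypes
from typing import Optional


def exports_symbol(symbol: str, library: Optional[str] = None) -> bool:
    """Return True if `library` exports `symbol`.

    When `library` is None, the symbols already loaded into the current
    process are searched instead (dlopen(NULL) semantics on Linux).
    """
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        # The library itself could not be loaded (missing file, wrong arch, ...).
        return False
    # ctypes raises AttributeError on lookup when the symbol is not exported,
    # which hasattr() turns into False.
    return hasattr(handle, symbol)


if __name__ == "__main__":
    # "libcudart.so" is an assumed name; adjust it to the runtime in your image.
    print(exports_symbol("cudaSetupArgument", "libcudart.so"))
```

If this prints False inside the CUDA 11.1 image but True inside an older image, that would support the explanation above. A False result can also simply mean the library was not found under that name.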

Rudy112 commented 1 year ago

Hi @julyfun, I have the same problem. After changing the Dockerfile and compiling successfully, pyflex does not work because of the cudaSetupArgument issue. Did you eventually solve this?

julyfun commented 1 year ago

You can try docker pull yunzhuli/pyflex_16_04_cuda_9_1 (I got this from a README in the official pyflex repo). The CUDA version in that image is old enough that cudaSetupArgument is not yet deprecated.

zcswdt commented 11 months ago

Hello, I'm asking for paid help. Have you successfully run the author's training code?

julyfun commented 11 months ago

Yes. I ran it under Ubuntu 16.04 with a GTX 1080 Ti, CUDA 9.2, python==3.7.16, pytorch==1.7.1+cu92, pytorch-lightning==1.5.
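To compare your own setup against the versions listed above, something like the following could be used. It is a rough sketch, not part of the repo: `module_version` and `describe_environment` are hypothetical helper names, and it only reads the `__version__` attributes and `torch.version.cuda` when those packages are installed.

```python
import importlib
import platform
from typing import Dict, Optional


def module_version(name: str) -> Optional[str]:
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except (ImportError, OSError):
        # OSError covers installs whose shared libraries fail to load.
        return None
    return getattr(module, "__version__", None)


def describe_environment() -> Dict[str, Optional[str]]:
    """Collect the versions that matter for reproducing the setup above."""
    info: Dict[str, Optional[str]] = {
        "python": platform.python_version(),
        "torch": module_version("torch"),
        "pytorch_lightning": module_version("pytorch_lightning"),
        "torch_cuda": None,
    }
    if info["torch"] is not None:
        import torch

        # CUDA version torch was built against (None for CPU-only builds).
        info["torch_cuda"] = torch.version.cuda
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Entries that come back as None mean the package is not importable in the current interpreter.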

zcswdt commented 11 months ago

Wow, thank you very much for your reply. Could you guide me? I am currently running into a problem: memory keeps getting eaten up during training. If you help me set up the environment, I will pay you for your work. Thank you very much.

zcswdt commented 11 months ago

Hello, could you add me on QQ? My QQ number is 810190882. Please add me if you don't mind, and I will describe the problems I have encountered. I will also send you a tip as a thank-you.

zcswdt commented 11 months ago

Hello, sorry to bother you. I would like to ask a few questions. I set up the environment on Ubuntu 18 exactly according to the author's tutorial. After installation, torch reports CUDA 11.7, while my local nvcc -V reports CUDA 10.0. I can run the author's test and training code, but as the number of training steps increases, the process eats up my memory and the program gets killed. I have been working on the configuration for two months and still haven't gotten it right. I saw in your last reply that you got training to run. Did you see a memory leak during training? If possible, could you share your contact information? Thank you.
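The two CUDA versions mentioned here (11.7 for torch, 10.0 from nvcc -V) can be put side by side with a small helper. This is only a sketch: `parse_nvcc_release` and `same_cuda_major` are hypothetical names, and the sample string in the comment is an illustrative `nvcc -V` line, not output taken from this thread. It does not by itself say whether the mismatch causes the memory growth.

```python
import re
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'release X.Y' version from the text printed by `nvcc -V`."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def same_cuda_major(version_a: str, version_b: str) -> bool:
    """Compare only the major component, e.g. '11.1' vs '11.7' -> True."""
    return version_a.split(".")[0] == version_b.split(".")[0]


if __name__ == "__main__":
    # Illustrative line in the format nvcc prints; paste your own output here.
    sample = "Cuda compilation tools, release 10.0, V10.0.130"
    local_cuda = parse_nvcc_release(sample)
    torch_cuda = "11.7"  # value of torch.version.cuda as described above
    print(local_cuda, torch_cuda, same_cuda_major(local_cuda, torch_cuda))
```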
