Doesn't work if cuda is not installed in usr/local

djl11 commented 6 years ago

If CUDA is not installed in /usr/local, the code does not run correctly. This is due to lines 42 and 46 of ops.py, which assume that location.

Note also that lines 37-40 of ops.py are never used, so line 17 of config.ini is unused as well.

My simple fix was to replace lines 37 to 46 of ops.py with the following:

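    # Find nvcc on the PATH and keep everything before "/cuda" as the CUDA
    # parent directory, e.g. "/usr/local/cuda/bin/nvcc" -> "/usr/local".
    out, err = subprocess.Popen(['which', 'nvcc'], stdout=subprocess.PIPE).communicate()
    cuda_dir = out.decode().strip().split('/cuda')[0]

    # Compile the CUDA kernel; "-I <cuda_dir>" lets TensorFlow's
    # "cuda/include/..." headers resolve.
    nvcc_cmd = "nvcc -std=c++11 -c -o {} {} {} -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I " + cuda_dir + " --expt-relaxed-constexpr"
    nvcc_cmd = nvcc_cmd.format(" ".join([fn_cu_o, fn_cu_cc]),
                               tf_inc, tf_lib)
    subprocess.check_output(nvcc_cmd, shell=True)
    # Link against libcudart in <cuda_dir>/cuda/lib64; the remaining "{}"
    # placeholders are filled in by the code after line 46 of ops.py.
    gcc_cmd = "{} -std=c++11 -shared -o {} {} -fPIC -L " + cuda_dir + "/cuda/lib64 -lcudart {} -O2 -D GOOGLE_CUDA=1"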
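A slightly more defensive variant of the lookup above could be sketched as follows. It is not part of UnFlow; the function name find_cuda_dirs is hypothetical, and it assumes the standard <cuda_root>/bin/nvcc layout. It uses shutil.which instead of spawning "which", and takes the parent directories of nvcc rather than splitting on the "/cuda" substring.

```python
import os
import shutil


def find_cuda_dirs(nvcc_path=None):
    """Derive CUDA directories from the location of nvcc (hypothetical helper).

    Returns (cuda_root, include_parent), e.g. "/usr/local/cuda/bin/nvcc"
    gives ("/usr/local/cuda", "/usr/local"). include_parent is what "-I"
    must point at for TensorFlow's "cuda/include/..." headers, which only
    works if cuda_root's folder is literally named "cuda".
    Sketch assuming the standard <cuda_root>/bin/nvcc layout.
    """
    if nvcc_path is None:
        nvcc_path = shutil.which("nvcc")  # None if nvcc is not on PATH
    if not nvcc_path:
        raise FileNotFoundError("nvcc not found on PATH; is CUDA installed?")
    # abspath (not realpath) keeps symlinks such as /usr/local/cuda -> cuda-8.0
    cuda_root = os.path.dirname(os.path.dirname(os.path.abspath(nvcc_path)))
    return cuda_root, os.path.dirname(cuda_root)
```

Unlike split('/cuda'), this does not cut a path such as /home/cudauser/cuda/bin/nvcc at the first "/cuda" substring (which would give "/home"), and it fails loudly when nvcc is missing. For linking, os.path.join(cuda_root, "lib64") would avoid depending on the folder being named "cuda".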

I also removed line 17 from config.ini.

Finally, I know this is a custom config file, but I would also suggest changing line 15 of config.ini to:

    g++ = g++
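Presumably this setting helps because a bare name like g++ is resolved through PATH when the build command runs, instead of depending on a machine-specific value. A minimal sketch of that resolution, assuming config.ini is read with Python's configparser; the section name "compiler" and the function resolve_compiler are hypothetical, so check the actual config.ini layout:

```python
import configparser
import shutil


def resolve_compiler(config_text, section="compiler", key="g++"):
    """Return the absolute path of the C++ compiler named in config.ini.

    ASSUMPTION: the section name "compiler" is hypothetical. A bare name
    such as "g++" is looked up on PATH, which is what makes it portable.
    Falls back to "g++" if the section or key is missing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    name = parser.get(section, key, fallback="g++")
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError("compiler {!r} not found on PATH".format(name))
    return path
```

For example, resolve_compiler("[compiler]\ng++ = g++\n") returns whatever g++ the current PATH points to, so the same config.ini can be copied between machines.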

I have tested this on 3 different machines, each with different versions of Linux, TensorFlow, and CUDA. With these small changes, your code runs immediately after cloning this repo on all of them (after copying over the config.ini file, of course).

Just a few small suggestions to make things work as out-of-the-box as possible! :)

simonmeister commented 6 years ago

Many thanks! I changed the code as suggested. Sorry for taking so long to respond; I am very busy with new projects.

bragilee commented 6 years ago

@simonmeister @djl11

Hi,

Thanks for your help with this issue.

It seems that I still face this (CUDA-related) problem. My setup: CUDA 8.0 (I work on a server, so CUDA is installed in a shared path) and tensorflow-gpu 1.10.

My errors are as follows; I would really appreciate your help if you have some time:

    nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
    In file included from /home/runzeli/anaconda3/lib/python3.5/site-packages/tensorflow/include/tensorflow/core/util/cuda_kernel_helper.h:21:0,
                     from backward_warp_op.cu.cc:8:
    /home/runzeli/anaconda3/lib/python3.5/site-packages/tensorflow/include/tensorflow/core/util/cuda_device_functions.h:32:31: fatal error: cuda/include/cuda.h: No such file or directory
    compilation terminated.
    Traceback (most recent call last):
      File "/data2/Runze/UnFlow/src/e2eflow/ops.py", line 59, in
        op_lib = tf.load_op_library(lib_path)
      File "/home/runzeli/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
        lib_handle = py_tf.TF_LoadLibrary(library_filename)
    tensorflow.python.framework.errors_impl.NotFoundError: ./backward_warp_op.so: cannot open shared object file: No such file or directory

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "run.py", line 7, in
        from e2eflow.core.train import Trainer
      File "/data2/Runze/UnFlow/src/e2eflow/core/train.py", line 12, in
        from ..ops import forward_warp
      File "/data2/Runze/UnFlow/src/e2eflow/ops.py", line 61, in
        compile(n)
      File "/data2/Runze/UnFlow/src/e2eflow/ops.py", line 43, in compile
        subprocess.check_output(nvcc_cmd, shell=True)
      File "/home/runzeli/anaconda3/lib/python3.5/subprocess.py", line 626, in check_output
        **kwargs).stdout
      File "/home/runzeli/anaconda3/lib/python3.5/subprocess.py", line 708, in run
        output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command 'nvcc -std=c++11 -c -o backward_warp_op.cu.o backward_warp_op.cu.cc -I/home/runzeli/anaconda3/lib/python3.5/site-packages/tensorflow/include -D_GLIBCXX_USE_CXX11_ABI=0 -L/home/runzeli/anaconda3/lib/python3.5/site-packages/tensorflow -ltensorflow_framework -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I /usr/local --expt-relaxed-constexpr' returned non-zero exit status 1
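The decisive line in this log is "fatal error: cuda/include/cuda.h: No such file or directory": TensorFlow's cuda_device_functions.h includes cuda/include/cuda.h, so the directory passed to nvcc via -I (here /usr/local) must contain cuda/include/cuda.h. The sketch below checks that condition and, as a workaround not discussed in this thread, creates a hypothetical "shim" directory whose cuda entry is a symlink to the real toolkit. The names tf_cuda_header_found and make_include_shim are illustrative, not part of UnFlow or TensorFlow.

```python
import os


def tf_cuda_header_found(include_dir):
    """True if TensorFlow's '#include "cuda/include/cuda.h"' resolves under
    the directory passed to nvcc via -I."""
    return os.path.isfile(os.path.join(include_dir, "cuda", "include", "cuda.h"))


def make_include_shim(cuda_root, shim_dir):
    """Create <shim_dir>/cuda as a symlink to cuda_root and return shim_dir.

    Workaround sketch (hypothetical, not from this thread) for toolkits whose
    folder is not named "cuda", e.g. /opt/cuda-8.0: pass "-I <shim_dir>".
    """
    os.makedirs(shim_dir, exist_ok=True)
    link = os.path.join(shim_dir, "cuda")
    if not os.path.lexists(link):  # keep an existing link or folder as-is
        os.symlink(os.path.abspath(cuda_root), link)
    return shim_dir
```

Since the gcc command in the fix above links with -L <cuda_dir>/cuda/lib64, the same shim directory would also serve as cuda_dir for linking.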

bragilee commented 6 years ago

It has been solved, thanks. :)

DeckerDai commented 5 years ago

@bragilee Excuse me, could you tell me how you solved this issue? I have exactly the same problem but have no idea how to fix it.

Thank you very much!

bragilee commented 5 years ago

Hi @DeckerDai,

Sorry for the late reply. It has been a long time since I worked on this problem: I tried to run this repo and then moved on to other projects. In my case, I only made my CUDA and cuDNN versions compatible with this repo. I am not sure which other factors affect compilation, for example the gcc version. I suggest starting with exactly the same versions used in this repo.

Thank you.
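Matching versions is the first step suggested above. A minimal sketch for reading the installed CUDA release from nvcc, assuming its --version output contains the usual "release X.Y" phrase; the function names are hypothetical, and the versions to compare against should be taken from the repo's own documentation:

```python
import re
import subprocess


def parse_nvcc_release(version_text):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release(nvcc="nvcc"):
    """Run `nvcc --version` and return the CUDA release as (major, minor)."""
    out = subprocess.check_output([nvcc, "--version"])
    return parse_nvcc_release(out.decode())
```

For example, with CUDA 8.0 on the PATH, installed_cuda_release() returns (8, 0), which can be compared with the documented version; on the Python side, tf.__version__ gives the installed TensorFlow version.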

simonmeister / UnFlow

Doesn't work if cuda is not installed in usr/local #39