mjiUST / SurfaceNet

2017 ICCV, SurfaceNet: An End-to-end 3D Neural Network for Multiview Stereopsis

Can't work on Theano 1: theano.sandbox.cuda.dnn is removed in the new version #3

Open cdb0y511 opened 6 years ago

cdb0y511 commented 6 years ago

Could you update your source file layer.py? theano.sandbox.cuda.dnn was removed in Theano 1 (anything after Theano 0.9), so from theano.sandbox.cuda.dnn import gpu_contiguous, GpuDnnConvDesc, gpu_alloc_empty, GpuDnnConv3dGradW no longer works, and neither does if lasagne.utils.theano.sandbox.cuda.dnn_available() in similarityNet.py. Could you use theano.gpuarray.dnn instead? I can't replace gpu_contiguous, GpuDnnConvDesc, gpu_alloc_empty, and GpuDnnConv3dGradW with classes from theano.gpuarray.dnn by myself. And I can't go back to Theano 0.9 either, because the new version of cuDNN does not support the old Theano and pygpu. Please help, thanks.

Rubikplayer commented 6 years ago

Nice work on 3D reconstruction! I have some similar issues here.

@mjiUST Could you give us some tips for getting the code running on a newer system? My system is:

Or:

Do you have suggestions for running/training without cuDNN?

I noticed there are some if-branches, like this one in similarityNet.py:

if lasagne.utils.theano.sandbox.cuda.dnn_available(): # when cuDNN available
    from lasagne.layers.dnn import Conv2DDNNLayer as ConvLayer 
else:
    from lasagne.layers import Conv2DLayer as ConvLayer

But in layers.py and SurfaceNet.py, some cuDNN functions are hardcoded.

Following the same logic in the if-branch, maybe for Conv3DDNNLayer and Pool3DDNNLayer:

I might be able to hack it to:

from lasagne.layers import Conv3DLayer as Conv3DDNNLayer
from lasagne.layers import Pool3DLayer as Pool3DDNNLayer

But for other functions like gpu_contiguous, I haven't found any replacement so far. If you have any suggestions, please let us know! Thanks!
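The if-branch idea above can be generalized into a small helper that tries several import locations in order. This is only a minimal sketch: import_first_available is a hypothetical helper, not part of Lasagne or Theano, and it assumes that an unavailable cuDNN-backed module fails at import time with ImportError. It says nothing about whether the fallback layer behaves the same as the cuDNN one.

```python
import importlib


def import_first_available(candidates):
    """Return the first attribute that can be imported from `candidates`.

    `candidates` is a sequence of (module_name, attribute_name) pairs,
    tried in order. Raises ImportError if none of them can be loaded.
    (Hypothetical helper, written for illustration only.)
    """
    errors = []
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append("%s.%s: %s" % (module_name, attr_name, exc))
    raise ImportError("no candidate could be imported:\n" + "\n".join(errors))


# Possible usage for the 3D layers discussed above (not executed here):
# Conv3DDNNLayer = import_first_available([
#     ("lasagne.layers.dnn", "Conv3DDNNLayer"),  # cuDNN-backed layer
#     ("lasagne.layers", "Conv3DLayer"),         # fallback named in this thread
# ])
```

Collecting the failure reasons makes the final error message show why each candidate was rejected, which is handy when the cause is a back-end mismatch rather than a missing package.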

@cdb0y511 How are things going with you?

mjiUST commented 6 years ago

Dear @cdb0y511 @Rubikplayer ,

Thanks for the issue report. I specified the older Theano version https://github.com/mjiUST/SurfaceNet/blob/149f6e05c084ee4e757b5bd9b8efef8f46b78ffb/installEnv.sh#L34

Since the 3D dilated conv layer was implemented using some APIs in CUDNN, I'm not sure whether we could easily discard CUDNN.

If you are worried that the installation may affect the versions of your existing packages, please feel free to use SurfaceNet/installEnv.sh, which will not change anything in your existing Python, Theano, or ~/.bashrc. All you need to do is specify the CUDA/cuDNN paths accordingly. Please refer to the updated README.

Hope this helps.

cdb0y511 commented 6 years ago

@mjiUST Thanks a lot, and well done. I am a Ph.D. candidate too. Maybe we can discuss your work one day, but first I want to figure out how it works. I have read installEnv.sh, and I understand how to use conda and install a specific Theano 0.9 (even though your script installs the latest Theano). You don't need to discard cuDNN. The problem is that theano.sandbox is the old back end; you'd be better off switching to the new back end, theano.gpuarray. Please see https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end(gpuarray) for details. Otherwise, new drivers and new CUDA versions may not be compatible with it. I know you use NVIDIA driver 375, CUDA 8.0, and cuDNN v5.1, but I need CUDA 9.0 and cuDNN v7.1.1 for TensorFlow 1.6, so I have installed the latest NVIDIA driver.

Even with Theano 0.9, I get: Exception: ('The following error happened while compiling the node', <theano.sandbox.cuda.DnnVersion object at 0x7f9028151110>(), '\n', 'The nvidia driver version installed with this OS does not give good results for reduction.Installing the nvidia driver available on the same download page as the cuda package will fix the problem: http://developer.nvidia.com/cuda-downloads')

The only options are to switch to the new back end, theano.gpuarray, or to give up CUDA 9.0 and cuDNN v7.1.1 and go back to NVIDIA driver 375, CUDA 8.0, and cuDNN v5.1. It's hard to choose, and it certainly limits your work.

@Rubikplayer I can't find gpu_contiguous either, even in the Theano 0.9 documentation. So I guess only the original author can fix it.

mjiUST commented 6 years ago

@cdb0y511 Thanks for your interest; I look forward to further discussion.

I don't know whether you have tried this method: say you have both /usr/local/cuda-8.0 and /usr/local/cuda, which links to cuda-9.0. Change the first line of ~/miniconda2/envs/SurfaceNet/etc/conda/activate.d/activate-cuda.sh to export CUDA_ROOT=/usr/local/cuda-8.0, which will not affect your settings in .bashrc before you source activate SurfaceNet. This way, even if you have multiple CUDA versions on your PC, a particular one can be specified without ANY influence on your other projects (for example, TensorFlow and PyTorch).

Similarly, one can specify a cuDNN version without influencing other projects by changing the first line of ~/miniconda2/envs/SurfaceNet/etc/conda/activate.d/activate-cudnn.sh to the path where the cuDNN folder is located, e.g., export CUDNN_ROOT=/home/<user-name>/libs/cudnn-8.0-v5.1.

I highly recommend installing cuDNN outside the CUDA folder, so that you can use any combination of CUDA and cuDNN by defining specific environment variables in different conda environments.
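The same isolation idea can be sketched in Python for launching a single process with a chosen CUDA/cuDNN pair. This is an illustration under stated assumptions, not what the activate.d scripts actually do: build_cuda_env is a hypothetical helper, and the bin and lib64 subfolder names as well as the PATH and LD_LIBRARY_PATH handling are assumed rather than taken from the scripts. Only CUDA_ROOT and CUDNN_ROOT come from the description above.

```python
import os


def build_cuda_env(cuda_root, cudnn_root, base_env=None):
    """Return a copy of `base_env` that points at one CUDA and one cuDNN install.

    The caller's environment is never modified, so other projects keep
    whatever CUDA/cuDNN versions they already use.
    (Hypothetical helper, written for illustration only.)
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_ROOT"] = cuda_root
    env["CUDNN_ROOT"] = cudnn_root

    # Assumption: binaries live in <cuda_root>/bin, libraries in <root>/lib64.
    path_dirs = [os.path.join(cuda_root, "bin"), env.get("PATH", "")]
    env["PATH"] = os.pathsep.join(p for p in path_dirs if p)

    lib_dirs = [
        os.path.join(cudnn_root, "lib64"),
        os.path.join(cuda_root, "lib64"),
        env.get("LD_LIBRARY_PATH", ""),
    ]
    env["LD_LIBRARY_PATH"] = os.pathsep.join(p for p in lib_dirs if p)
    return env


# Possible usage (not executed here):
# import subprocess
# env = build_cuda_env("/usr/local/cuda-8.0",
#                      "/home/<user-name>/libs/cudnn-8.0-v5.1")
# subprocess.call(["python", "main.py"], env=env)
```

Because the function returns a new dictionary, the choice of CUDA and cuDNN applies only to the child process that receives it.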

Please feel free to post any queries.

Rubikplayer commented 6 years ago

@mjiUST @cdb0y511 Yes, yesterday I did the following, and main.py now starts running (although another error occurs):

For the error I encountered, I will open another issue. Thanks for the feedback! Edit: new issue opened: (https://github.com/mjiUST/SurfaceNet/issues/4)

mjiUST commented 6 years ago

@Rubikplayer Thank you for the feedback. To be precise,

Rubikplayer commented 6 years ago

@mjiUST Thanks for the response.