pluskid / Mocha.jl

Deep Learning framework for Julia

LoadError while running Pkg.test("Mocha") #173

Closed by ashleylid 8 years ago

ashleylid commented 8 years ago

Hi,

While running Pkg.test("Mocha") with the GPU backend enabled, I get a warning and an error:

WARNING: convert(::Type{Ptr}, ::Int64) methods should be converted to be methods of unsafe_convert
 in depwarn at deprecated.jl:73
 [inlined code] from deprecated.jl:418
 in unsafe_convert at no file:0
while loading ~/.julia/v0.4/Mocha/test/layers/shared-parameters.jl, in expression starting on line 50
ERROR: LoadError: LoadError: Not supported

I think it has to do with my version of CUDA.

EDIT: it has nothing to do with my version of CUDA; I have tried 6.0, 7.0, and 7.5.

Thx

pluskid commented 8 years ago

What version of Julia and Mocha are you using? Can you try with

Pkg.checkout("Mocha")

to see if the warning goes away?

ashleylid commented 8 years ago

Tried it again today. Still not playing nice :(

julia version 0.4.2
"CUDNN" => v"0.2.1"
"Mocha" => v"0.1.0+"

ashleylid commented 8 years ago

Any further news? I tried checkout again and it says it's dirty, so no go..

pluskid commented 8 years ago

@kleinash I can reproduce the warning and am fixing it now. But the error you got is due to something else. I think your CUDA library is not being loaded properly. Did you set LD_LIBRARY_PATH properly? Can you do Libdl.find_library(["libcudnn"], [""]) in the Julia prompt? If it is empty, then the cuDNN library cannot be found.
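
For example, a quick check in the REPL (a minimal sketch; an empty result means the loader cannot locate the library):

  # quick check: can Julia's loader locate libcudnn on the default search paths?
  libname = Libdl.find_library(["libcudnn"], [""])
  println(isempty(libname) ? "libcudnn NOT found" : "found: $libname")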

pluskid commented 8 years ago

BTW: Pkg.checkout saying dirty means you probably modified Mocha locally. To proceed, you can either commit or discard your local changes.
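
Roughly like this (a sketch, assuming Mocha sits in the default Pkg.dir location and you want to keep your edits around):

  # stash local edits in the Mocha package directory, then check out master
  cd(Pkg.dir("Mocha")) do
      run(`git stash`)        # or `git checkout .` to discard the changes instead
  end
  Pkg.checkout("Mocha")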

ashleylid commented 8 years ago

Hi. Thanks for the heads up on dirty. I stashed local changes and checked out. I find libcudnn no problem.

Running again I get this error: ERROR: LoadError: Mocha CUDA kernels not up-to-date. Please re-compile (see documents of BACKEND)

Will take a look at the docs and brb.

EDIT: took a look at http://mochajl.readthedocs.org/en/latest/user-guide/backend.html and ran make as mentioned. All looked good till here:

-- Testing convolution layer with shared param on Mocha.GPUBackend{Float64}...
26-Jan 11:30:07:INFO:root:Constructing net test-shared-params on Mocha.GPUBackend...
26-Jan 11:30:07:INFO:root:Topological sorting 5 layers...
26-Jan 11:30:07:INFO:root:Setup layers...
26-Jan 11:30:08:DEBUG:root:ConvolutionLayer(conv2): sharing filters and bias
26-Jan 11:30:08:INFO:root:Network constructed!
26-Jan 11:30:08:DEBUG:root:Init network test-shared-params
26-Jan 11:30:08:DEBUG:root:Init parameter filter for layer conv1
26-Jan 11:30:08:DEBUG:root:Init parameter bias for layer conv1
ERROR: LoadError: LoadError: Not supported
 [inlined code] from ~/.julia/v0.4/Mocha/src/cuda/cudnn.jl:53
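
For reference, the recompile step was roughly the following (the kernels path is assumed; adjust it if your Mocha checkout is laid out differently):

  # rebuild Mocha's CUDA kernels as described in the backend docs
  # (kernels directory assumed to be src/cuda/kernels)
  cd(joinpath(Pkg.dir("Mocha"), "src", "cuda", "kernels")) do
      run(`make`)
  end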

pluskid commented 8 years ago

@kleinash I think your CUDA library is not being loaded properly. Did you set LD_LIBRARY_PATH properly? Can you do Libdl.find_library(["libcudnn"], [""]) in the Julia prompt? If it is empty, then the cuDNN library cannot be found.

ashleylid commented 8 years ago

This is what I get in the REPL:

julia> Libdl.find_library(["libcudnn"], [""])
"libcudnn"

~/.bashrc

...
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64
PATH=${CUDA_HOME}/bin:${PATH}
export PATH

and...

$ ls /usr/local/cuda/lib64 | grep libcudnn
libcudnn.so
libcudnn.so.4
libcudnn.so.4.0.4
libcudnn_static.a

Am I doing something wrong?

EDIT: updated with STDIO attached.

output.txt

pluskid commented 8 years ago

@kleinash Julia Libdl eats a lot of error information. Can you try Libdl.dlopen("full-path-to-the-so-file") to see if you could get a valid pointer or an error message?

You can probably also try (in the shell, instead of Julia REPL) ldd /usr/local/cuda/lib64/libcudnn.so to see if any of its dependency libraries are actually not properly resolved.
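
Something like this should surface the loader's actual error message if the load fails (path taken from your listing above):

  # try to load the library directly; dlopen throws with the loader's message on failure
  try
      h = Libdl.dlopen("/usr/local/cuda/lib64/libcudnn.so")
      println("loaded OK: ", h)
  catch err
      println("dlopen failed: ", err)
  end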

ashleylid commented 8 years ago

Yah - you're right I get:

Libdl.dlopen("/usr/local/cuda/lib64/libcudnn.so")
Ptr{Void} @0x000000000231bb90

but everything is installed correctly, so I have to look deeper

EDIT: when I test CUDNN it passes though..

julia> Pkg.test("CUDNN")
INFO: Testing CUDNN
CUDNN_VERSION = 4004
to_host(cudnnTransformTensor(2,tx,3,ty)) == 2x + 3y = true
squeeze(to_host(cudnnPoolingForward(tx,ty6; pd=pd6)),(3,4)) == [16 / 4 39 / 6 69 / 6 56 / 4;27 / 6 7 12 87 / 6;33 / 6 8 13 93 / 6;39 / 6 9 14 99 / 6;28 / 4 57 / 6 87 / 6 68 / 4] = true
epseq(squeeze(to_host(cudnnPoolingBackward(ty6,tdy4,tx,tdx6; pd=pd6)),(3,4)),[1 / 9 5 / 9 5 / 9 4 / 9;3 / 9 12 / 9 12 / 9 9 / 9;6 / 9 21 / 9 21 / 9 15 / 9;5 / 9 16 / 9 16 / 9 11 / 9;3 / 9 9 / 9 9 / 9 6 / 9]) = false
squeeze(to_host(cudnnPoolingForward(tx,ty9; pd=pd9)),(3,4)) == [16 / 4 69 / 6 33 / 2;33 / 6 13 54 / 3;28 / 4 87 / 6 39 / 2] = true
INFO: CUDNN tests passed

OK, I am stuck - it's not hard to set up CUDA or cuDNN. My CUDA install is working just fine (using the Utils checker) and I copied the cuDNN libraries to where they need to be. How can it initialise cuDNN but then stop halfway through the layers:

Configuring Mocha...

which all runs till it gets here:

-- Testing convolution layer with shared param on Mocha.GPUBackend{Float64}...
03-Feb 10:46:47:INFO:root:Constructing net test-shared-params on Mocha.GPUBackend...
03-Feb 10:46:47:INFO:root:Topological sorting 5 layers...
03-Feb 10:46:47:INFO:root:Setup layers...
03-Feb 10:46:47:DEBUG:root:ConvolutionLayer(conv2): sharing filters and bias
03-Feb 10:46:47:INFO:root:Network constructed!
03-Feb 10:46:47:DEBUG:root:Init network test-shared-params
03-Feb 10:46:47:DEBUG:root:Init parameter filter for layer conv1
03-Feb 10:46:47:DEBUG:root:Init parameter bias for layer conv1
ERROR: LoadError: LoadError: Not supported

mgtlake commented 8 years ago

I was experiencing the same problem with CUDA 7.5 and CuDNN 4, but it was partially fixed by reverting to CuDNN 3 - the unsafe_convert warnings remained, but the LoadError did not occur and Pkg.test("Mocha") completed successfully.

ashleylid commented 8 years ago

@matthew-lake do you know if there is a way to direct Mocha to the libraries? The install guide says this:

cd <installpath>
export LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH

Add <installpath> to your build and link process by adding -I<installpath> to your compile
line and -L<installpath> -lcudnn to your link line.
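
For example, would pointing find_library at an explicit directory, like below, be the right way to do it (the path is my local install)?

  # ask Julia's loader to also search a specific directory for libcudnn
  println(Libdl.find_library(["libcudnn"], ["/usr/local/cuda/lib64"]))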

EDIT: Downgrading worked, kudos.

pluskid commented 8 years ago

Great!

I have not tested Mocha against cuDNN v4 yet. Will try to do it soon, when I find some time.

amellnik commented 8 years ago

I'm seeing something similar, but downgrading cuDNN doesn't seem to work for me. On Windows with either cuDNN 3 or 4, CUDA 7.5, and Mocha master I get:

LoadError: error compiling init: error compiling create: could not load library ""
The specified module could not be found.

The folder with cuDNN is on the path. Is there anything else I can check (or what's the trick to seeing a more descriptive error message)?

ashleylid commented 8 years ago

@amellnik I am in no way good at Windows, and I failed miserably at installing CUDA on Windows. But from what I remember about trying to install Python, there was a way to set the global library path: https://msdn.microsoft.com/en-us/library/7d83bc18.aspx https://msdn.microsoft.com/en-us/library/windows/desktop/ms682586%28v=vs.85%29.aspx. Maybe a global search can make sure all the libraries for Julia and CUDA are set, and you could also try running everything through JuliaBox. All just suggestions.

greenflash1357 commented 8 years ago

@amellnik Your error is closely related to #164.

Mocha with cuDNN 4 and CUDA 7.5 works perfectly fine for me on Windows. But on Linux, I get the same error when running the tests or initializing the bias of a layer... (the CPU backend works)

pluskid commented 8 years ago

I fixed some breaking API changes in cuDNN v4. The tests now run properly for me on OS X with Julia v0.4, CUDA 7.5, and cuDNN v4. I also made the error message clearer when libcuda is not found. If you see something like

ERROR: LoadError: LoadError: LoadError: Libcuda not found via Libdl.find_library! Please check installation and ENV configuration

then it means libcuda cannot be found by Julia due to a missing LD_LIBRARY_PATH, etc. You can verify this by running a dummy test.jl with the following contents

  const libcuda = Libdl.find_library(["libcuda"], [""])
  println("!!!")
  println(libcuda)
  println(Libdl.find_library(["libcuda"], [""]))

run it the same way (the same shell, the same environment variables) as you used to test Mocha.

greenflash1357 commented 8 years ago

Thank you very much! Now with cuDNN v4, tests run without errors on windows and linux!

pluskid commented 8 years ago

great to hear that!

phiber1 commented 8 years ago

I'm having a Mocha problem along the same lines. I'm on Ubuntu 16.04, Julia 0.4.3, CUDA 7.5.18, and cuDNN version 4.0.7. While Pkg.test("CUDNN") passes with flying colors, Pkg.test("Mocha") fails with the following:

17-Mar 22:35:14:DEBUG:root:Init parameter bias for layer conv1
ERROR: LoadError: LoadError: Not supported
 [inlined code] from /home/ENG.pvt/mark/.julia/v0.4/Mocha/src/cuda/cudnn.jl:53
 in add_tensor4d at /home/ENG.pvt/mark/.julia/v0.4/Mocha/src/cuda/cudnn.jl:172
while loading /home/ENG.pvt/mark/.julia/v0.4/Mocha/test/layers/shared-parameters.jl, in expression starting on line 50
while loading /home/ENG.pvt/mark/.julia/v0.4/Mocha/test/runtests.jl, in expression starting on line 85
================================[ ERROR: Mocha ]================================

failed process: Process(/usr/bin/julia --check-bounds=yes --code-coverage=none --color=yes /home/ENG.pvt/mark/.julia/v0.4/Mocha/test/runtests.jl, ProcessExited(1)) [1]

ERROR: Mocha had test errors
 in test at ./pkg/entry.jl:803
 in anonymous at ./pkg/dir.jl:31
 in cd at ./file.jl:22


Any suggestions you might have would be greatly appreciated, since this is a show-stopper.

Thanks, Mark

pluskid commented 8 years ago

Can you try with the latest git version with Pkg.checkout("Mocha")? If that solves the issue, I will make a new release.

phiber1 commented 8 years ago

Excellent, now all tests pass and the error is gone. Other than several deprecation warnings in binary-cross-entropy-loss.jl ("WARNING: int(x::AbstractFloat) is deprecated, use round(Int,x) instead."), the original fatal problem seems to have been corrected.

Thanks again!

phiber1 commented 8 years ago

The imagenet-classifier example notebook now also runs correctly (ignoring all the warnings), so I'd say your most recent fix appears solid.

pluskid commented 8 years ago

Making a new release now. I will close this issue, but feel free to re-open it if any other related issue comes up.