nylki opened this issue 8 years ago
Looks like a problem with the HDF5 library. Make sure you've installed the correct one: https://github.com/deepmind/torch-hdf5
Hi, Justin,
Same problem here, and it persists.
I installed torch-hdf5 locally on Ubuntu 14.04, following 'https://github.com/deepmind/torch-hdf5/blob/master/doc/usage.md'
Message:
hdf5 0-0 is now built and installed in /people/huang/tools/torch/install/ (license: BSD)
I then tried to run on a cluster with a GPU, and train.lua fails because of hdf5.
Environment settings (csh):
set torch_rnn=/people/huang/tools/torch-rnn
setenv PATH /people/huang/tools/torch/install/bin:${PATH}
setenv LD_LIBRARY_PATH /people/huang/tools/torch/install/lib:${LD_LIBRARY_PATH}
setenv PATH /usr/local/cuda/bin:${PATH}
setenv LD_LIBRARY_PATH /usr/local/cuda/lib64:${LD_LIBRARY_PATH}
setenv PYTHONPATH /people/huang/local/canopy/User/lib/python2.7/site-packages/
Error message when running train.lua:
th $torch_rnn/train.lua -input_h5 $tmp/lm_lstm_torch/data/my_data.h5 -input_json $tmp/lm_lstm_torch/data/my_data.json -model_type lstm -num_layers 3 -rnn_size 512 -gpu_backend opencl > $tmp/lm_lstm_torch/lm_lstm_torch.log
/people/huang/tools/torch/install/bin/luajit: ...e/huang/tools/torch/install/share/lua/5.1/trepl/init.lua:363: ...e/huang/tools/torch/install/share/lua/5.1/trepl/init.lua:363: ...ple/huang/tools/torch/install/share/lua/5.1/hdf5/ffi.lua:29: libhdf5.so: cannot open shared object file: No such file or directory
stack traceback:
[C]: in function 'error'
...e/huang/tools/torch/install/share/lua/5.1/trepl/init.lua:363: in function 'require'
/people/huang/tools/torch-rnn/train.lua:6: in main chunk
[C]: in function 'dofile'
...ools/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00406670
Contents of $tmp/lm_lstm_torch/lm_lstm_torch.log:
/people/huang/tools/torch/install/share/lua/5.1/hdf5/init.lua:15 Unable to find the HDF5 lib we were built against - trying to find it elsewhere
@jcjohnson I installed the hdf5 lib as specified in your readme:
git clone https://github.com/deepmind/torch-hdf5
cd torch-hdf5
luarocks make hdf5-0-0.rockspec
I am on Fedora 23 btw.
Does the following command work?
th -e "require 'hdf5'"
Thanks for the quick reply.
The command gives the same error.
th -e "require 'hdf5'"
...le/huang/tools/torch/install/share/lua/5.1/hdf5/init.lua:15 Unable to find the HDF5 lib we were built against - trying to find it elsewhere
...e/huang/tools/torch/install/share/lua/5.1/trepl/init.lua:363: ...ple/huang/tools/torch/install/share/lua/5.1/hdf5/ffi.lua:29: libhdf5.so: cannot open shared object file: No such file or directory
However, the hdf5 files are in the directory shown in the error message:
ls -l /people//huang/tools/torch/install/share/lua/5.1/hdf5/
total 60
-rw-r--r-- 1 huang grptlp 262 Apr 11 15:30 config.lua
-rw-r--r-- 1 huang grptlp 6102 Apr 11 15:30 dataset.lua
-rw-r--r-- 1 huang grptlp 3709 Apr 11 15:30 datasetOptions.lua
-rw-r--r-- 1 huang grptlp 13057 Apr 11 15:30 ffi.lua
-rw-r--r-- 1 huang grptlp 5209 Apr 11 15:30 file.lua
-rw-r--r-- 1 huang grptlp 10622 Apr 11 15:30 group.lua
-rw-r--r-- 1 huang grptlp 3037 Apr 11 15:30 init.lua
-rw-r--r-- 1 huang grptlp 1372 Apr 11 15:30 testUtils.lua
@jcjohnson Same error here as well:
➜ torch-rnn git:(master) th -e "require 'hdf5'"
/home/tom/torch/install/share/lua/5.1/trepl/init.lua:363: /home/tom/torch/install/share/lua/5.1/hdf5/ffi.lua:56: ';' expected near ')' at line 579
To make HDF5 work you also need to install the C library; the Lua package is just a wrapper. For example, on Ubuntu you need to run:
sudo apt-get install libhdf5-dev
Did you do that?
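The ls output above lists only the Lua wrapper files, while the traceback complains about libhdf5.so, the C library. One way to check whether that library sits where the loader is told to look is to scan the directories in LD_LIBRARY_PATH. The helper below is hypothetical and belongs to neither torch-hdf5 nor torch-rnn. It assumes the usual file names (libhdf5.so on Linux, libhdf5.dylib on macOS) and checks only the directories you pass in. The real dynamic loader also searches system defaults and the ldconfig cache.

```python
# Hypothetical diagnostic, not part of torch-hdf5 or torch-rnn.
# Assumes the usual shared-library file names and a ':'-separated search
# path such as LD_LIBRARY_PATH. It only looks in the directories given;
# the real dynamic loader also uses system defaults and the ldconfig cache.
import os
from pathlib import Path

LIB_NAMES = ("libhdf5.so", "libhdf5.dylib")


def find_libhdf5(search_path):
    """Return the libhdf5 shared libraries found in a ':'-separated path."""
    hits = []
    for directory in search_path.split(":"):
        if not directory:
            continue  # skip empty entries, e.g. from a trailing ':'
        for name in LIB_NAMES:
            candidate = Path(directory) / name
            if candidate.exists():
                hits.append(str(candidate))
    return hits


if __name__ == "__main__":
    found = find_libhdf5(os.environ.get("LD_LIBRARY_PATH", ""))
    print(found or "no libhdf5 found on LD_LIBRARY_PATH")
```

An empty result alongside the "cannot open shared object file" error suggests the C library is not installed in any of those directories. On Debian/Ubuntu, the unversioned libhdf5.so symlink typically comes from the -dev package, which is consistent with the libhdf5-dev advice above.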
@jcjohnson Yep. On Fedora I did dnf install hdf5-devel. It's version 1.8.15.
@gp-huang Sorry, I'm not sure how to do that - I've never tried to install locally.
@nylki Can you take a look at the file that is throwing a syntax error, and maybe paste it as a gist or pastebin? Does it look like this?
https://github.com/deepmind/torch-hdf5/blob/f364b442655b0fe21dafe83104f42c3bb7b2a594/luasrc/ffi.lua
The fact that you are getting a syntax error is very strange - it makes me think that somehow your torch-hdf5 install got corrupted.
@jcjohnson I just did a diff on the file I have on my system and the one you linked. They are identical. (for reference here is mine: https://gist.github.com/nylki/d823e303a8faa0b185895998f38a1524 )
Thanks for not giving up on me so far! :) What else could possibly result in the syntax error?
@nylki We are in the territory of debugging torch-hdf5 now, which I don't know much about; you might have better luck opening an issue over there instead.
But what is basically going on in this file is that Lua code is programmatically generating C code, and then using the luajit foreign function interface API to compile the C code and expose it as a set of Lua functions. The syntax error is happening when luajit tries to compile the generated C code.
I'd try adding a print statement here
https://gist.github.com/nylki/d823e303a8faa0b185895998f38a1524#file-ffi-lua-L56
to see whether cdef looks like valid C code, or whether it has some C syntax error.
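If you add that print and save the cdef string to a file, a small helper can show the lines around the position LuaJIT reports (for example "at line 579"). This is a hypothetical sketch. It assumes the "at line N" in the error counts lines of the exact string passed to ffi.cdef. If ffi.lua filters the preprocessor output first, line numbers from raw gcc -E output may not match.

```python
# Hypothetical helper: show the lines around "at line N" in a LuaJIT ffi error.
# ASSUMPTION: N counts lines of the exact string passed to ffi.cdef, saved to
# a file via the suggested print statement (not raw `gcc -E` output).
import re


def cdef_context(cdef_text, message, radius=2):
    """Return numbered lines around the line reported in `message`."""
    match = re.search(r"at line (\d+)", message)
    if match is None:
        raise ValueError("no 'at line N' found in the error message")
    target = int(match.group(1))
    lines = cdef_text.splitlines()
    start = max(1, target - radius)
    end = min(len(lines), target + radius)
    context = []
    for n in range(start, end + 1):
        marker = ">>" if n == target else "  "  # flag the reported line
        context.append(f"{marker} {n}: {lines[n - 1]}")
    return context
```

For example, print("\n".join(cdef_context(open("cdef_dump.h").read(), error_text))) prints the neighbourhood of the reported line. Here cdef_dump.h and error_text are placeholder names for wherever you saved the dump and the error message.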
Another possibility is that whatever C compiler is getting invoked on your system by luajit is somehow different or more strict than the one torch-hdf5 was expecting; I'm not sure how to debug that.
I also commented on the blog post:
"All the steps have worked fine until running the training program. At that step I get this error:
Mariums-MacBook-Pro:torch-rnn mariumsultan$ th train.lua =input_h5 data/Dracula.h5 -input_json data/Dracula.json-gpu-1
/Users/mariumsultan/torch/install/bin/luajit: …/mariumsultan/torch/install/share/lua/5.1/trepl/init.lua:384: …/mariumsultan/torch/install/share/lua/5.1/trepl/init.lua:384: …rs/mariumsultan/torch/install/share/lua/5.1/hdf5/ffi.lua:42: Error: unable to locate HDF5 header file at hdf5.h
stack traceback:
[C]: in function 'error'
…/mariumsultan/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
train.lua:6: in main chunk
[C]: in function 'dofile'
…ltan/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x01031bfcf0
Mariums-MacBook-Pro:torch-rnn mariumsultan$
Any suggestions?"
I'd like to add that my HDF5 is built and installed, and that the suggested libhdf5-dev didn't install through brew (libhdf5-dev is the Ubuntu apt package name; the Homebrew formula is hdf5).
I don't know if there is a bug in the build process of torch-hdf5, but in my case I got this error because it didn't find the include path of hdf5.h, so I added it by hand to the config file /home/marcus/torch/install/share/lua/5.1/hdf5/config.lua.
Just locate your hdf5.h (presumably in /usr/include) and then set the corresponding variable:
HDF5_INCLUDE_PATH = "/usr/include"
I hope it solves your issue as well.
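The wording of the error, "unable to locate HDF5 header file at hdf5.h", suggests why this fix works. When HDF5_INCLUDE_PATH is "", the directory part is empty, so only the bare file name is left. The sketch below mimics that behaviour. It assumes the wrapper simply joins HDF5_INCLUDE_PATH with "hdf5.h" and checks that the file exists; that assumption is inferred from the message, not read from torch-hdf5's source. Under the same assumption, a ';'-separated value would be treated as one nonexistent directory. pick_include_dir is a hypothetical helper for choosing a single directory to put in config.lua.

```python
# Sketch of how the header location seems to be derived. ASSUMPTION: inferred
# from the message "unable to locate HDF5 header file at hdf5.h", not taken
# from torch-hdf5's source. pick_include_dir is a hypothetical helper.
import os


def header_path(include_path):
    # A plain join: "" gives just "hdf5.h", and "a;b" is one (nonexistent) dir.
    return os.path.join(include_path, "hdf5.h")


def pick_include_dir(candidates):
    """Return the first candidate directory that really contains hdf5.h."""
    for directory in candidates:
        if os.path.isfile(header_path(directory)):
            return directory
    return None
```

For instance, pick_include_dir(["/usr/include", "/usr/local/include"]) returns the first of those directories that holds hdf5.h, or None if neither does.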
@MarcoCiccone That worked for me after two days of searching for the answer. Thanks.
Just a note: I added two paths since I don't think that /usr/include has the right file. My config.lua now looks like this:
hdf5._config = {
HDF5_INCLUDE_PATH = "",
HDF5_INCLUDE_PATH = "/usr/include",
HDF5_INCLUDE_PATH = "/usr/local/include",
HDF5_LIBRARIES = "/usr/lib/libpthread.dylib;/usr/local/lib/libhdf5_cpp.dylib;/usr/local/lib/libhdf5.dylib;/usr/local/lib/libsz.dylib;/usr/lib/libz.dylib;/usr/lib/libdl.dylib;/usr/lib/libm.dylib"
}
I tested that it worked by running th -e "require 'hdf5'"
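Note that this table assigns HDF5_INCLUDE_PATH three times. A Lua table holds only one value per key, so repeated keys do not give the wrapper a list of paths to try; only one of the values survives. The Lua 5.1 reference manual also states that the order of assignments in a table constructor is undefined, so you cannot rely on which value wins. Below is a hypothetical check for repeated keys. It assumes the flat KEY = "value" layout used in this thread and uses a regular expression rather than a Lua parser.

```python
# Hypothetical check for keys assigned more than once in a flat config.lua.
# ASSUMPTION: entries look like KEY = "value"; this is a regex, not a Lua parser.
import re
from collections import Counter

ENTRY = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"[^"]*"')


def repeated_keys(config_text):
    """Return the keys that appear more than once, sorted by name."""
    counts = Counter(ENTRY.findall(config_text))
    return sorted(key for key, n in counts.items() if n > 1)
```

Running it on the config above reports HDF5_INCLUDE_PATH. Keeping a single entry that points at the directory containing hdf5.h makes the result independent of assignment order.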
Hi, I found a solution for this issue (Mac OS 10.11).
If you install hdf5 with brew install hdf5, it will be installed at /usr/local/Cellar/hdf5.
Once you have installed torch-hdf5 from the deepmind repo, edit the config.lua at /Users/yoosan/torch/install/share/lua/5.1/hdf5 (replace the torch path with your own) and set HDF5_INCLUDE_PATH="/usr/local/Cellar/hdf5/1.8.16_1/include" (note the version).
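The Cellar path includes the installed version, and this thread alone mentions 1.8.16_1, 1.10.1_2 and 1.8.19. It can therefore help to list what is actually installed before editing config.lua. The helper below is hypothetical. It assumes the /usr/local/Cellar/hdf5/<version>/include layout described above, and it lists every match instead of guessing which version you want.

```python
# Hypothetical helper: list versioned Homebrew include dirs that hold hdf5.h.
# ASSUMPTION: the <cellar>/<version>/include layout described in this thread.
from pathlib import Path


def cellar_include_dirs(cellar="/usr/local/Cellar/hdf5"):
    """List <cellar>/<version>/include directories that contain hdf5.h."""
    root = Path(cellar)
    if not root.is_dir():
        return []
    return sorted(
        str(version / "include")
        for version in root.iterdir()
        if (version / "include" / "hdf5.h").is_file()
    )
```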
To extend @yoosan's answer above, I found that setting the path to /usr/local/include worked as well.
/Users/yad/torch/install/share/lua/5.1/trepl/init.lua:384: /Users/yad/.luarocks/share/lua/5.1/hdf5/ffi.lua:42: Error: unable to locate HDF5 header file at hdf5.h
I edited line 42 of ffi.lua to include the path, and it works now.
@nylki: This is the solution to your original problem.
To make it easier for people to copy/paste and understand, here's what my torch/install/share/lua/5.1/hdf5/config.lua looked like before:
hdf5._config = {
HDF5_INCLUDE_PATH = "",
HDF5_LIBRARIES = "/usr/local/lib/libhdf5.dylib;/usr/local/lib/libsz.dylib;/usr/lib/libz.dylib;/usr/lib/libdl.dylib;/usr/lib/libm.dylib"
}
and after:
hdf5._config = {
HDF5_INCLUDE_PATH = "/usr/local/include",
HDF5_LIBRARIES = "/usr/local/lib/libhdf5.dylib;/usr/local/lib/libsz.dylib;/usr/lib/libz.dylib;/usr/lib/libdl.dylib;/usr/lib/libm.dylib"
}
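The before/after edit can also be scripted. The sketch below assumes the config.lua layout shown here, with exactly one HDF5_INCLUDE_PATH = "..." entry. set_include_path is a hypothetical helper that returns the edited text. It refuses to guess when the key is missing or repeated, and it rejects ';'-separated values because several posters in this thread found that only a single directory worked.

```python
# Hypothetical helper: set HDF5_INCLUDE_PATH in config.lua text.
# ASSUMPTION: the flat layout shown above, with one HDF5_INCLUDE_PATH entry.
import re

KEY = re.compile(r'(HDF5_INCLUDE_PATH\s*=\s*)"[^"]*"')


def set_include_path(config_text, include_dir):
    """Return config text with HDF5_INCLUDE_PATH set to a single directory."""
    if ";" in include_dir:
        raise ValueError("use a single directory, not a ';'-separated list")
    matches = KEY.findall(config_text)
    if len(matches) != 1:
        raise ValueError(f"expected one HDF5_INCLUDE_PATH entry, found {len(matches)}")
    # A function replacement avoids backslash escapes in the new path.
    return KEY.sub(lambda m: f'{m.group(1)}"{include_dir}"', config_text)
```

Writing the result back over config.lua, ideally after keeping a copy of the original, is left to you.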
:smile:
Hi, I'm having trouble running th train.lua. I think it's an HDF5 problem, because when I run:
th -e "require 'hdf5'"
I get this error:
/Users/Apple/torch/install/share/lua/5.1/trepl/init.lua:389: /Users/Apple/torch/install/share/lua/5.1/hdf5/config.lua:2: unexpected symbol near 'local'
I edited my config.lua file to match the answers above, which helped with the initial HDF5 errors. Any ideas on how to fix this? Thanks
@grishmarao, it sounds like you just have a syntax error in the config file. Would you like to paste what you've got?
Changing the config.lua file doesn't solve the problem for me. My error output never included the "unable to locate HDF5 header file at hdf5.h" message. I just get:
th -e "require 'hdf5'"
..<UserName>/torch/install/share/lua/5.1/trepl/init.lua:389: ...s/<UserName>/torch/install/share/lua/5.1/hdf5/ffi.lua:56: ')' expected near '_close' at line 1436
I was having the same 'hdf5 header not found' error, which I resolved by making config.lua contain one and only one path, i.e. HDF5_INCLUDE_PATH = "/usr/local/Cellar/hdf5/1.10.1_2/include"
/usr/local/include didn't work on its own; I suspect it was having trouble following the symlinks. I also suspect that ; wasn't working as a delimiter when I had both paths in place, because something looked fishy in the text of the error it threw in that case.
However, that's all resolved now, and I've got the same issue as @McLawrence and @grishmarao: the same error, but on different lines. I must have a different version of something.
/Users/.../torch/install/bin/luajit: /Users/.../torch/install/share/lua/5.1/trepl/init.lua:389: /Users/.../torch/install/share/lua/5.1/trepl/init.lua:389: /Users/.../torch/install/share/lua/5.1/hdf5/ffi.lua:56: ')' expected near '_close' at line 3401
OS X 10.12.6, Homebrew 1.3.1, FWIW.
I am also getting the same error:
/Users/mars/torch/install/bin/luajit: /Users/mars/torch/install/share/lua/5.1/trepl/init.lua:389: /Users/mars/torch/install/share/lua/5.1/trepl/init.lua:389: /Users/mars/torch/install/share/lua/5.1/hdf5/ffi.lua:56: ')' expected near '_close' at line 1437
stack traceback:
[C]: in function 'error'
/Users/mars/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
train.lua:6: in main chunk
[C]: in function 'dofile'
...mars/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x010e18ea10
I'm getting the same error as well, but with different line numbers; same system specs/versions as @mendadala.
/Users/cam/torch/install/bin/luajit: /Users/cam/torch/install/share/lua/5.1/trepl/init.lua:389: /Users/cam/torch/install/share/lua/5.1/trepl/init.lua:389: /Users/cam/torch/install/share/lua/5.1/hdf5/ffi.lua:56: ')' expected near '_close' at line 1472
stack traceback:
[C]: in function 'error'
/Users/cam/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
train.lua:6: in main chunk
[C]: in function 'dofile'
.../cam/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x010969ba10
I'm guessing a dependency might be out of date somewhere? Seems more likely than a typo that made it into production.
I've solved my issues and got it all working! I've lost the place where I found the solution to the ')' expected near '_close' error, but the solution was to edit line 44 of install/share/lua/5.1/hdf5/ffi.lua to read
local process = io.popen("gcc -D '_Nullable=' -E " .. headerPath) -- TODO pass -I
Then:
brew install hdf5@1.8
mv /usr/local/Cellar/hdf5@1.8/1.8.19 /usr/local/Cellar/hdf5/
Then adjust install/share/lua/5.1/hdf5/config.lua so it reads:
hdf5._config = {
HDF5_INCLUDE_PATH = "/usr/local/Cellar/hdf5/1.8.19/include",
HDF5_LIBRARIES = "/usr/local/Cellar/hdf5/1.8.19/lib/libhdf5.dylib;/usr/local/opt/szip/lib/libsz.dylib;/usr/lib/libz.dylib;/usr/lib/libdl.dylib;/usr/lib/libm.dylib"
}
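This is why the -D '_Nullable=' flag plausibly helps. The macOS SDK headers annotate some pointers with the _Nullable qualifier, for example the function-pointer members of the FILE structure such as _close. LuaJIT's C declaration parser does not appear to understand that qualifier, which fits the "')' expected near '_close'" error. Defining _Nullable as an empty macro makes the preprocessor drop the word. The Python sketch below only imitates that substitution on a single declaration to illustrate the effect; it does not replace running gcc. It also assumes _Nullable is the only unsupported qualifier in your headers. Related ones such as _Nonnull would need the same treatment if they show up in an error.

```python
# Illustration only: mimic what `gcc -D '_Nullable=' -E` does to one
# declaration by deleting the _Nullable token. ASSUMPTION: _Nullable is the
# only qualifier LuaJIT chokes on in your headers.
import re


def strip_qualifier(c_text, qualifier="_Nullable"):
    """Remove whole-word occurrences of `qualifier`, like an empty macro."""
    return re.sub(r"\b" + re.escape(qualifier) + r"\b", "", c_text)


decl = "int (* _Nullable _close)(void *);"
print(strip_qualifier(decl))  # int (*  _close)(void *);
```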
Hello all, thanks a ton for all your help; I finally resolved all my errors. Now, after all that, it is giving me an "out of memory" error!
MARSs-MBP:torch-rnn neptune$ th train.lua -input_h5 data/tiny_shakespeare.h5 -input_json data/tiny_shakespeare.json
Running with CUDA on GPU 0
THCudaCheck FAIL file=/Users/neptune/torch/extra/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
/Users/neptune/torch/install/bin/luajit: /Users/neptune/torch/install/share/lua/5.1/nn/utils.lua:11: cuda runtime error (2) : out of memory at /Users/neptune/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:66
stack traceback:
[C]: in function 'resize'
/Users/neptune/torch/install/share/lua/5.1/nn/utils.lua:11: in function 'torch_Storage_type'
/Users/neptune/torch/install/share/lua/5.1/nn/utils.lua:57: in function 'recursiveType'
/Users/neptune/torch/install/share/lua/5.1/nn/Module.lua:160: in function 'type'
/Users/neptune/torch/install/share/lua/5.1/nn/utils.lua:45: in function 'recursiveType'
/Users/neptune/torch/install/share/lua/5.1/nn/utils.lua:41: in function 'recursiveType'
/Users/neptune/torch/install/share/lua/5.1/nn/Module.lua:160: in function 'type'
train.lua:96: in main chunk
[C]: in function 'dofile'
...tune/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x0106f02bd0
PLEASE HELP!
I solved this error by shutting down the system and running the code again from a fresh start. Everything worked!
My issue was also fixed! Thanks @dj2mn :+1:
For some reason my config file contained this:
HDF5_INCLUDE_PATH = "/usr/local/Cellar/hdf5/1.8.19/include;/usr/local/opt/szip/include"
Which I simply changed to this:
HDF5_INCLUDE_PATH = "/usr/local/Cellar/hdf5/1.8.19/include"
I have no idea where the extra ;/usr/local/opt/szip/include came from.
It now works. I am using macOS High Sierra (10.13).
@Benimation I've done that but it continues to give me the old path in the error. Did you need to do anything aside from change the file?
@timendez I don't remember exactly what I did.. Probably about everything that's being suggested on this page..
@Benimation That's alright! I figured out I had two installations of hdf5 across two separate Torch installs, which was messing a lot of stuff up. Removing one made everything else work.
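Duplicate copies of the Lua hdf5 package are easy to miss. require 'hdf5' loads whichever copy comes first on Lua's package.path, and that may not be the copy whose config.lua you edited. One traceback earlier in this thread, for example, loads ffi.lua from ~/.luarocks instead of ~/torch/install. The hypothetical sketch below lists every hdf5/config.lua under the root directories you pass in. Which roots to scan is up to you, and scanning a whole home directory can take a while.

```python
# Hypothetical helper: list every hdf5/config.lua under the given roots so
# that duplicate Torch/LuaRocks trees stand out. The roots are up to you;
# scanning a whole home directory can take a while.
from pathlib import Path


def find_hdf5_configs(*roots):
    """Return sorted paths of hdf5/config.lua files below each root."""
    found = []
    for root in roots:
        base = Path(root)
        if base.is_dir():
            found.extend(str(p) for p in base.rglob("hdf5/config.lua"))
    return sorted(found)
```

More than one result means it is worth checking which copy th actually loads before editing either one.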
I installed all dependencies and preprocessed a text file (I tried with the provided shakespeare.txt). However, train.lua throws an error. What could be the cause of this?