karpathy / neuraltalk2

Efficient Image Captioning code in Torch, runs on GPU

Issue when running on Jetson TX1 #62

Open Ashram56 opened 8 years ago

Ashram56 commented 8 years ago

Good afternoon,

I'm trying to get Neuraltalk to run on a Jetson TX1.

I managed to install Torch and all the other dependencies listed on the main page. However, I get this error when I run the command:

th eval.lua -model /path/to/model -image_folder /path/to/image/directory -num_images 10

(of course all paths have been replaced with the correct path, I'm using the model provided)

/usr/local/bin/luajit: /usr/local/share/lua/5.1/torch/File.lua:317: table index is nil
stack traceback:
    /usr/local/share/lua/5.1/torch/File.lua:317: in function 'readObject'
    /usr/local/share/lua/5.1/nn/Module.lua:154: in function 'read'
    /usr/local/share/lua/5.1/torch/File.lua:298: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:316: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:316: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:347: in function 'load'
    eval.lua:69: in main chunk
    [C]: in function 'dofile'
    /usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x0000d055

Any help would be welcome

Regards

Ashram56 commented 8 years ago

I have followed the instructions from Soumith here: https://github.com/karpathy/neuraltalk2/issues/32

Comment from Dec 5.

I can get further, but still an error:

/usr/local/bin/luajit: /usr/local/share/lua/5.1/torch/File.lua:294: unknown object
stack traceback:
    [C]: in function 'error'
    /usr/local/share/lua/5.1/torch/File.lua:294: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:240: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/nn/Module.lua:154: in function 'read'
    /usr/local/share/lua/5.1/torch/File.lua:270: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/nn/Module.lua:154: in function 'read'
    /usr/local/share/lua/5.1/torch/File.lua:270: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    eval.lua:76: in function 'load'
    eval.lua:82: in main chunk
    [C]: in function 'dofile'
    /usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x0000d055

I wonder if the patch applied to the eval.lua file is correct

szagoruyko commented 8 years ago

There are accGradParameters functions saved inside the model, and these sometimes cannot be read. I had to remove them. Here is a clean checkpoint: https://www.dropbox.com/s/jxkpuqmc1p0xw6e/model_id1-501-1448236541_cpu.t7?dl=0

lanewinfield commented 8 years ago

THANK YOU @szagoruyko this solved my problems on my MacBook Pro.

For reference (and search results), here are the errors I was receiving:

bash-3.2$ th eval.lua -gpuid -1 -model models/model_id1-501-1448236541.t7 -num_images 1
/Users/brianmoore/torch/install/bin/luajit: ...rs/brianmoore/torch/install/share/lua/5.1/torch/File.lua:290: unknown Torch class <torch.CudaTensor>
stack traceback:
    [C]: in function 'error'
    ...rs/brianmoore/torch/install/share/lua/5.1/torch/File.lua:290: in function 'readObject'
    ...rs/brianmoore/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
    /Users/brianmoore/torch/install/share/lua/5.1/nn/Module.lua:154: in function 'read'
    ...rs/brianmoore/torch/install/share/lua/5.1/torch/File.lua:298: in function 'readObject'
    ...rs/brianmoore/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
    ...rs/brianmoore/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
    ...rs/brianmoore/torch/install/share/lua/5.1/torch/File.lua:347: in function 'load'
    eval.lua:69: in main chunk
    [C]: in function 'dofile'
    ...oore/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010c700bc0

Ashram56 commented 8 years ago

Indeed, this has allowed me to get one step further.

I'm still encountering issues, though at a later stage. There's not much debug info, I'm afraid:

th eval2.lua -model model_id1-501-1448236541_cpu.t7 -image_folder /home/ubuntu/Pictures/
DataLoaderRaw loading images from folder: /home/ubuntu/Pictures/
listing all images in directory /home/ubuntu/Pictures/
DataLoaderRaw found 1 images
constructing clones inside the LanguageModel
Killed

Is there a log file or debug switch I could use?

Ashram56 commented 8 years ago

Following a hint on another thread, I added -num_images -1. Here's what I have now:

th eval2.lua -model model_id1-501-1448236541_cpu.t7 -image_folder /home/ubuntu/Pictures/ -num_images -1
DataLoaderRaw loading images from folder: /home/ubuntu/Pictures/
listing all images in directory /home/ubuntu/Pictures/
DataLoaderRaw found 1 images
constructing clones inside the LanguageModel
cp "/home/ubuntu/Pictures/The_dealmaker-1024x1024.jpg" vis/imgs/img1.jpg
image 1: a woman walking down a street holding an umbrella
evaluating performance... 0/-1 (0.000000)
loss: nan

rossgoodwin commented 8 years ago

I just solved this problem. If you trained on a 64-bit architecture, you need to convert your model to ascii format before you run it on the Jetson's 32-bit ARMv7.

On a 64-bit machine (with GPU if it's a GPU model, which I'm guessing you want because you're running on a Jetson), in the Neuraltalk2 folder, open the Torch interactive interpreter:

> [run all imports from top of eval.lua]
> net = torch.load('/path/to/model.t7')
> torch.save('/path/to/newmodel.t7.ascii', net, 'ascii')

Then, on the Jetson, change line 69 of eval.lua so it reads:

local checkpoint = torch.load(opt.model, 'ascii')

And run with the ascii version of the model.

This should also work on Raspberry Pi by following the same steps with a CPU checkpoint. (You obviously won't need a GPU to convert the CPU checkpoint to ascii format.)
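
For anyone who would rather not type this into the interpreter, the conversion steps above could probably be wrapped in a small script along these lines. This is only a sketch: the file name convert_to_ascii.lua and the function name convert_checkpoint are made up for illustration, and the list of require lines is an assumption about which classes the checkpoint contains (if torch.load complains about an unknown class, add the remaining imports from the top of eval.lua).

```lua
-- convert_to_ascii.lua (hypothetical file name)
-- Run from the neuraltalk2 folder on the 64-bit machine, for example:
--   th convert_to_ascii.lua /path/to/model.t7 /path/to/newmodel.t7.ascii
require 'torch'
require 'nn'
require 'nngraph'
-- Assumption: the checkpoint stores neuraltalk2's own LanguageModel class,
-- which has to be registered before torch.load can deserialize it.
require 'misc.LanguageModel'
-- For a GPU checkpoint, the CUDA packages are needed as well (assumed installed):
-- require 'cutorch'
-- require 'cunn'
-- require 'cudnn'

-- Load a checkpoint saved in Torch's default binary format and
-- write it back out in the portable 'ascii' format.
function convert_checkpoint(src, dst)
  local checkpoint = torch.load(src)
  torch.save(dst, checkpoint, 'ascii')
end

-- Only convert when both paths were given on the command line.
if arg and arg[1] and arg[2] then
  convert_checkpoint(arg[1], arg[2])
  print('wrote ' .. arg[2])
end
```

The output file is then the one to copy to the Jetson and load with torch.load(opt.model, 'ascii') as described above.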

ZahlGraf commented 7 years ago

Does anyone in this thread have a valid checkpoint that runs on the Jetson available?