jcjohnson / densecap

Dense image captioning in Torch
MIT License

Inference on CPU very slow #65

Closed sampathchanda closed 7 years ago

sampathchanda commented 7 years ago

Hi,

While trying to run inference on the elephant.jpg image from the starter example, using a CPU took me almost 22 minutes (on a fairly powerful CPU). Is this expected, or is there something I am missing?

nghiaiosdev commented 7 years ago

I have the same problem. Have you fixed it yet?

sampathchanda commented 7 years ago

Nope, I haven't been able to fix it yet.

jcjohnson commented 7 years ago

I don't have a ton of experience running Torch on the CPU, but I've sometimes seen Torch fail to properly utilize multiple threads in BLAS calls; this would cause slowness on the CPU.

There are some details here about configuring BLAS with Torch:

https://github.com/torch/dok/blob/master/docinstall/blas.md

However, that page was last updated in February 2014, so it may be outdated.
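For anyone debugging this, here is a minimal sketch (assuming the standard Torch7 threading calls torch.getnumthreads and torch.setnumthreads) to check how many CPU threads Torch is actually using and to raise the count explicitly. Depending on the BLAS backend, setting the OMP_NUM_THREADS environment variable before launching th may also be necessary.

```lua
-- Minimal sketch, assuming the standard Torch7 threading API, to check
-- how many CPU threads Torch is using and to raise the count explicitly.
require 'torch'

print('Torch threads before: ' .. torch.getnumthreads())

-- 8 is an arbitrary example value; match your number of physical cores.
torch.setnumthreads(8)
print('Torch threads after:  ' .. torch.getnumthreads())
```

If the count reported before the call is 1 on a many-core machine, that alone would explain very slow CPU inference.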

nghiaiosdev commented 7 years ago

I have a big problem when I run the densecap model on the GPU:

THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-768/cutorch/init.c line=261 error=46 : all CUDA-capable devices are busy or unavailable
/home/mmlabgpu3/torch/install/bin/luajit: /home/mmlabgpu3/torch/install/share/lua/5.1/trepl/init.lua:389: /home/mmlabgpu3/torch/install/share/lua/5.1/trepl/init.lua:389: /home/mmlabgpu3/torch/install/share/lua/5.1/cudnn/find.lua:165: cuda runtime error (46) : all CUDA-capable devices are busy or unavailable at /tmp/luarocks_cutorch-scm-1-768/cutorch/init.c:261
stack traceback:
  [C]: in function 'error'
  /home/mmlabgpu3/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
  ./densecap/utils.lua:31: in function 'setup_gpus'
  run_model.lua:149: in main chunk
  [C]: in function 'dofile'
  ...gpu3/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
  [C]: at 0x004065d0

Can you tell me why that error occurs?
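For what it's worth, CUDA runtime error 46 often indicates that a GPU is present but is held by another process or set to exclusive compute mode, rather than that no GPU is installed. A minimal diagnostic sketch (assuming the standard cutorch API) to see what devices Torch can reach:

```lua
-- Minimal diagnostic sketch, assuming the standard cutorch API, to list
-- the CUDA devices Torch can see before running run_model.lua.
local ok, cutorch = pcall(require, 'cutorch')
if not ok then
  print('cutorch failed to load: ' .. tostring(cutorch))
else
  local n = cutorch.getDeviceCount()
  print('CUDA devices visible: ' .. n)
  for i = 1, n do
    -- getMemoryUsage returns free and total memory (bytes) for device i
    local free, total = cutorch.getMemoryUsage(i)
    print(string.format('device %d: %d / %d bytes free', i, free, total))
  end
end
```

If a device shows up but has almost no free memory, check nvidia-smi for competing processes. Per the densecap README, run_model.lua can also be forced onto the CPU with the flags -gpu -1 and -use_cudnn 0.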

jcjohnson commented 7 years ago

@eitguide Does your computer have an NVIDIA GPU?

sampathchanda commented 7 years ago

Yes, it turns out that Torch is not able to use all the available threads on the CPU when running on an Intel KNL node (which has 64 cores). However, I see that inference on the same image takes around 23 seconds on a MacBook Pro.
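To confirm this thread-scaling diagnosis on a given machine, one quick check (a hypothetical benchmark, not part of densecap) is to time the same large matrix multiply at several thread counts; where BLAS threading works, the time should drop as the count increases, while flat timings suggest Torch is stuck on a single thread:

```lua
-- Hypothetical benchmark: time one large matmul at several thread counts.
-- Shrinking times mean BLAS threading is working; flat times suggest
-- Torch/BLAS is effectively single-threaded on this machine.
require 'torch'

local a = torch.randn(3000, 3000)
local b = torch.randn(3000, 3000)

for _, n in ipairs({1, 4, 16, 64}) do
  torch.setnumthreads(n)
  local timer = torch.Timer()
  torch.mm(a, b)
  print(string.format('%2d threads: %.3f s', n, timer:time().real))
end
```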