jcjohnson / neural-style

Torch implementation of neural style algorithm
MIT License

running against CPU - cores under-utilized #88


ghost commented 8 years ago

Hi - I'm trying to run this algorithm on a multi-CPU machine, and I only get 8 (out of 32) cores working simultaneously. Are there any settings to make it utilize more cores?

p.s. beautiful code!!!

jcjohnson commented 8 years ago

The multicore speedup isn't specific to neural-style; we rely on torch7 for that, which in turn relies on a BLAS library like OpenBLAS. To use all cores you'll need to check your torch7 and BLAS installations.

If you're using OpenBLAS then you can try setting OPENBLAS_NUM_THREADS to a higher number; you might also need to rebuild OpenBLAS and set NUM_THREADS to a bigger number. Both are described here: https://github.com/xianyi/OpenBLAS/wiki/faq
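
For example (a sketch, not specific to neural-style; the thread counts below assume the 32-core machine from this issue, so adjust to your hardware):

# Runtime cap: OpenBLAS uses at most this many threads
export OPENBLAS_NUM_THREADS=32
export OMP_NUM_THREADS=32

# If OpenBLAS itself was compiled with a lower NUM_THREADS, the runtime
# setting is clamped to it; rebuild with a higher compile-time cap:
git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS
make NUM_THREADS=32
sudo make install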

CoelacanthsKill commented 8 years ago

@jcjohnson Complete instructions, pls. Or have it enabled by default.

davesade commented 7 years ago

Might be the same as #40.

monkey-jsun commented 7 years ago

I tried to set env vars,

export OPENBLAS_NUM_THREADS=10
export OMP_NUM_THREADS=10

That did not help.

I also tested luajit, which seems to detect the number of cores correctly. See below.

I'm not sure what else could be the reason. In short, I basically followed the README.md on an Ubuntu 16.04 server and got this far, so it is not even clear to me whether I'm using OpenBLAS or not (guess the README.md is too well written ;P). Any hints?


jsun@jsun02:~/work/fast-neural-style$ luajit -l torch -l env -l nn
Torch 7.0  Copyright (C) 2001-2011 Idiap, NEC Labs, NYU
LuaJIT 2.1.0-beta1 -- Copyright (C) 2005-2015 Mike Pall. http://luajit.org/
[Torch ASCII-art banner omitted]
JIT: ON SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse
t7> return torch.getnumthreads()
28
t7>

htoyryla commented 7 years ago

I think it is unfair to blame the neural_style README. Neural-style requires torch, and torch requires openblas for multithread support, so this is a dependency of a dependency. When you are installing torch, the install-deps script will attempt to install openblas if it is not yet installed. If this fails for some reason, that is not something the neural-style README should have to cover.

I think part of the problem is that while the torch install scripts are easy to use when everything goes well, it is also easy to miss information such as "OpenBLAS could not be installed". But that is really a torch issue (and the torch community on GitHub is strictly development oriented; user problems are handled in the user forum https://groups.google.com/forum/#!forum/torch7). Search for openblas there: https://groups.google.com/forum/#!searchin/torch7/openblas%7Csort:relevance

To check if your Torch install uses openblas: https://groups.google.com/forum/#!searchin/torch7/openblas%7Csort:relevance/torch7/sdoRkTyGwKc/2elCe9jnAgAJ

monkey-jsun commented 7 years ago

At last, problem solved. The problem was that I did not have openblas installed.

As pointed out in the previous post, one can quickly check whether openblas is being used with:

ldd ~/torch/install/lib/libTH.so | grep openblas
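
If OpenBLAS is linked in, the output should contain a line roughly like the following (the exact path and load address are illustrative, from an Ubuntu box):

libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00007f...)

No output means libTH was built against some other BLAS (or none).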

On Ubuntu the following command will install openblas. Do this before installing torch:

sudo apt-get install libopenblas-dev
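
Note that installing the library after torch is already built is not enough by itself, since libTH links against BLAS at build time. In that case you likely need to rebuild torch as well; something like this (assuming the standard torch/distro checkout in ~/torch):

cd ~/torch
./clean.sh
bash install-deps
./install.sh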

After this fix, I now have all 28 cores happily humming along. The training speed is at least 10 times faster.

Thanks, all!

hurnhu commented 6 years ago

@monkey-jsun did you just install openblas and it used all cores? I just did that, but it's still using only one core.