jcjohnson / neural-style

Torch implementation of neural style algorithm
MIT License

sched_yield in strace all the time #78

Open SlinkoIgor opened 8 years ago

SlinkoIgor commented 8 years ago

Hi! I'm grateful for your work. On my MacBook it works perfectly, but when I try to run it on Ubuntu with a Xeon E5-2660 2.20GHz (16 cores plus hyper-threading), it runs ridiculously slowly.

In strace I see lots of sched_yield and futex calls. One core is 100% loaded and the others are just waiting.

gdb shows this: http://pastebin.com/YvJ7Hykm

It looks like the OpenBLAS library is doing something wrong. Can you suggest something?

jcjohnson commented 8 years ago

You can try setting environment variables to tell OpenBLAS how many threads it should use:

https://github.com/xianyi/OpenBLAS#set-the-number-of-threads-with-environment-variables

OpenBLAS also sets a maximum number of threads at compile time, so you may need to recompile to use all your cores:

https://github.com/xianyi/OpenBLAS/wiki/faq#usage-1
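For illustration, the environment-variable approach might be sketched as a small Python launcher. `OPENBLAS_NUM_THREADS` is the variable described in the OpenBLAS README linked above; the helper names `openblas_env` and `run_with_threads`, and the example command, are assumptions made for this sketch and are not part of neural-style.

```python
import os
import subprocess


def openblas_env(num_threads, base_env=None):
    """Return a copy of the environment with the OpenBLAS thread count set."""
    env = dict(os.environ if base_env is None else base_env)
    # OpenBLAS reads OPENBLAS_NUM_THREADS when the library is loaded,
    # so it has to be set before the process starts.
    env["OPENBLAS_NUM_THREADS"] = str(num_threads)
    return env


def run_with_threads(cmd, num_threads):
    """Run cmd (a list of arguments) with the OpenBLAS thread count capped."""
    return subprocess.run(cmd, env=openblas_env(num_threads), check=True)


# Hypothetical usage, assuming `th` is on PATH and the script is in the
# current directory:
# run_with_threads(["th", "neural_style.lua"], num_threads=4)
```

Trying a few values (for example 1, 4, and 16) and timing each run is probably the quickest way to see whether thread contention is the cause of the slowdown.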

SlinkoIgor commented 8 years ago

Thanks! Maybe I should spend more time diving into BLAS... I ran your script with schedtool -a 0,15 (where 0 and 15 are core IDs). I assume it does the same trick as setting the define for max threads in BLAS, reducing them to 2.
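As a rough sketch of what the affinity workaround does, the same pinning can be expressed with Python's `os.sched_setaffinity` (Linux only). The helper name `pin_to_cores` is an assumption made for this example. Note that affinity limits which cores the threads may run on, not how many threads OpenBLAS creates, so it is related to a thread cap but not identical to one.

```python
import os


def pin_to_cores(cores):
    """Restrict the current process to the requested cores that are available."""
    available = os.sched_getaffinity(0)  # cores this process may currently use
    target = set(cores) & available
    if not target:
        raise ValueError("none of the requested cores are available")
    os.sched_setaffinity(0, target)      # pid 0 means the calling process
    return os.sched_getaffinity(0)


# Hypothetical usage mirroring `schedtool -a 0,15`:
# pin_to_cores({0, 15})
```

Child processes started afterwards inherit the affinity mask, so a launcher could call this before starting the actual workload.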