achetverikov closed 8 years ago
I'm getting the same thing, on Windows 10 and Mac OS X Yosemite. For an ugly workaround, I just copy-pasted what was in distributions/meta.py over the import statement in the distributions' __init__.py
I removed meta.py. Can someone confirm whether that fixes this issue?
@twiecki It seems to be working now.
Great, thanks for checking.
Please let me know about your CUDA experiences, especially if you see any speed-ups!
I've done some simple tests and though Theano tests show CUDA benefits, there does not seem to be any difference with PyMC3 (see attachment). Maybe I'm using the wrong problems, because my understanding of CUDA is really shallow.
Thanks for posting, it matches my own experiments. Not sure why the two are that close together.
Perhaps it suggests that the sampler is taking up most of the time rather than the model evaluation? Is this NUTS? You might want to try with Metropolis.
Yes, this was with NUTS. I also tried the Dirichlet mixture model from the Geysir example (https://github.com/pymc-devs/pymc3/blob/master/pymc3/examples/dp_mix.ipynb), which uses both Metropolis and ElemwiseCategoricalStep. The results are pretty much the same:
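One way to test that hypothesis without involving the GPU at all is to time the log-probability calls separately from the sampler loop. The sketch below is a toy random-walk Metropolis in plain Python, not PyMC3 code. `TimedLogp`, `metropolis` and the standard-normal target are hypothetical names made up for illustration.

```python
import math
import random
import time


class TimedLogp:
    """Wrap a log-density function and accumulate the time spent inside it."""

    def __init__(self, logp):
        self.logp = logp
        self.calls = 0
        self.seconds = 0.0

    def __call__(self, x):
        start = time.perf_counter()
        value = self.logp(x)
        self.seconds += time.perf_counter() - start
        self.calls += 1
        return value


def std_normal_logp(x):
    # Toy "model": log-density of a standard normal, up to a constant.
    return -0.5 * x * x


def metropolis(logp, n_steps, scale=1.0, seed=0):
    """Plain random-walk Metropolis; returns the list of samples."""
    rng = random.Random(seed)
    x = 0.0
    current = logp(x)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, scale)
        proposed = logp(proposal)
        # Accept with probability min(1, exp(proposed - current)).
        if math.log(1.0 - rng.random()) < proposed - current:
            x, current = proposal, proposed
        samples.append(x)
    return samples


timed = TimedLogp(std_normal_logp)
start = time.perf_counter()
samples = metropolis(timed, n_steps=20000)
total = time.perf_counter() - start

# Fraction of the wall-clock time that was spent evaluating the model.
model_share = timed.seconds / total
print("logp calls: %d" % timed.calls)
print("share of time in logp: %.0f%%" % (100 * model_share))
```

If the share spent in the log-probability is small, speeding up the model evaluation (on a GPU or otherwise) cannot change the total run time by much.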
GPU: 20000 of 20000 complete in 109.3 sec Execution time: 121.12
CPU: 20000 of 20000 complete in 107.4 sec Execution time: 119.38
I wonder whether Theano is even using the GPU here. Maybe there is a CPU fall-back.
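A quick sanity check along those lines is to look at the configuration the process was started with. The sketch below only parses a THEANO_FLAGS-style string with the standard library. It assumes the flags come from the environment variable (settings in a .theanorc file are not seen by it), and it relies on Theano's defaults being device=cpu and floatX=float64. `parse_theano_flags` and `gpu_config_problems` are hypothetical helper names, not part of Theano.

```python
import os


def parse_theano_flags(flags):
    """Split a THEANO_FLAGS-style string ("key=value,key=value") into a dict."""
    parsed = {}
    for item in flags.split(","):
        item = item.strip()
        if not item or "=" not in item:
            continue
        key, value = item.split("=", 1)
        parsed[key.strip()] = value.strip()
    return parsed


def gpu_config_problems(flags):
    """List reasons why these flags would likely keep the work on the CPU."""
    parsed = parse_theano_flags(flags)
    problems = []
    # Assumed default when the flag is absent: device=cpu.
    device = parsed.get("device", "cpu")
    if not device.startswith(("gpu", "cuda")):
        problems.append("device is %r, not a GPU device" % device)
    # Assumed default when the flag is absent: floatX=float64.
    if parsed.get("floatX", "float64") != "float32":
        problems.append("floatX is not float32")
    return problems


# Check whatever is set in the current environment (empty string if unset).
for problem in gpu_config_problems(os.environ.get("THEANO_FLAGS", "")):
    print("possible CPU fall-back:", problem)
```

The floatX check is there because the CUDA backend in Theano releases of this era accelerates float32 data only, so a model built with float64 values can stay on the CPU even when device=gpu is set.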
OK, one more test. I thought it might be necessary to use theano.shared(...) to benefit, but that is not the case. For a simple linear regression with x and y in shared(), the times with the GPU are actually higher than with the CPU (which also seems to be slowed down by shared()):
GPU: 20000 of 20000 complete in 18.2 sec. Execution time: 29.73
CPU: 20000 of 20000 complete in 10.6 sec. Execution time: 21.85
The same goes for the Geysir example with the data in shared():
CPU: 20000 of 20000 complete in 104.7 sec. Execution time: 118.79
GPU: 20000 of 20000 complete in 143.3 sec. Execution time: 155.19
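To put the numbers reported so far side by side, here is a small sketch that computes the GPU/CPU sampling-time ratio and the time not accounted for by sampling. It assumes that "Execution time" is in seconds and covers the whole script, which the output itself does not state. The labels in `runs` are just shorthand for the three tests above.

```python
# (sampling seconds, total execution seconds) as reported in this thread.
runs = {
    "regression, shared": {"cpu": (10.6, 21.85), "gpu": (18.2, 29.73)},
    "mixture": {"cpu": (107.4, 119.38), "gpu": (109.3, 121.12)},
    "mixture, shared": {"cpu": (104.7, 118.79), "gpu": (143.3, 155.19)},
}

summary = {}
for name, timing in runs.items():
    cpu_sampling, cpu_total = timing["cpu"]
    gpu_sampling, gpu_total = timing["gpu"]
    summary[name] = {
        # Values above 1 mean the GPU run sampled more slowly.
        "gpu_over_cpu": gpu_sampling / cpu_sampling,
        # Time outside the sampling loop (assumed: compilation and setup).
        "cpu_overhead": cpu_total - cpu_sampling,
        "gpu_overhead": gpu_total - gpu_sampling,
    }
    print(
        "%-20s GPU/CPU sampling ratio %.2f, non-sampling time %.1f s (CPU) vs %.1f s (GPU)"
        % (
            name,
            summary[name]["gpu_over_cpu"],
            summary[name]["cpu_overhead"],
            summary[name]["gpu_overhead"],
        )
    )
```

In every run the non-sampling time comes out at roughly 11 to 14 seconds on both devices, so the difference between the CPU and GPU runs sits in the sampling loop itself.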
I wonder if @nouiz has any pointers on how to profile GPU execution.
Check this doc page on how to profile Theano function execution time:
http://deeplearning.net/software/theano/tutorial/profiling.html
When I try to use pymc3 with CUDA, I get the following message:
When CUDA is not used, there are no errors and theano tests show that CUDA is working fine. The same error appears on Kubuntu 15.10 and Windows 10.
Kubuntu info:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_21:41:27_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12
Theano: '0.8.0rc1'
PyMC3: 3.0