pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

NUTS Sampler Fails with pygpu.gpuarray.GpuArrayException Error #3087

Closed MisterRedactus closed 6 years ago

MisterRedactus commented 6 years ago

I am unable to run the NUTS sampler in PyMC3. The case I am running is a straightforward transcription of the basic Normal distribution case in the regression example at http://docs.pymc.io/notebooks/getting_started#Installation:

import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm

print('Running on PyMC3 v{}'.format(pm.__version__))

plt.style.use('seaborn-darkgrid')

# Initialize random number generator
np.random.seed(123)

# True parameter values
alpha, sigma = 1, 1
beta = [1, 2.5]

# Size of dataset
size = 100

# Predictor variable
X1 = np.random.randn(size)
X2 = np.random.randn(size) * 0.2

# Simulate outcome variable
Y = alpha + beta[0]*X1 + beta[1]*X2 + np.random.randn(size)*sigma

fig, axes = plt.subplots(1, 2, sharex=True, figsize=(10,4))
axes[0].scatter(X1, Y)
axes[1].scatter(X2, Y)
axes[0].set_ylabel('Y'); axes[0].set_xlabel('X1'); axes[1].set_xlabel('X2');
plt.show()

basic_model = pm.Model()

with basic_model:

    # Priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=2)
    sigma = pm.HalfNormal('sigma', sd=1)

    # Expected value of outcome
    mu = alpha + beta[0]*X1 + beta[1]*X2

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=Y)

map_estimate = pm.find_MAP(model=basic_model, method='powell')

print(map_estimate)

with basic_model:

    # draw 500 posterior samples
    trace = pm.sample(500)

This code appears to work properly up to the point where the pm.sample line is encountered. At that point I receive a pygpu.gpuarray.GpuArrayException invalid value error. The following is the complete response from the run including traceback:

WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Using cuDNN version 5105 on context None
Mapped name None to device cuda: GeForce GTX 950 (0000:03:00.0)
  0%|          | 0/5000 [00:00<?, ?it/s]D:\Programs\Anaconda3\Lib\site-packages\scipy\optimize\_minimize.py:502: RuntimeWarning: Method powell does not use gradient information (jac).
  RuntimeWarning)
logp = -148.98, ||grad|| = 0.73744: 100%|███████████████████████████████████████████| 183/183 [00:00<00:00, 185.38it/s]
{'alpha': array(0.9090931, dtype=float32), 'beta': array([0.9514547, 2.6145666], dtype=float32), 'sigma_log__': array(-0.03494539, dtype=float32), 'sigma': array(0.9656581, dtype=float32)}
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma_log__, beta, alpha]
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "D:\Programs\Anaconda3\Lib\site-packages\joblib\externals\loky\backend\queues.py", line 151, in _feed
    obj, reducers=reducers)
  File "D:\Programs\Anaconda3\Lib\site-packages\joblib\externals\loky\backend\reduction.py", line 145, in dumps
    p.dump(obj)
  File "D:\Programs\Anaconda3\Lib\site-packages\theano\gpuarray\type.py", line 909, in GpuArray_pickler
    return (GpuArray_unpickler, (np.asarray(cnda), ctx_name))
  File "D:\Programs\Anaconda3\Lib\site-packages\numpy\core\numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)
  File "pygpu\gpuarray.pyx", line 1735, in pygpu.gpuarray.GpuArray.__array__
  File "pygpu\gpuarray.pyx", line 1405, in pygpu.gpuarray._pygpu_as_ndarray
  File "pygpu\gpuarray.pyx", line 394, in pygpu.gpuarray.array_read
pygpu.gpuarray.GpuArrayException: b'cuMemcpyDtoHAsync(dst, src->ptr + srcoff, sz, ctx->mem_s): CUDA_ERROR_INVALID_VALUE: invalid argument'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".\Regression_Case.py", line 50, in <module>
    trace = pm.sample(500)
  File "D:\Programs\Anaconda3\Lib\site-packages\pymc3\sampling.py", line 442, in sample
    trace = _mp_sample(**sample_args)
  File "D:\Programs\Anaconda3\Lib\site-packages\pymc3\sampling.py", line 982, in _mp_sample
    traces = Parallel(n_jobs=cores, mmap_mode=None)(jobs)
  File "D:\Programs\Anaconda3\Lib\site-packages\joblib\parallel.py", line 962, in __call__
    self.retrieve()
  File "D:\Programs\Anaconda3\Lib\site-packages\joblib\parallel.py", line 865, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "D:\Programs\Anaconda3\Lib\site-packages\joblib\_parallel_backends.py", line 515, in wrap_future_result
    return future.result(timeout=timeout)
  File "D:\Programs\Anaconda3\Lib\site-packages\joblib\externals\loky\_base.py", line 431, in result
    return self.get_result()
  File "D:\Programs\Anaconda3\Lib\site-packages\joblib\externals\loky\_base.py", line 382, in get_result
    raise self._exception
  File "D:\Programs\Anaconda3\Lib\site-packages\joblib\externals\loky\backend\queues.py", line 151, in _feed
    obj, reducers=reducers)
  File "D:\Programs\Anaconda3\Lib\site-packages\joblib\externals\loky\backend\reduction.py", line 145, in dumps
    p.dump(obj)
  File "D:\Programs\Anaconda3\Lib\site-packages\theano\gpuarray\type.py", line 909, in GpuArray_pickler
    return (GpuArray_unpickler, (np.asarray(cnda), ctx_name))
  File "D:\Programs\Anaconda3\Lib\site-packages\numpy\core\numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)
  File "pygpu\gpuarray.pyx", line 1735, in pygpu.gpuarray.GpuArray.__array__
  File "pygpu\gpuarray.pyx", line 1405, in pygpu.gpuarray._pygpu_as_ndarray
  File "pygpu\gpuarray.pyx", line 394, in pygpu.gpuarray.array_read
pygpu.gpuarray.GpuArrayException: b'cuMemcpyDtoHAsync(dst, src->ptr + srcoff, sz, ctx->mem_s): CUDA_ERROR_INVALID_VALUE: invalid argument'

The following are my versions and main components

I was hoping to use PyMC3 in an upcoming project, so any assistance you might provide would be much appreciated.

junpenglao commented 6 years ago

We usually don't see a big advantage from using a GPU in our use cases, so my suggestion is to set Theano to CPU only and try again.

MisterRedactus commented 6 years ago

I tried that by changing the setting in my .theanorc.txt file from 'device = cuda' to 'device = cpu'. This resulted in a new error:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma_log__, beta, alpha]
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFB9A6994C4  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFBD06F717D  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFBD1432784  Unknown               Unknown  Unknown
ntdll.dll          00007FFBD3450C31  Unknown               Unknown  Unknown
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFB9A6994C4  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFBD06F717D  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFBD1432784  Unknown               Unknown  Unknown
ntdll.dll          00007FFBD3450C31  Unknown               Unknown  Unknown
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFB9A6994C4  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFBD06F717D  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFBD1432784  Unknown               Unknown  Unknown
ntdll.dll          00007FFBD3450C31  Unknown               Unknown  Unknown
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFB9A6994C4  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFBD06F717D  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFBD1432784  Unknown               Unknown  Unknown
ntdll.dll          00007FFBD3450C31  Unknown               Unknown  Unknown
ERROR: The process "18224" not found.
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFB9A6994C4  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFBD06F717D  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFBD1432784  Unknown               Unknown  Unknown
ntdll.dll          00007FFBD3450C31  Unknown               Unknown  Unknown
QObject::~QObject: Timers cannot be stopped from another thread
ERROR: The process "6244" not found.

Needless to say, I did not initiate any control-C event, so I am just as puzzled with this new error as with the last one, unless there is another way to set theano to use the CPU only.
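
One other route, sketched here under the assumption that nothing else in the script has imported Theano yet: Theano also reads its configuration from the THEANO_FLAGS environment variable, which takes precedence over .theanorc. The helper name force_theano_cpu is made up for this example and is not part of Theano or PyMC3.

```python
import os


def force_theano_cpu(extra_flags=""):
    """Point Theano at the CPU via THEANO_FLAGS (hypothetical helper).

    Overwrites any THEANO_FLAGS value already set in this process.
    """
    flags = ["device=cpu"]
    if extra_flags:
        flags.append(extra_flags)
    os.environ["THEANO_FLAGS"] = ",".join(flags)
    return os.environ["THEANO_FLAGS"]


# Must run before the first `import theano` / `import pymc3`,
# because Theano reads its device setting when it is imported.
print(force_theano_cpu())
```

This only changes the device setting for the current process; it would not by itself address the forrtl messages above.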

twiecki commented 6 years ago

Did you install mkl-service in Anaconda?

MisterRedactus commented 6 years ago

I don't believe so, unless it was part of the baseline anaconda installation. Unfortunately, I am traveling over the next week and don't have access to my desktop to test this out. Frankly, I have been having enough problems getting a stable PyMC3 installation that I am considering starting over by reinstalling anaconda, then reinstalling PyMC3 from conda while taking careful notes as I go along to document my steps.

However, please see my related Issue 3093 for problems encountered for PyMC3 installation on a different Windows 10 laptop.

MisterRedactus commented 6 years ago

Now that I am back at my home office, I have returned to the problem of installing PyMC3 in Windows 10 in conjunction with Python 3.7/Anaconda 5.2 on my dual-boot GPU-enabled desktop. I have tried two ways of doing this: one consistent with the procedure at http://datahans.blogspot.com/2016/04/installing-pymc3.html (but using Python 3.7, not 2.7 as recommended in the link), and the other by creating a dedicated conda environment in line with one of the suggestions at https://github.com/pymc-devs/pymc3/issues/2988. In the second case, the yml file was:

name: pymc3_env_3_7
dependencies:
  - python 
  - cloudpickle
  - ipykernel
  - mingw
  - libpython
  - m2w64-toolchain
  - mkl=2017
  - pygpu
  - theano
  - pymc3
  - parameterized
  - seaborn

In both cases, I got through the original sampling and MAP estimate portions of the regression case code. And in both cases, I then received this traceback:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, beta, alpha]
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\theano\gpuarray\__init__.py", line 227, in <module>
    use(config.device)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\theano\gpuarray\__init__.py", line 214, in use
    init_dev(device, preallocate=preallocate)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\theano\gpuarray\__init__.py", line 65, in init_dev
    raise RuntimeError("You can't initialize the GPU in a subprocess if the parent process already did it")
RuntimeError: You can't initialize the GPU in a subprocess if the parent process already did it
  0%|                                                                                         | 0/5000 [00:00<?, ?it/s]D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\scipy\optimize\_minimize.py:381: RuntimeWarning: Method powell does not use gradient information (jac).
  RuntimeWarning)
logp = -149.47, ||grad|| = 13.25: 100%|████████████████████████████████████████████| 177/177 [00:00<00:00, 3326.66it/s]
Map estimate =  {'alpha': array(0.9090678691864014, dtype=float32), 'beta': array([ 0.9514268 ,  2.61449409], dtype=float32), 'sigma_log__': array(-0.03490985184907913, dtype=float32), 'sigma': array(0.9656924605369568, dtype=float32)}
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, beta, alpha]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\spawn.py", line 114, in _main
    Traceback (most recent call last):
prepare(preparation_data)
  File "Regression_Case.py", line 51, in <module>
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\spawn.py", line 225, in prepare
    trace = pm.sample(1000)
_fixup_main_from_path(data['init_main_from_path'])
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\sampling.py", line 449, in sample
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    trace = _mp_sample(**sample_args)
run_name="__mp_main__")
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\sampling.py", line 996, in _mp_sample
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\runpy.py", line 263, in run_path
    chain, progressbar)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\parallel_sampling.py", line 275, in __init__
    pkg_name=pkg_name, script_name=fname)
for chain, seed, start in zip(range(chains), seeds, start_points)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\runpy.py", line 96, in _run_module_code
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\parallel_sampling.py", line 275, in <listcomp>
    mod_name, mod_spec, pkg_name, script_name)
for chain, seed, start in zip(range(chains), seeds, start_points)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\runpy.py", line 85, in _run_code
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\parallel_sampling.py", line 182, in __init__
    exec(code, run_globals)
self._process.start()
  File "D:\Projects\Leak Spill Analysis\Regression_Case.py", line 51, in <module>
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\process.py", line 105, in start
    trace = pm.sample(1000)
self._popen = self._Popen(self)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\sampling.py", line 449, in sample
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\context.py", line 223, in _Popen
    trace = _mp_sample(**sample_args)
return _default_context.get_context().Process._Popen(process_obj)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\sampling.py", line 996, in _mp_sample
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\context.py", line 322, in _Popen
    chain, progressbar)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\parallel_sampling.py", line 275, in __init__
        return Popen(process_obj)for chain, seed, start in zip(range(chains), seeds, start_points)

  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\parallel_sampling.py", line 275, in <listcomp>
    reduction.dump(process_obj, to_child)
for chain, seed, start in zip(range(chains), seeds, start_points)  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\reduction.py", line 60, in dump

    ForkingPickler(file, protocol).dump(obj)  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\parallel_sampling.py", line 182, in __init__

    self._process.start()BrokenPipeError
: [Errno 32] Broken pipe  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\process.py", line 105, in start

    self._popen = self._Popen(self)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

For what it is worth, I have run the sample code at http://deeplearning.net/software/theano/tutorial/using_gpu.html confirming that the GPU can be run successfully on my PC. So I am trying other ideas, but as before, any ideas would be appreciated.
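
The idiom that the RuntimeError message refers to can be sketched with the standard library alone. This is a generic illustration, not the regression script itself: the function names square and main are placeholders, and the pool stands in for the child processes that pm.sample starts when it runs several chains in parallel.

```python
import multiprocessing as mp


def square(x):
    # Work done in each child process; defined at module top level
    # so that child processes can find it by name.
    return x * x


def main():
    # Child processes are started only from inside main().
    with mp.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    # On Windows, each child re-imports this file. The guard keeps the
    # children from calling main() again and starting processes of their own.
    print(main())
```

In the regression script, the same layout would mean moving the model definition and the pm.sample call into a function that is only called under the guard.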

MisterRedactus commented 6 years ago

I have continued to try installing PyMC3 on my desktop, and have successfully achieved it by modifying my yml file as follows:

name: pymc3_env_2_7
dependencies:
  - python=2.7
  - cloudpickle
  - ipykernel
  - mingw
  - libpython
  - m2w64-toolchain
  - mkl=2017
  - pygpu
  - theano
  - pymc3
  - parameterized
  - seaborn

This installation using Python 2.7 appears to run successfully using both my CPU and GPU, although I do get an odd "Could not pickle model, sampling singlethreaded." message when I run on the GPU. So in a sense, this installation issue appears to be resolved, since I can now use the PyMC3 app for my work. Now that I've said that, it's worth pointing out that I have never, on any machine, been able to get PyMC3 to successfully install to a Windows 10 platform using the current version of Python 3.

fonnesbeck commented 6 years ago

Glad you got something working. It is worrying that you can't get Py3 to work, however. To clarify, are you talking about working with a GPU, or working at all? Does it work in a CPU environment?

MisterRedactus commented 6 years ago

The traceback I posted a couple of days ago was with device = cuda. Changing this to device = cpu in my .theanorc.txt file results in:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, beta, alpha]
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
  0%|                                                                                         | 0/5000 [00:00<?, ?it/s]D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\scipy\optimize\_minimize.py:381: RuntimeWarning: Method powell does not use gradient information (jac).
  RuntimeWarning)
logp = -149.47, ||grad|| = 13.25: 100%|████████████████████████████████████████████| 177/177 [00:00<00:00, 4748.09it/s]
Map estimate =  {'alpha': array(0.9090678691864014, dtype=float32), 'beta': array([ 0.9514268 ,  2.61449409], dtype=float32), 'sigma_log__': array(-0.03490985184907913, dtype=float32), 'sigma': array(0.9656924605369568, dtype=float32)}
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, beta, alpha]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
Traceback (most recent call last):
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\runpy.py", line 96, in _run_module_code
  File "Regression_Case.py", line 57, in <module>
        mod_name, mod_spec, pkg_name, script_name)trace = pm.sample(1000)

  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\runpy.py", line 85, in _run_code
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\sampling.py", line 449, in sample
        exec(code, run_globals)trace = _mp_sample(**sample_args)

  File "D:\Projects\Leak Spill Analysis\Regression_Case.py", line 57, in <module>
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\sampling.py", line 996, in _mp_sample
        trace = pm.sample(1000)chain, progressbar)

  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\sampling.py", line 449, in sample
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\parallel_sampling.py", line 275, in __init__
    trace = _mp_sample(**sample_args)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\sampling.py", line 996, in _mp_sample
    chain, progressbar)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\parallel_sampling.py", line 275, in __init__
        for chain, seed, start in zip(range(chains), seeds, start_points)for chain, seed, start in zip(range(chains), seeds, start_points)

  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\parallel_sampling.py", line 275, in <listcomp>
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\parallel_sampling.py", line 275, in <listcomp>
        for chain, seed, start in zip(range(chains), seeds, start_points)for chain, seed, start in zip(range(chains), seeds, start_points)

  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\parallel_sampling.py", line 182, in __init__
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\site-packages\pymc3\parallel_sampling.py", line 182, in __init__
        self._process.start()self._process.start()

  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\process.py", line 105, in start
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\process.py", line 105, in start
        self._popen = self._Popen(self)self._popen = self._Popen(self)

  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\context.py", line 223, in _Popen
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
return _default_context.get_context().Process._Popen(process_obj)  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\context.py", line 322, in _Popen

      File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
      File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
return Popen(process_obj)
  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
      File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
reduction.dump(process_obj, to_child)
_check_not_importing_main()  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\reduction.py", line 60, in dump

  File "D:\Anaconda3\envs\pymc3_env_3_7\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

twiecki commented 6 years ago

I think I know why. Py3 properly supports parallel sampling, which would only work if you had 4 GPUs. So try setting jobs=1.

MisterRedactus commented 6 years ago

Awesome! That did it. The sample case runs in the Python 3 PyMC3 environment with both device = cuda and device = cpu in .theanorc.txt. I note that the CPU run takes about a quarter of the time that the GPU run takes, so it would be hard to justify using my GPU for this.

My only other question: Why did the Python 2 installation work properly? Beyond that, it would be useful to include more bulletproof instructions for installing PyMC3 under Windows.

Other than that, I think my PyMC3 installation looks good. Thanks so much for the assist.

twiecki commented 6 years ago

Glad it's working. Yes, GPU only speeds up very few models. There's probably more optimization that could be done, however, but it's not a priority currently.

The support for parallel sampling is a bit broken in Python 2, so I assume that you just didn't get parallelization there, and thus Theano didn't try to run 4 GPU jobs in parallel.

Happy sampling!

JIXING123 commented 6 years ago

Hi, I ran into the same problem. This is the first time I have run PyMC3, and I still do not understand how to solve it. Could you describe the fix in more detail? Thanks.

twiecki commented 6 years ago

@JIXING123 Did you try sampling with jobs=1? I.e. pm.sample(jobs=1).

JIXING123 commented 6 years ago

Hi, thank you for your reply. I have not set jobs=1 because I do not know where to put it. For example, should I write trace=pm.sample(2000, jobs=1), or put pm.sample(jobs=1) on a separate line?

Attached is the code from http://people.duke.edu/~ccc14/sta-663-2016/16C_PyMC3.html, which I am using just to learn PyMC3. I also tried the example that MisterRedactus posted. Both give the same error.

import pymc3 as pm
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

n=100
heads=61
a,b=10,10
prior=stats.beta(a,b)
post=stats.beta(heads+a, n-heads+b)
ci=post.interval(0.95)

xs= np.linspace(0,1,100)
plt.plot(prior.pdf(xs),label='Prior')
plt.plot(post.pdf(xs),label='Posterior')
plt.axvline(100*heads/n, c='red', alpha=0.4, label='MLE')
plt.xlim([0, 100])
plt.axhline(0.3,ci[0],ci[1], c='black', linewidth=2, label='95% CI')
plt.legend()
pass

# introduction PyMC3

niter=2000
with pm.Model() as coin_context:
    p=pm.Beta('p',alpha=2, beta=2)
    y=pm.Binomial('y', n=n, p=p, observed=heads)
    trace=pm.sample(niter,jobs=1)

MisterRedactus commented 6 years ago

Try the attached procedure, environment config and test PyMC3 Python files, which worked well on my Windows PC:

PyMC3 Windows Installation Instructions.docx

Python3_Regression_Case.txt

pymc3_env_3_7.txt

Make sure you change pymc3_env_3_7.txt to pymc3_env_3_7.yml and Python3_Regression_Case.txt to Python3_Regression_Case.py before you start. Good luck.

junpenglao commented 6 years ago

Wow thank you so much for writing down your experience!

JWarmenhoven commented 6 years ago

@JIXING123 For me (PyMC 3.5, single GPU) your code works with both CPU and GPU. It is just that with a single GPU you have to make sure Theano does not try to initiate parallel sampling as indicated by @twiecki above.

In case you are using a single GPU: did you try setting cores=1 when sampling? Your example code: trace=pm.sample(niter, cores=1).

It looks like the commit below replaced the keyword njobs with cores in pm.sample? https://github.com/pymc-devs/pymc3/commit/f74bf07be3172af92bad5ce30ce80b09710d6704#diff-7eb6c4a83cfe45b9fc0eac76b57e2175
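
Since the keyword name depends on the installed version, one way to avoid guessing is to look at the signature of the sampling function. The following is only a sketch: single_process_kwargs is a made-up helper, and sample_new and sample_old are stand-ins for pm.sample so that the example runs without PyMC3 installed.

```python
import inspect


def single_process_kwargs(sample_fn):
    """Return the keyword that limits sample_fn to one process (hypothetical helper)."""
    params = inspect.signature(sample_fn).parameters
    for name in ("cores", "njobs"):  # newer name first, then the older one
        if name in params:
            return {name: 1}
    raise TypeError("sample function accepts neither 'cores' nor 'njobs'")


# Stand-ins for pm.sample; their signatures are assumptions for illustration.
def sample_new(draws=500, cores=None):
    return draws, cores


def sample_old(draws=500, njobs=1):
    return draws, njobs


print(single_process_kwargs(sample_new))  # {'cores': 1}
print(single_process_kwargs(sample_old))  # {'njobs': 1}
```

With PyMC3 installed, the call would then look like trace = pm.sample(niter, **single_process_kwargs(pm.sample)).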

benmbrennan commented 6 years ago

Hey, I'm also having trouble with this, and none of the suggested solutions work. I've already changed the device to 'cpu' as well.

model=pymc3.Model()

with model:

    alpha = pymc3.MvNormal('alpha', mu=np.r_[np.ones(5),.9*np.eye(5).flatten('F')], cov=np.eye(30), shape=(30,))
    mu=theano.tensor.dot(theano.tensor.slinalg.kron(np.eye(5),X),alpha.T)
    Y_obs = pymc3.MvNormal('Y_obs', mu=mu.T, cov=np.eye(1260), observed=Y.flatten('F').T)
    map_estimate=pymc3.find_MAP(model=model)
    trace = pymc3.sample(500, jobs=1)

I had been trying to draw the covariance matrix for the parameters from a distribution as well, but was having a lot of trouble getting that to work and wanted to just make sure I could get something simpler to work first.

X and Y are time series data matrices. X includes the lags of Y (and ones).

junpenglao commented 6 years ago

Pretty sure you will run out of memory with cov=np.eye(1260). If you are using an identity matrix as cov, then it is just a set of independent univariate Gaussians, so you should try replacing the MvNormal with a Normal.
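
The equivalence can be checked numerically with a small pure-Python sketch. This is not PyMC3 code; the helper names cholesky, mvnormal_logp and normal_logp are made up for the illustration, and the three-dimensional values are arbitrary.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def mvnormal_logp(x, mu, cov):
    """Log-density of a multivariate normal with a general covariance matrix."""
    n = len(x)
    L = cholesky(cov)
    z = []  # solves L z = (x - mu) by forward substitution
    for i in range(n):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((x[i] - mu[i] - s) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi) + log_det + sum(v * v for v in z))


def normal_logp(x, mu, sd):
    """Log-density of a univariate normal."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mu) / sd) ** 2


x = [0.3, -1.2, 2.0]
mu = [0.0, 0.5, 1.0]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

joint = mvnormal_logp(x, mu, identity)
independent = sum(normal_logp(xi, mi, 1.0) for xi, mi in zip(x, mu))
print(abs(joint - independent) < 1e-12)  # True
```

The multivariate version pays for a matrix factorization that the identity covariance never needs, which is the reason for suggesting the plain Normal.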

benmbrennan commented 6 years ago

I'm using identity matrices because I was being bombarded with errors when I was drawing from another distribution. Also is the cov for the observations not meant to be the covariance matrix for the error terms?

junpenglao commented 6 years ago

If you also get the error on CPU, this is likely a different issue, and you should open a new issue or a discussion on https://discourse.pymc.io. Did you search our discourse? I remember that using sparse matrices is not trivial, and there are a few discussions about it there.

cpoptic commented 6 years ago

I was able to solve this issue in my environment (a single GPU laptop) by adding the parameter "cores=1" to my trace call.

So for example: trace = pm.sample(2000, step=step) would be modified to trace = pm.sample(2000, step=step, cores=1)