pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

Multiprocessing fails when sampling multiple chains using multiple cores #3140

Closed JackCaster closed 2 years ago

JackCaster commented 6 years ago

Since PR https://github.com/pymc-devs/pymc3/pull/3011 I have been having trouble sampling multiple chains with multiple cores. In Jupyter notebooks I get random kernel shutdowns, so I haven't managed to pinpoint the problem (it seems that the more complicated the model is, the higher the crash rate). However, I found a systematic issue when using the Python interpreter only (not the Jupyter kernel): if I sample more than one chain using more than one core (say, 2 chains and 2 cores), Python crashes. Sampling multiple chains with 1 core, or 1 chain with multiple cores, is fine. With this minimal example, I do not encounter any problems in a Jupyter notebook.

The minimal example is attached (please run it as a script, not in a Jupyter kernel):

import numpy as np
import pandas as pd

import theano
import pymc3 as pm

print('*** Start script ***')
print(f'{pm.__name__}: v. {pm.__version__}')
print(f'{theano.__name__}: v. {theano.__version__}')

SEED = 20180730
np.random.seed(SEED)

# Generate data
mu_real = 0
sd_real = 1
n_samples = 1000
y = np.random.normal(loc=mu_real, scale=sd_real, size=n_samples)

# Bayesian modelling
with pm.Model() as model:

    mu = pm.Normal('mu', mu=0, sd=10)
    sd = pm.HalfNormal('sd', sd=10)

    # Likelihood
    likelihood = pm.Normal('likelihood', mu=mu, sd=sd, observed=y)    
    trace = pm.sample(chains=2, cores=2, random_seed=SEED)

print('Done!')

Running with chains=2 and cores=2 throws the error:

Two interleaved tracebacks are printed. The spawned child process re-imports the script as __mp_main__, reaches pm.sample again, and fails:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Miniconda3\envs\bayes\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Miniconda3\envs\bayes\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Miniconda3\envs\bayes\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\moran\Desktop\test_multicore_multichain.py", line 28, in <module>
    trace = pm.sample(chains=2, cores=2, random_seed=SEED)
  File "d:\dev\pymc3\pymc3\sampling.py", line 451, in sample
    trace = _mp_sample(**sample_args)
  File "d:\dev\pymc3\pymc3\sampling.py", line 998, in _mp_sample
    chain, progressbar)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in __init__
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in <listcomp>
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 182, in __init__
    self._process.start()
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

The main process then fails writing to the dead child:

Traceback (most recent call last):
  File "test_multicore_multichain.py", line 28, in <module>
    trace = pm.sample(chains=2, cores=2, random_seed=SEED)
  File "d:\dev\pymc3\pymc3\sampling.py", line 451, in sample
    trace = _mp_sample(**sample_args)
  File "d:\dev\pymc3\pymc3\sampling.py", line 998, in _mp_sample
    chain, progressbar)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in __init__
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in <listcomp>
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 182, in __init__
    self._process.start()
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

The interesting thing is that the print statements in the script are duplicated (which does not happen with chains=2 and cores=1, or chains=1 and cores=2):

*** Start script ***
pymc3: v. 3.5
theano: v. 1.0.2
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, mu]
*** Start script ***
pymc3: v. 3.5
theano: v. 1.0.2
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, mu]

I am on master for both PyMC3 and Theano.

junpenglao commented 6 years ago

Possibly Windows related... @aseyboldt

aseyboldt commented 6 years ago

Yes, this looks like an issue with multiprocessing on Windows.

Can you try this:

import numpy as np
import pandas as pd

import theano
import pymc3 as pm

print('*** Start script ***')
print(f'{pm.__name__}: v. {pm.__version__}')
print(f'{theano.__name__}: v. {theano.__version__}')

if __name__ == '__main__':
    SEED = 20180730
    np.random.seed(SEED)

    # Generate data
    mu_real = 0
    sd_real = 1
    n_samples = 1000
    y = np.random.normal(loc=mu_real, scale=sd_real, size=n_samples)

    # Bayesian modelling
    with pm.Model() as model:

        mu = pm.Normal('mu', mu=0, sd=10)
        sd = pm.HalfNormal('sd', sd=10)

        # Likelihood
        likelihood = pm.Normal('likelihood', mu=mu, sd=sd, observed=y)    
        trace = pm.sample(chains=2, cores=2, random_seed=SEED)

    print('Done!')
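The guard is needed because Windows has no fork: multiprocessing spawns each worker as a fresh interpreter that re-imports the main script, re-running every module-level statement. A minimal sketch, independent of PyMC3, that shows the effect:

import multiprocessing as mp

print('module level: runs in the parent AND once per spawned worker')

def worker():
    pass

if __name__ == '__main__':
    mp.set_start_method('spawn', force=True)  # force the Windows behaviour on any OS
    print('guarded: runs only in the parent')
    procs = [mp.Process(target=worker) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()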

But I don't really understand why it has trouble in the notebook. Can you post the versions of pyzmq, jupyter and ipython?

JackCaster commented 6 years ago

If I use the if statement, the sampling works. Still, the print statements are executed multiple times:

*** Start script ***
pymc3: v. 3.5
theano: v. 1.0.2
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, mu]
*** Start script ***
pymc3: v. 3.5
theano: v. 1.0.2
*** Start script ***
pymc3: v. 3.5
theano: v. 1.0.2
Sampling 2 chains: 100%|███████████████████████████████████████████████████████████████████| 2000/2000 [00:02<00:00, 724.48draws/s] Done!
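The leftover duplicates come from the imports and prints that still sit above the guard, since those re-run in every spawned worker. A sketch that should silence them, assuming nothing else needs to run at import time:

import numpy as np
import theano
import pymc3 as pm

if __name__ == '__main__':
    # runs only in the parent; spawned workers skip everything in this block
    print('*** Start script ***')
    print(f'{pm.__name__}: v. {pm.__version__}')
    print(f'{theano.__name__}: v. {theano.__version__}')
    # ... rest of the script unchanged ...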

Comment on the Jupyter notebook:

This particular script runs fine in a Jupyter notebook (it crashed only once after several attempts). In general, however, sampling with multiple cores has become very unreliable. I have some more complicated models that won't run with multiple cores at all (in a freshly installed environment). For example, one notebook I am working on now (a softmax regression) crashes continuously when using multiple cores:


Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [beta_]
Sampling 2 chains:   0%|                                                                               | 0/8000 [00:00<?, ?draws/s]

forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFC9C7F94C4  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFCD18B56FD  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFCD38E3034  Unknown               Unknown  Unknown
ntdll.dll          00007FFCD4A11431  Unknown               Unknown  Unknown
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFC9C7F94C4  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFCD18B56FD  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFCD38E3034  Unknown               Unknown  Unknown
ntdll.dll          00007FFCD4A11431  Unknown               Unknown  Unknown
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFC9C7F94C4  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFCD18B56FD  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFCD38E3034  Unknown               Unknown  Unknown
ntdll.dll          00007FFCD4A11431  Unknown               Unknown  Unknown
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFC9C7F94C4  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFCD18B56FD  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFCD38E3034  Unknown               Unknown  Unknown
ntdll.dll          00007FFCD4A11431  Unknown               Unknown  Unknown
[I 14:26:40.033 NotebookApp] Interrupted...
[I 14:26:40.033 NotebookApp] Shutting down 2 kernels
[I 14:26:40.135 NotebookApp] Kernel shutdown: eaa60eb4-6bae-4c91-82bf-6bd5648ddf35
[I 14:26:40.135 NotebookApp] Kernel shutdown: e41f13f3-e731-4812-8130-97a7a6220fd7

If I run the softmax regression as a Python script (without the if __name__ == '__main__': guard) I get this error:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [beta_]
3.5
1.0.2
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [beta_]
Again, two interleaved tracebacks. The spawned child process:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Miniconda3\envs\bayes\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Miniconda3\envs\bayes\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Miniconda3\envs\bayes\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\dev\GLM_with_PyMC3\notebooks\test_softmax_multicore.py", line 38, in <module>
    trace = pm.sample(draws=3000, tune=1000, chains=2, cores=4, random_seed=SEED)
  File "d:\dev\pymc3\pymc3\sampling.py", line 451, in sample
    trace = _mp_sample(**sample_args)
  File "d:\dev\pymc3\pymc3\sampling.py", line 998, in _mp_sample
    chain, progressbar)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in __init__
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in <listcomp>
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 182, in __init__
    self._process.start()
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

The main process:

Traceback (most recent call last):
  File "test_softmax_multicore.py", line 38, in <module>
    trace = pm.sample(draws=3000, tune=1000, chains=2, cores=4, random_seed=SEED)
  File "d:\dev\pymc3\pymc3\sampling.py", line 451, in sample
    trace = _mp_sample(**sample_args)
  File "d:\dev\pymc3\pymc3\sampling.py", line 998, in _mp_sample
    chain, progressbar)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in __init__
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in <listcomp>
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 182, in __init__
    self._process.start()
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

If I wrap the script in if __name__ == '__main__': I get this error:

sampling 2 chains:   0%|                                                                               | 0/8000 [00:00<?, ?draws/s]
You can find the C code in this temporary file: C:\Users\moran\AppData\Local\Temp\theano_compilation_error__a0g2s_m
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1082, in _constructor_Function
    f = maker.create(input_storage, trustme=True)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1715, in create
    input_storage=input_storage_lists, storage_map=storage_map)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\link.py", line 699, in make_thunk
    storage_map=storage_map)[:3]
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\vm.py", line 1091, in make_all
    impl=impl))
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 955, in make_thunk
    no_recycling)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 858, in make_c_thunk
    output_storage=node_output_storage)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1217, in make_thunk
    keep_lock=keep_lock)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1157, in __compile__
    keep_lock=keep_lock)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1620, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cmodule.py", line 1181, in module_from_key
    module = lnk.compile_cmodule(location)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1523, in compile_cmodule
    preargs=preargs)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cmodule.py", line 2388, in compile_str
    (status, compile_stderr.replace('\n', '. ')))
Exception: ('The following error happened while compiling the node', Softmax(Dot22.0), '\n', 'Compilation failed (return status=3): ', '[Softmax(<TensorType(float64, matrix)>)]')
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFC98B294C4  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFCD18B56FD  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFCD38E3034  Unknown               Unknown  Unknown
ntdll.dll          00007FFCD4A11431  Unknown               Unknown  Unknown
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFC98B294C4  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFCD18B56FD  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFCD38E3034  Unknown               Unknown  Unknown
ntdll.dll          00007FFCD4A11431  Unknown               Unknown  Unknown

aseyboldt commented 6 years ago

So it seems that there are two issues here:

I'm trying to reproduce this locally; can you send me an example that fails with the second error? What is the output of np.__config__.show()?

I have some vague ideas about where this might be coming from. If my hunch is right, setting one of OMP_NUM_THREADS=1, MKL_THREADING_LAYER=sequential, or MKL_THREADING_LAYER=GNU might help. To do that, execute

import os
# pick ONE of the following:
os.environ['MKL_THREADING_LAYER'] = 'sequential'
# os.environ['OMP_NUM_THREADS'] = '1'
# os.environ['MKL_THREADING_LAYER'] = 'GNU'

before you import anything else.
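For example, at the very top of the failing script; the ordering is the point, since the variable has to be set before the libraries that read it are loaded:

import os
os.environ['MKL_THREADING_LAYER'] = 'sequential'  # or one of the other two options

import theano  # the variable must already be set when this import runs
import pymc3 as pm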

And thank you for reporting this :-)

JackCaster commented 6 years ago

The np.__config__.show() outputs:

mkl_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Miniconda3/envs/bayes\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Miniconda3/envs/bayes\\Library\\include']
blas_mkl_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Miniconda3/envs/bayes\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Miniconda3/envs/bayes\\Library\\include']
blas_opt_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Miniconda3/envs/bayes\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Miniconda3/envs/bayes\\Library\\include']
lapack_mkl_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Miniconda3/envs/bayes\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Miniconda3/envs/bayes\\Library\\include']
lapack_opt_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Miniconda3/envs/bayes\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Miniconda3/envs/bayes\\Library\\include']

I tried setting the environment variables, but unfortunately they do not solve the issue. I have attached the Jupyter notebook (+ data) that keeps crashing on my side. It is based on the softmax regression from the DBDA2 book. test_softmax_multicore.zip

Thank you for looking into this.

aseyboldt commented 6 years ago

It works for me, but I have a different BLAS installed. How did you install Python/numpy/PyMC3?

aseyboldt commented 6 years ago

Can you maybe also post the output of pip freeze and conda list?

JackCaster commented 6 years ago

I installed numpy (and scipy, and all the PyMC3 dependencies) via conda (because it links the packages against the MKL library). Then I installed Theano and PyMC3 via pip from git.

Conda list

conda list
# packages in environment at C:\Miniconda3\envs\bayes:
#
# Name                    Version                   Build  Channel
backcall                  0.1.0                    py36_0
blas                      1.0                         mkl
bleach                    2.1.3                    py36_0
ca-certificates           2018.03.07                    0
certifi                   2018.4.16                py36_0
colorama                  0.3.9            py36h029ae33_0
cycler                    0.10.0           py36h009560c_0
cython                    0.28.3           py36hfa6e2cd_0
decorator                 4.3.0                    py36_0
entrypoints               0.2.3            py36hfd66bb0_2
freetype                  2.8                  h51f8f2c_1
h5py                      2.8.0                     <pip>
html5lib                  1.0.1            py36h047fa9f_0
icc_rt                    2017.0.4             h97af966_0
icu                       58.2                 ha66f8fd_1
intel-openmp              2018.0.3                      0
ipykernel                 4.8.2                    py36_0
ipython                   6.4.0                    py36_0
ipython_genutils          0.2.0            py36h3c5d0ee_0
ipywidgets                7.2.1                    py36_0
jedi                      0.12.0                   py36_1
jinja2                    2.10             py36h292fed1_0
joblib                    0.12.0                    <pip>
jpeg                      9b                   hb83a4c4_2
jsonschema                2.6.0            py36h7636477_0
jupyter                   1.0.0                    py36_4
jupyter_client            5.2.3                    py36_0
jupyter_console           5.2.0            py36h6d89b47_1
jupyter_core              4.4.0            py36h56e9d50_0
kiwisolver                1.0.1            py36h12c3424_0
libpng                    1.6.34               h79bbb47_0
libpython                 2.1                      py36_0
libsodium                 1.0.16               h9d3ae62_0
m2w64-binutils            2.25.1                        5    msys2
m2w64-bzip2               1.0.6                         6    msys2
m2w64-crt-git             5.0.0.4636.2595836               2    msys2
m2w64-gcc                 5.3.0                         6    msys2
m2w64-gcc-ada             5.3.0                         6    msys2
m2w64-gcc-fortran         5.3.0                         6    msys2
m2w64-gcc-libgfortran     5.3.0                         6    msys2
m2w64-gcc-libs            5.3.0                         7    msys2
m2w64-gcc-libs-core       5.3.0                         7    msys2
m2w64-gcc-objc            5.3.0                         6    msys2
m2w64-gmp                 6.1.0                         2    msys2
m2w64-headers-git         5.0.0.4636.c0ad18a               2    msys2
m2w64-isl                 0.16.1                        2    msys2
m2w64-libiconv            1.14                          6    msys2
m2w64-libmangle-git       5.0.0.4509.2e5a9a2               2    msys2
m2w64-libwinpthread-git   5.0.0.4634.697f757               2    msys2
m2w64-make                4.1.2351.a80a8b8               2    msys2
m2w64-mpc                 1.0.3                         3    msys2
m2w64-mpfr                3.1.4                         4    msys2
m2w64-pkg-config          0.29.1                        2    msys2
m2w64-toolchain           5.3.0                         7    msys2
m2w64-tools-git           5.0.0.4592.90b8472               2    msys2
m2w64-windows-default-manifest 6.4                           3    msys2
m2w64-winpthreads-git     5.0.0.4634.697f757               2    msys2
m2w64-zlib                1.2.8                        10    msys2
markupsafe                1.0              py36h0e26971_1
matplotlib                2.2.2            py36h153e9ff_1
mistune                   0.8.3            py36hfa6e2cd_1
mkl                       2018.0.3                      1
mkl-service               1.1.2            py36h57e144c_4
mkl_fft                   1.0.1            py36h452e1ab_0
mkl_random                1.0.1            py36h9258bd6_0
msys2-conda-epoch         20160418                      1    msys2
nbconvert                 5.3.1            py36h8dc0fde_0
nbformat                  4.4.0            py36h3a5bc1b_0
notebook                  5.5.0                    py36_0
numpy                     1.14.3           py36h9fa60d3_2
numpy-base                1.14.3           py36h5c71026_2
openssl                   1.0.2o               h8ea7d77_0
pandas                    0.23.1           py36h830ac7b_0
pandoc                    2.2.1                h1a437c5_0
pandocfilters             1.4.2            py36h3ef6317_1
parso                     0.2.1                    py36_0
patsy                     0.5.0                    py36_0
pickleshare               0.7.4            py36h9de030f_0
pip                       10.0.1                   py36_0
prompt_toolkit            1.0.15           py36h60b8f86_0
pygments                  2.2.0            py36hb010967_0
pymc3                     3.4.1                     <pip>
pyparsing                 2.2.0            py36h785a196_1
pyqt                      5.9.2            py36h1aa27d4_0
python                    3.6.6                hea74fb7_0
python-dateutil           2.7.3                    py36_0
pytz                      2018.4                   py36_0
pywinpty                  0.5.4                    py36_0
pyzmq                     17.0.0           py36hfa6e2cd_1
qt                        5.9.6            vc14h62aca36_0  [vc14]
qtconsole                 4.3.1            py36h99a29a9_0
scipy                     1.1.0            py36h672f292_0
seaborn                   0.8.1            py36h9b69545_0
send2trash                1.5.0                    py36_0
setuptools                39.2.0                   py36_0
simplegeneric             0.8.1                    py36_2
sip                       4.19.8           py36h6538335_0
six                       1.11.0           py36h4db2310_1
sqlite                    3.24.0               h7602738_0
statsmodels               0.9.0            py36h452e1ab_0
terminado                 0.8.1                    py36_1
testpath                  0.3.1            py36h2698cfe_0
Theano                    1.0.2+26.gd0420e3d9           <pip>
tornado                   5.0.2                    py36_0
tqdm                      4.23.4                    <pip>
traitlets                 4.3.2            py36h096827d_0
vc                        14                   h0510ff6_3
vs2015_runtime            14.0.25123                    3
wcwidth                   0.1.7            py36h3d5aa90_0
webencodings              0.5.1            py36h67c50ae_1
wheel                     0.31.1                   py36_0
widgetsnbextension        3.2.1                    py36_0
wincertstore              0.2              py36h7fe50ca_0
winpty                    0.4.3                         4
zeromq                    4.2.5                hc6251cf_0
zlib                      1.2.11               h8395fce_2

pip freeze

backcall==0.1.0
bleach==2.1.3
certifi==2018.4.16
colorama==0.3.9
cycler==0.10.0
Cython==0.28.3
decorator==4.3.0
entrypoints==0.2.3
h5py==2.8.0
html5lib==1.0.1
ipykernel==4.8.2
ipython==6.4.0
ipython-genutils==0.2.0
ipywidgets==7.2.1
jedi==0.12.0
Jinja2==2.10
joblib==0.12.0
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.3
jupyter-console==5.2.0
jupyter-core==4.4.0
kiwisolver==1.0.1
MarkupSafe==1.0
matplotlib==2.2.2
mistune==0.8.3
mkl-fft==1.0.0
mkl-random==1.0.1
nbconvert==5.3.1
nbformat==4.4.0
notebook==5.5.0
numpy==1.14.3
pandas==0.23.1
pandocfilters==1.4.2
parso==0.2.1
patsy==0.5.0
pickleshare==0.7.4
prompt-toolkit==1.0.15
Pygments==2.2.0
-e git+https://github.com/JackCaster/pymc3.git@98545be7ddad700b5fb02be2893d2fedae22c110#egg=pymc3
pyparsing==2.2.0
python-dateutil==2.7.3
pytz==2018.4
pywinpty==0.5.4
pyzmq==17.0.0
qtconsole==4.3.1
scipy==1.1.0
seaborn==0.8.1
Send2Trash==1.5.0
simplegeneric==0.8.1
six==1.11.0
statsmodels==0.9.0
terminado==0.8.1
testpath==0.3.1
Theano==1.0.2+26.gd0420e3d9
tornado==5.0.2
tqdm==4.23.4
traitlets==4.3.2
wcwidth==0.1.7
webencodings==0.5.1
widgetsnbextension==3.2.1
wincertstore==0.2

JackCaster commented 6 years ago

I did some digging. I found out that the error forrtl: error (200): program aborting due to control-C event that makes the kernel crash is not unusual (see here). In the comments, they suggest setting the environment variable FOR_DISABLE_CONSOLE_CTRL_HANDLER to "1" or "T". I did so, and when the notebook crashes (because it still does ;( ), the traceback is:

Two worker tracebacks are printed, both ending in KeyboardInterrupt. One worker dies while waiting for the Theano compile lock:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1082, in _constructor_Function
    f = maker.create(input_storage, trustme=True)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1715, in create
    input_storage=input_storage_lists, storage_map=storage_map)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\link.py", line 699, in make_thunk
    storage_map=storage_map)[:3]
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\vm.py", line 1091, in make_all
    impl=impl))
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 955, in make_thunk
    no_recycling)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 858, in make_c_thunk
    output_storage=node_output_storage)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1217, in make_thunk
    keep_lock=keep_lock)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1157, in __compile__
    keep_lock=keep_lock)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1620, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cmodule.py", line 1151, in module_from_key
    with compilelock.lock_ctx(keep_lock=keep_lock):
  File "C:\Miniconda3\envs\bayes\lib\contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\compilelock.py", line 40, in lock_ctx
    get_lock(lock_dir=lock_dir, **kw)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\compilelock.py", line 86, in _get_lock
    lock(get_lock.lock_dir, **kw)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\compilelock.py", line 273, in lock
    time.sleep(random.uniform(min_wait, max_wait))
KeyboardInterrupt

The other dies while a Theano C compilation is in progress:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1082, in _constructor_Function
    f = maker.create(input_storage, trustme=True)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1715, in create
    input_storage=input_storage_lists, storage_map=storage_map)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\link.py", line 699, in make_thunk
    storage_map=storage_map)[:3]
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\vm.py", line 1091, in make_all
    impl=impl))
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 955, in make_thunk
    no_recycling)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 858, in make_c_thunk
    output_storage=node_output_storage)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1217, in make_thunk
    keep_lock=keep_lock)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1157, in __compile__
    keep_lock=keep_lock)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1620, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cmodule.py", line 1181, in module_from_key
    module = lnk.compile_cmodule(location)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1523, in compile_cmodule
    preargs=preargs)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cmodule.py", line 2343, in compile_str
    p_out = output_subprocess_Popen(cmd)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\misc\windows.py", line 80, in output_subprocess_Popen
    out = p.communicate()
  File "C:\Miniconda3\envs\bayes\lib\subprocess.py", line 843, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "C:\Miniconda3\envs\bayes\lib\subprocess.py", line 1092, in _communicate
    self.stdout_thread.join(self._remaining_time(endtime))
  File "C:\Miniconda3\envs\bayes\lib\threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "C:\Miniconda3\envs\bayes\lib\threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt
[I 11:43:03.371 NotebookApp] Interrupted...
[I 11:43:03.371 NotebookApp] Shutting down 1 kernel
[I 11:43:08.431 NotebookApp] Kernel shutdown: cb25a99e-15f2-4f7f-b3c0-9706ab711a70

I hope this helps to shed light on the issue.
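For reference, a sketch of how the variable can be set from Python; it has to be in place before the DLLs that install the Ctrl-C handler are loaded, so it goes above every other import:

import os
# must run before importing numpy/theano/pymc3, since those can load the Intel runtime DLLs
os.environ['FOR_DISABLE_CONSOLE_CTRL_HANDLER'] = '1'

import numpy as np
import theano
import pymc3 as pm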

elfwired commented 6 years ago

I have a similar error (Windows 2012 + PyMC3 3.5 (master) + Theano 1.0.3 (master)). Here are ways that can "work around" this situation for me:

Jeff-Winchell commented 5 years ago

I also have a short program that blows up (with a "broken pipe" message) as soon as I set chains > 1. I have a multicore machine (but then, who doesn't?). The code:

from theano import shared
from numpy import ones, array
from pymc3 import Model, Normal, Deterministic, Binomial, Metropolis, sample
from pymc3.math import invlogit

log_dosage = shared(array([-.86, -.3, -.05, .73]))
sample_size = shared(5 * ones(4, dtype=int))
deaths = array([0, 1, 3, 5])

with Model() as bioassay_model:
    alpha = Normal('alpha', 0, sd=100)
    beta = Normal('beta', 0, sd=100)
    theta = Deterministic("theta", invlogit(alpha + beta * log_dosage))
    observed_deaths = Binomial('observed_deaths', n=sample_size, p=theta, observed=deaths)
    trace = sample(draws=10000, start={"alpha":0.5}, step=Metropolis(), chains=2)

I have a GeForce GTX 1050 GPU running CUDA 8.0, cuDNN 7.1.3, Theano 1.0.3, PyMC3 3.5, Python 3.6.6. My .theanorc:

[global]
device = cuda
force_device=True
optimizer = fast_run
optimizer_including=cudnn
mode=FAST_RUN

[nvcc]
fastmath = True
allow_gc=True

[lib]
cnmem = 0.8

[gpuarray]
preallocate=0.7

[scan]
allow_gc=True
allow_output_prealloc=True

The error message:

BrokenPipeError                           Traceback (most recent call last)
<ipython-input-1-fb96fbe5f1ac> in <module>
     13     theta = Deterministic("theta", invlogit(alpha + beta * log_dosage))
     14     observed_deaths = Binomial('observed_deaths', n=sample_size, p=theta, observed=deaths)
---> 15     trace = sample(draws=10000, start={"alpha":0.5}, step=Metropolis(), chains=2)

~\AppData\Local\conda\conda\envs\pymc3\lib\site-packages\pymc3\sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, nuts_kwargs, step_kwargs, progressbar, model, random_seed, live_plot, discard_tuned_samples, live_plot_kwargs, compute_convergence_checks, use_mmap, **kwargs)
    447             _print_step_hierarchy(step)
    448             try:
--> 449                 trace = _mp_sample(**sample_args)
    450             except pickle.PickleError:
    451                 _log.warning("Could not pickle model, sampling singlethreaded.")

~\AppData\Local\conda\conda\envs\pymc3\lib\site-packages\pymc3\sampling.py in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, use_mmap, **kwargs)
    994         sampler = ps.ParallelSampler(
    995             draws, tune, chains, cores, random_seed, start, step,
--> 996             chain, progressbar)
    997         try:
    998             with sampler:

~\AppData\Local\conda\conda\envs\pymc3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, chains, cores, seeds, start_points, step_method, start_chain_num, progressbar)
    273             ProcessAdapter(draws, tune, step_method,
    274                            chain + start_chain_num, seed, start)
--> 275             for chain, seed, start in zip(range(chains), seeds, start_points)
    276         ]
    277 

~\AppData\Local\conda\conda\envs\pymc3\lib\site-packages\pymc3\parallel_sampling.py in <listcomp>(.0)
    273             ProcessAdapter(draws, tune, step_method,
    274                            chain + start_chain_num, seed, start)
--> 275             for chain, seed, start in zip(range(chains), seeds, start_points)
    276         ]
    277 

~\AppData\Local\conda\conda\envs\pymc3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, step_method, chain, seed, start)
    180             draws, tune, seed)
    181         # We fork right away, so that the main process can start tqdm threads
--> 182         self._process.start()
    183 
    184     @property

~\AppData\Local\conda\conda\envs\pymc3\lib\multiprocessing\process.py in start(self)
    103                'daemonic processes are not allowed to have children'
    104         _cleanup()
--> 105         self._popen = self._Popen(self)
    106         self._sentinel = self._popen.sentinel
    107         # Avoid a refcycle if the target function holds an indirect

~\AppData\Local\conda\conda\envs\pymc3\lib\multiprocessing\context.py in _Popen(process_obj)
    221     @staticmethod
    222     def _Popen(process_obj):
--> 223         return _default_context.get_context().Process._Popen(process_obj)
    224 
    225 class DefaultContext(BaseContext):

~\AppData\Local\conda\conda\envs\pymc3\lib\multiprocessing\context.py in _Popen(process_obj)
    320         def _Popen(process_obj):
    321             from .popen_spawn_win32 import Popen
--> 322             return Popen(process_obj)
    323 
    324     class SpawnContext(BaseContext):

~\AppData\Local\conda\conda\envs\pymc3\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     63             try:
     64                 reduction.dump(prep_data, to_child)
---> 65                 reduction.dump(process_obj, to_child)
     66             finally:
     67                 set_spawning_popen(None)

~\AppData\Local\conda\conda\envs\pymc3\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 #

BrokenPipeError: [Errno 32] Broken pipe

aseyboldt commented 5 years ago

I'm pretty sure that is not the same issue as the original (which is Windows related). About the original bug: I've been trying to reproduce this on my own machine for some time, but so far I haven't managed to, which makes it rather hard to fix.

@Jeff-Winchell About your problem: I'd guess that this might be GPU related. Does it also happen if you use the CPU? Using a GPU for a problem like that doesn't make the slightest bit of sense, by the way. As a general note: before starting to write strange emails when you don't get a reply to a bug report, it could help to do a bit more work yourself first:

Jeff-Winchell commented 5 years ago

Have you run the very short code example I gave and replicated the bug? If you have, it's not clear why most of your post was written. If you haven't run it, it's unclear why any of your post was written.

I was frankly taken aback by your post, but maybe you don't see why. I'm a software engineer, not a hacker. My teachers (and LinkedIn connections) include Ward Cunningham, Bertrand Meyer, Meilir Page-Jones, Gerry Weinberg, James Bach, Andy Hunt. I don't ship code to production with known bugs in it. Ever.

If you can't replicate the bug I'd be happy to help come up with ideas why not. Otherwise, it's unproductive.

junpenglao commented 5 years ago

@Jeff-Winchell have you tried running the suggestions by @aseyboldt? These are all valid suggestions; what would be productive is for you to try them first. Also, name-dropping is not a valid way to have a productive conversation.

We do not appreciate these hostile attitudes towards our developers/users. If you keep doing this (either privately or publicly), I will have to block and report you in line with our community guidelines.

Jeff-Winchell commented 5 years ago

The first message to me was more hostile than my response was. Different people have different ideas about name dropping. So I guess you can ban me for saying my address is Jeff_Winchell@g.HARVARD.edu.

What else was hostile, besides making it clear that I know a lot more than the first poster assumed I did when asking me to do a bunch of things that aren't useful?

Jeff-Winchell commented 5 years ago

FYI, none of those names I mentioned would even DREAM of banning someone for posting the message I did.

Jeff-Winchell commented 5 years ago

So go ahead and block me. The mere threat you made about doing so, so frivolously makes me want to challenge bullies publicly, just like they challenge me.

lucianopaz commented 5 years ago

Related discourse thread

Jeff-Winchell commented 5 years ago

I looked at that thread. If I move ONLY the pymc3.sample call into an if __name__ == '__main__' block AND I make sure my GPU is globally turned off, then it won't crash. I ran into the same problem with some other code that uses the NUTS sampler, and the same workaround fixes it there too.

However, disabling the GPU globally is not a great solution, so the GPU problem needs to be fixed, and I don't know how more complex code can be managed with the if __name__ workaround. The real solution is to change the pymc3/theano/whatever code so that it executes under both Linux and Windows, instead of only worrying about Linux and ignoring the most widely used OS from the company with the largest market capitalization in the world.
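For reference, one way to turn the GPU off per run rather than globally is via THEANO_FLAGS, which takes precedence over .theanorc; a sketch, assuming it runs before theano is imported:

import os
# overrides the device=cuda setting in .theanorc for this process only
os.environ['THEANO_FLAGS'] = 'device=cpu'

import theano
import pymc3 as pm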

lucianopaz commented 5 years ago

The main problem is that the broken pipe error is not helpful for debugging. We have seen that the broken pipe is raised by the main process: when it tries to spawn the worker pool that should do the sampling, the workers raise exceptions before they have finished spawning, so they never manage to communicate their failure to the main process, and once the main process tries to communicate with the pool, it finds the communication pipe broken. The first thing we are focusing on fixing is to capture the exceptions raised during the spawning of the worker pool; those exceptions are the keys to debugging the sources of the failures. Some of them were caused by the lack of the if __name__ == '__main__' block, and others were caused by functions that were not pickleable. Once we sort that out, we will be able to help better with whatever is happening because of the GPU.
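A crude sketch in the spirit of that fix (not PyMC3's actual code): make the parent-side BrokenPipeError point at the likely causes instead of failing bare.

import multiprocessing as mp

def start_worker(proc: mp.Process) -> None:
    try:
        proc.start()
    except BrokenPipeError as err:
        # The pipe broke because the spawned worker died while it was being
        # bootstrapped, so its real exception never reached the main process.
        raise RuntimeError(
            "The worker process died during startup. On Windows this often "
            "means the sampling call is not guarded by "
            "`if __name__ == '__main__':`, or the step method could not be "
            "pickled."
        ) from err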

JackCaster commented 5 years ago

Following commit 98fd63e18179ffb28734c73c459ccdaf04121b92, I re-ran the script that kept failing under Windows. The script under test is:

import pymc3 as pm
print(pm.__version__)

import theano.tensor as tt
import theano
print(theano.__version__)

import patsy
import pandas as pd
import numpy as np

SEED = 20180727

df = pd.read_csv(r'https://gist.githubusercontent.com/JackCaster/d74b36a66c172e80d1bdcee61d6975bf/raw/a2aab8690af7cebbe39ec5e5b425fe9a9b9a674d/data.csv', 
                 dtype={'Y':'category'})

_, X = patsy.dmatrices('Y ~ 1 + X1 + X2', data=df)

# Number of categories
n_cat = df.Y.cat.categories.size
# Number of predictors
n_pred = X.shape[1]

with pm.Model() as model:

    ## `p`--quantity that I want to model--needs to have size (n_obs, n_cat). 
    ## Because `X` has size (n_obs, n_pred), then `beta` needs to have size (n_pred, n_cat)

    # priors for categories 1-2, excluding reference category 0 which is set to zero below (see DBDA2 p. 651 for explanation).   
    beta_ = pm.Normal('beta_', mu=0, sd=50, shape=(n_pred, n_cat-1))
    # add prior values zero for reference category 0. (add a column)  
    beta = pm.Deterministic('beta', tt.concatenate([tt.zeros((n_pred, 1)), beta_], axis=1))

    # The softmax function will squash the values in the range 0-1
    p = tt.nnet.softmax(tt.dot(np.asarray(X), beta))

    likelihood = pm.Categorical('likelihood', p=p, observed=df.Y.cat.codes.values)

    trace = pm.sample(chains=2, cores=2)

    print('DONE')

Unfortunately, the sampling still fails with cores > 1 (pymc3 v. 3.6, theano v. 1.0.3). The Jupyter kernel shuts down as soon as the sampling begins:

Multiprocess sampling (2 chains in 2 jobs)
NUTS: [beta_]
Sampling 2 chains:   0%|                                                                   | 0/2000 [00:00<?, ?draws/s]

The traceback, which points to a compilation error, was:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\compile\function_module.py", line 1082, in _constructor_Function
    f = maker.create(input_storage, trustme=True)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\compile\function_module.py", line 1715, in create
    input_storage=input_storage_lists, storage_map=storage_map)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\link.py", line 699, in make_thunk
    storage_map=storage_map)[:3]
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\vm.py", line 1091, in make_all
    impl=impl))
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\op.py", line 955, in make_thunk
    no_recycling)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\op.py", line 858, in make_c_thunk
    output_storage=node_output_storage)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cc.py", line 1217, in make_thunk
    keep_lock=keep_lock)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cc.py", line 1157, in __compile__
    keep_lock=keep_lock)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cc.py", line 1620, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cmodule.py", line 1181, in module_from_key
    module = lnk.compile_cmodule(location)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cc.py", line 1523, in compile_cmodule
    preargs=preargs)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cmodule.py", line 2391, in compile_str
    (status, compile_stderr.replace('\n', '. ')))
Exception: ('The following error happened while compiling the node', InplaceDimShuffle{1,0}(Softmax.0), '\n', 'Compilation failed (return status=3): ', '[InplaceDimShuffle{1,0}(<TensorType(float64, matrix)>)]')
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFE414794C4  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFE79672763  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFE7ABD7E94  Unknown               Unknown  Unknown
ntdll.dll          00007FFE7D2CA251  Unknown               Unknown  Unknown
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFE414794C4  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFE79672763  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFE7ABD7E94  Unknown               Unknown  Unknown
ntdll.dll          00007FFE7D2CA251  Unknown               Unknown  Unknown
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFE414794C4  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFE79672763  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFE7ABD7E94  Unknown               Unknown  Unknown
ntdll.dll          00007FFE7D2CA251  Unknown               Unknown  Unknown
[I 18:43:13.302 NotebookApp] Interrupted...
[I 18:43:13.303 NotebookApp] Shutting down 1 kernel
[I 18:43:13.403 NotebookApp] Kernel shutdown: f6d274f4-ffbf-428a-a996-751cd821bd4a

The temporary compiled C code reports in its last line:

Problem occurred during compilation with the command line below:
"C:\Miniconda3\envs\intro_to_pymc3\Library\mingw-w64\bin\g++.exe" -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -march=haswell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-pcommit -mno-mwaitx --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=haswell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -DMS_WIN64 -I"C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\numpy\core\include" -I"C:\Miniconda3\envs\intro_to_pymc3\include" -I"C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\c_code" -L"C:\Miniconda3\envs\intro_to_pymc3\libs" -L"C:\Miniconda3\envs\intro_to_pymc3" -o "C:\Users\moran\AppData\Local\Theano\compiledir_Windows-10-10.0.17763-SP0-Intel64_Family_6_Model_69_Stepping_1_GenuineIntel-3.6.6-64\tmpujapb2d5\m885ff006a95d626dac547a7bdfdb471bbf058622ece2b4435e42316c4012ea56.pyd" "C:\Users\moran\AppData\Local\Theano\compiledir_Windows-10-10.0.17763-SP0-Intel64_Family_6_Model_69_Stepping_1_GenuineIntel-3.6.6-64\tmpujapb2d5\mod.cpp" -lpython36

Does this shed more light on this matter?

EDIT: I also confirmed (as suggested by @elfwired) that setting theano.config.mode = 'FAST_COMPILE' allows the sampler to run successfully, but the sampling becomes very slow. I tried fiddling with theano.config.mode, theano.config.optimizer, and theano.config.linker without much success.
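The workaround amounts to the following sketch; FAST_COMPILE skips most graph optimizations and mostly uses Python implementations instead of compiled C, which avoids the failing g++ step at the cost of the slowdown mentioned above:

import theano
theano.config.mode = 'FAST_COMPILE'

import pymc3 as pm
# ... build the model and call pm.sample(chains=2, cores=2) as above ...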

twiecki commented 5 years ago

This looks like a Theano problem; can you open an issue there? It looks very archaic to me.

JackCaster commented 5 years ago

This looks like a Theano problem; can you open an issue there? It looks very archaic to me.

Done, let's see 🤞

EDIT: Just a note. When there is a compilation error, the traceback points to the temporary C code. At the end of that code, there is a line saying:

Problem occurred during compilation with the command line below:
"C:\Miniconda3\envs\intro_to_pymc3\Library\mingw-w64\bin\g++.exe" -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -march=haswell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-pcommit -mno-mwaitx --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=haswell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -DMS_WIN64 -I"C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\numpy\core\include" -I"C:\Miniconda3\envs\intro_to_pymc3\include" -I"C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\c_code" -L"C:\Miniconda3\envs\intro_to_pymc3\libs" -L"C:\Miniconda3\envs\intro_to_pymc3" -o "C:\Users\moran\AppData\Local\Theano\compiledir_Windows-10-10.0.17763-SP0-Intel64_Family_6_Model_69_Stepping_1_GenuineIntel-3.6.6-64\tmpujapb2d5\m885ff006a95d626dac547a7bdfdb471bbf058622ece2b4435e42316c4012ea56.pyd" "C:\Users\moran\AppData\Local\Theano\compiledir_Windows-10-10.0.17763-SP0-Intel64_Family_6_Model_69_Stepping_1_GenuineIntel-3.6.6-64\tmpujapb2d5\mod.cpp" -lpython36

I tried to run the command post-mortem, but the temp folder ...\tmpujapb2d5\... does not exist (although a bunch of others do). I am wondering if there is a problem with how the multiprocessing pool is instantiated.

sunn-e commented 4 years ago

I got a similar error with this snippet in an MCMC application:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Note: `time`, `sleep_obs`, and N_SAMPLES come from the surrounding
# application (time covariate, observed sleep indicators, number of draws).

with pm.Model() as sleep_model:

    # Create the alpha and beta parameters
    alpha = pm.Normal('alpha', mu=0.0, tau=0.01, testval=0.0)
    beta = pm.Normal('beta', mu=0.0, tau=0.01, testval=0.0)

    # Create the probability from the logistic function
    p = pm.Deterministic('p', 1. / (1. + tt.exp(beta * time + alpha)))

    # Create the Bernoulli parameter which uses the observed data
    observed = pm.Bernoulli('obs', p, observed=sleep_obs)

    # Starting values can be found through Maximum A Posteriori estimation
    # start = pm.find_MAP()

    # Use Metropolis-Hastings sampling
    step = pm.Metropolis()

    # Sample from the posterior using the sampling method
    # sleep_trace = pm.sample(N_SAMPLES, step=step, njobs=2)
    sleep_trace = pm.sample(N_SAMPLES, step=step)

Error message:


Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
>Metropolis: [beta]
>Metropolis: [alpha]

BrokenPipeError                           Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, step_method, chain, seed, start)
    241         try:
--> 242             self._process.start()
    243         except IOError as e:

C:\ProgramData\Anaconda3\lib\multiprocessing\process.py in start(self)
    111         _cleanup()
--> 112         self._popen = self._Popen(self)
    113         self._sentinel = self._popen.sentinel

C:\ProgramData\Anaconda3\lib\multiprocessing\context.py in _Popen(process_obj)
    222     def _Popen(process_obj):
--> 223         return _default_context.get_context().Process._Popen(process_obj)
    224 

C:\ProgramData\Anaconda3\lib\multiprocessing\context.py in _Popen(process_obj)
    321             from .popen_spawn_win32 import Popen
--> 322             return Popen(process_obj)
    323 

C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     88                 reduction.dump(prep_data, to_child)
---> 89                 reduction.dump(process_obj, to_child)
     90             finally:

C:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 

BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-26-4ad3b5446758> in <module>
     18     # Sample from the posterior using the sampling method
     19     #sleep_trace = pm.sample(N_SAMPLES, step=step, njobs=2);
---> 20     sleep_trace = pm.sample(N_SAMPLES, step=step);

C:\ProgramData\Anaconda3\lib\site-packages\pymc3\sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, **kwargs)
    435             _print_step_hierarchy(step)
    436             try:
--> 437                 trace = _mp_sample(**sample_args)
    438             except pickle.PickleError:
    439                 _log.warning("Could not pickle model, sampling singlethreaded.")

C:\ProgramData\Anaconda3\lib\site-packages\pymc3\sampling.py in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, **kwargs)
    963     sampler = ps.ParallelSampler(
    964         draws, tune, chains, cores, random_seed, start, step,
--> 965         chain, progressbar)
    966     try:
    967         try:

C:\ProgramData\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, chains, cores, seeds, start_points, step_method, start_chain_num, progressbar)
    359                 draws, tune, step_method, chain + start_chain_num, seed, start
    360             )
--> 361             for chain, seed, start in zip(range(chains), seeds, start_points)
    362         ]
    363 

C:\ProgramData\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py in <listcomp>(.0)
    359                 draws, tune, step_method, chain + start_chain_num, seed, start
    360             )
--> 361             for chain, seed, start in zip(range(chains), seeds, start_points)
    362         ]
    363 

C:\ProgramData\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, step_method, chain, seed, start)
    249                     # all its error message
    250                     time.sleep(0.2)
--> 251                     raise exc
    252             raise
    253 

RuntimeError: The communication pipe between the main process and its spawned children is broken.
In Windows OS, this usually means that the child process raised an exception while it was being spawned, before it was setup to communicate to the main process.
The exceptions raised by the child process while spawning cannot be caught or handled from the main process, and when running from an IPython or jupyter notebook interactive kernel, the child's exception and traceback appears to be lost.
A known way to see the child's error, and try to fix or handle it, is to run the problematic code as a batch script from a system's Command Prompt. The child's exception will be printed to the Command Promt's stderr, and it should be visible above this error and traceback.
Note that if running a jupyter notebook that was invoked from a Command Prompt, the child's exception should have been printed to the Command Prompt on which the notebook is running.

Running on Windows 10 with the latest versions of all packages.

russellu commented 4 years ago

Same thing for me (Windows 10, Spyder, installed through Anaconda); setting cores=1 in pm.sample() runs fine.

Multiprocess sampling (4 chains in 4 jobs)
BinaryGibbsMetropolis: [rain, sprinkler]

Traceback (most recent call last):
  File "", line 18, in <module>
    trace = pm.sample(20000, step=[pm.BinaryGibbsMetropolis([rain, sprinkler])], tune=tune, random_seed=124)
  File "C:\Users\butle\Anaconda3\lib\site-packages\pymc3\sampling.py", line 437, in sample
    trace = _mp_sample(**sample_args)
  File "C:\Users\butle\Anaconda3\lib\site-packages\pymc3\sampling.py", line 965, in _mp_sample
    chain, progressbar)
  File "C:\Users\butle\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py", line 361, in __init__
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "C:\Users\butle\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py", line 361, in <listcomp>
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "C:\Users\butle\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py", line 251, in __init__
    raise exc
RuntimeError: The communication pipe between the main process and its spawned children is broken.
In Windows OS, this usually means that the child process raised an exception while it was being spawned, before it was setup to communicate to the main process.
The exceptions raised by the child process while spawning cannot be caught or handled from the main process, and when running from an IPython or jupyter notebook interactive kernel, the child's exception and traceback appears to be lost.
A known way to see the child's error, and try to fix or handle it, is to run the problematic code as a batch script from a system's Command Prompt. The child's exception will be printed to the Command Promt's stderr, and it should be visible above this error and traceback.
Note that if running a jupyter notebook that was invoked from a Command Prompt, the child's exception should have been printed to the Command Prompt on which the notebook is running.

ElefHead commented 4 years ago

Same for me. Windows 10; cores=1 works fine. Theano with CUDA. I wrote the code in VS Code, but ran it via cmd.

I am just getting into PyMC and was following along with the code in Osvaldo Martin's book. This was the code I tried:

import numpy as np 
from scipy import stats
import pymc3 as pm 

np.random.seed(123)

if __name__ == "__main__":
    trials = 4 

    theta_real = 0.35 
    data = stats.bernoulli.rvs(p=theta_real, size=trials)

    with pm.Model() as our_first_model:
        theta = pm.Beta("theta", alpha=1., beta=1.)
        y = pm.Bernoulli("y", p=theta, observed=data)
        trace = pm.sample(1000, random_seed=123)

The following is the traceback:


Traceback (most recent call last):
  File "test.py", line 16, in <module>
    trace = pm.sample(1000, random_seed=123)
  File "C:\Anaconda\lib\site-packages\pymc3\sampling.py", line 437, in sample
    trace = _mp_sample(**sample_args)
  File "C:\Anaconda\lib\site-packages\pymc3\sampling.py", line 965, in _mp_sample
    chain, progressbar)
  File "C:\Anaconda\lib\site-packages\pymc3\parallel_sampling.py", line 361, in __init__
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "C:\Anaconda\lib\site-packages\pymc3\parallel_sampling.py", line 361, in <listcomp>
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "C:\Anaconda\lib\site-packages\pymc3\parallel_sampling.py", line 251, in __init__
    raise exc
RuntimeError: The communication pipe between the main process and its spawned children is broken.
In Windows OS, this usually means that the child process raised an exception while it was being spawned, before it was setup to communicate to the main process.
The exceptions raised by the child process while spawning cannot be caught or handled from the main process, and when running from an IPython or jupyter notebook interactive kernel, the child's exception and traceback appears to be lost.
A known way to see the child's error, and try to fix or handle it, is to run the problematic code as a batch script from a system's Command Prompt. The child's exception will be printed to the Command Promt's stderr, and it should be visible above this error and traceback.
Note that if running a jupyter notebook that was invoked from a Command Prompt, the child's exception should have been printed to the Command Prompt on which the notebook is running.

cadama commented 4 years ago

I am facing the same issue on a Debian machine, specifically the default 1.5-debian image on Google Dataproc (https://cloud.google.com/compute/docs/images#debian).

Setting one of:

os.environ['MKL_THREADING_LAYER'] = 'sequential'
os.environ['OMP_NUM_THREADS'] = '1'

allowed me to make the thing run, but I suspect this is preventing me from scaling things up. Indeed, I noticed single chains appear to use just one CPU each. Is this a known issue for certain Linux distributions? Is there a Linux distro where multiprocessing is known to work well?
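
For reference, a minimal sketch of where these variables would go (they usually need to be set before numpy/theano are imported so that the threading layer picks them up; this is a sketch, not taken verbatim from the setup above):

import os

# Set before importing the numerical libraries, otherwise the
# MKL/OpenMP threading layer may already have been initialized:
os.environ['MKL_THREADING_LAYER'] = 'sequential'
os.environ['OMP_NUM_THREADS'] = '1'

import numpy as np
import pymc3 as pm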

charlesbaynham commented 3 years ago

Hi all,

Just a note to say that, as a new user running simple example code, I'm also seeing this problem in Spyder 4.2.0 on Windows using a fresh install of pymc3==3.8. Adding an if __name__ == "__main__": guard sorts it out (but of course removes a lot of Spyder's functionality).
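
For reference, a minimal sketch of that guard pattern (the model here is a stand-in, not the original code):

import pymc3 as pm

def main():
    with pm.Model():
        mu = pm.Normal('mu', mu=0, sd=1)
        trace = pm.sample(1000, chains=2, cores=2)

# On Windows, multiprocessing spawns fresh child processes that re-import
# this module; the guard stops them from re-running the sampling call.
if __name__ == '__main__':
    main()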

twiecki commented 3 years ago

Can you try with pymc3 3.10?

charlesbaynham commented 3 years ago

Also in 3.10 I'm afraid: I'm seeing the same duplicated messages followed by the

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

error. My code is:

import numpy as np
import pymc3 as pm

observations = np.array([20, 6, 6, 6, 6, 6])
with pm.Model():
    probs = pm.Dirichlet('probs', a=np.ones(6))  # flat prior
    rolls = pm.Multinomial('rolls', n=50, p=probs, observed=observations)
    trace = pm.sample(1000)

charlesbaynham commented 3 years ago

And same in 3.11 too

ecm200 commented 2 years ago

I found the following workaround for the issue described above. When specifying n_cores>1 I was not able to get sampling to work, and I have been forced to run chains sequentially. However, it would appear there is still some parallelism through threading, as the sampler clearly uses more than 1 core.

The workaround involves setting an OpenMP environment variable so that each sampling process is limited to 1 thread, by adding the following at the beginning of the Python script (after the imports):

os.environ['OMP_NUM_THREADS'] = '1'

When specifying n_cores>1, this has allowed me to run the sampling of up to 4 chains in parallel, as single-threaded processes. If I increase OMP_NUM_THREADS above 1, the sampling process hangs again.

I'd be interested to know if anyone has successfully managed to execute multiple chain processes with multiple threads per chain process.

twiecki commented 2 years ago

Thanks for reporting back, @charlesbaynham can you test if this works for you too?

ecm200 commented 2 years ago

@twiecki I should have said that I have managed to get this behaviour with pymc3 v3.11.2 and v3.11.4 installed into brand-new Python environments, with the following package versions:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             4.5                       1_gnu  
arviz                     0.11.4             pyhd8ed1ab_0    conda-forge
asttokens                 2.0.5              pyhd8ed1ab_0    conda-forge
automateml                0.0.0                     dev_0    <develop>
azure-core                1.22.0                   pypi_0    pypi
azure-cosmos              4.2.0                    pypi_0    pypi
azure-storage-blob        12.9.0                   pypi_0    pypi
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
black                     22.1.0             pyhd8ed1ab_0    conda-forge
brotli                    1.0.9                h7f98852_5    conda-forge
brotli-bin                1.0.9                h7f98852_5    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f8727e_0  
ca-certificates           2021.10.8            ha878542_0    conda-forge
cachetools                5.0.0              pyhd8ed1ab_0    conda-forge
certifi                   2021.10.8        py39hf3d152e_1    conda-forge
cffi                      1.15.0                   pypi_0    pypi
cftime                    1.5.1.1          py39hce1f21e_0  
charset-normalizer        2.0.11                   pypi_0    pypi
click                     8.0.3            py39hf3d152e_1    conda-forge
cryptography              36.0.1                   pypi_0    pypi
curl                      7.80.0               h7f8727e_0  
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
dataclasses               0.8                pyhc8e2a94_3    conda-forge
dbus                      1.13.6               he372182_0    conda-forge
debugpy                   1.5.1            py39h295c915_0  
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
dill                      0.3.4              pyhd8ed1ab_0    conda-forge
entrypoints               0.4                pyhd8ed1ab_0    conda-forge
executing                 0.8.2              pyhd8ed1ab_0    conda-forge
expat                     2.2.10               h9c3ff4c_0    conda-forge
fastprogress              1.0.0                      py_0    conda-forge
filelock                  3.4.2              pyhd8ed1ab_1    conda-forge
fontconfig                2.13.1               h6c09931_0  
fonttools                 4.25.0             pyhd3eb1b0_0  
freetype                  2.10.4               h0708190_1    conda-forge
glib                      2.69.1               h4ff587b_1  
gst-plugins-base          1.14.0               hbbd80ab_1  
gstreamer                 1.14.0               h28cd5cc_2  
hdf4                      4.2.13               h3ca952b_2  
hdf5                      1.10.6          nompi_h6a2412b_1114    conda-forge
icu                       58.2              hf484d3e_1000    conda-forge
idna                      3.3                      pypi_0    pypi
importlib-metadata        4.10.1           py39hf3d152e_0    conda-forge
importlib_metadata        4.10.1               hd8ed1ab_0    conda-forge
intel-openmp              2021.4.0          h06a4308_3561  
ipykernel                 6.9.0            py39hef51801_0    conda-forge
ipython                   8.0.1            py39hf3d152e_0    conda-forge
isodate                   0.6.1                    pypi_0    pypi
jedi                      0.18.1           py39hf3d152e_0    conda-forge
joblib                    1.1.0              pyhd8ed1ab_0    conda-forge
jpeg                      9d                   h7f8727e_0  
jupyter_client            7.1.2              pyhd8ed1ab_0    conda-forge
jupyter_core              4.9.1            py39hf3d152e_1    conda-forge
kiwisolver                1.3.1            py39h2531618_0  
krb5                      1.19.2               hcc1bbae_0    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.35.1               h7274673_9  
libblas                   3.9.0           11_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h7f98852_5    conda-forge
libbrotlidec              1.0.9                h7f98852_5    conda-forge
libbrotlienc              1.0.9                h7f98852_5    conda-forge
libcblas                  3.9.0           11_linux64_openblas    conda-forge
libcurl                   7.80.0               h0b77cf5_0  
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.3.0               h5101ec6_17  
libgfortran-ng            11.2.0              h69a702a_12    conda-forge
libgfortran5              11.2.0              h5c6108e_12    conda-forge
libgomp                   9.3.0               h5101ec6_17  
liblapack                 3.9.0           11_linux64_openblas    conda-forge
libnetcdf                 4.8.1                h42ceab0_1  
libnghttp2                1.46.0               hce63b2e_0  
libopenblas               0.3.17          pthreads_h8fe5266_1    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libsodium                 1.0.18               h36c2ea0_1    conda-forge
libssh2                   1.9.0                h1ba5d50_1  
libstdcxx-ng              9.3.0               hd4cf53a_17  
libtiff                   4.2.0                h85742a9_0  
libuuid                   1.0.3                h7f8727e_2  
libwebp-base              1.2.0                h27cfd23_0  
libxcb                    1.13              h7f98852_1003    conda-forge
libxml2                   2.9.12               h03d6c58_0  
libzip                    1.8.0                h4de3113_0    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
matplotlib                3.4.3            py39hf3d152e_2    conda-forge
matplotlib-base           3.4.3            py39hbbc1b5f_0  
matplotlib-inline         0.1.3              pyhd8ed1ab_0    conda-forge
mkl                       2021.4.0           h06a4308_640  
mkl-service               2.4.0            py39h3811e60_0    conda-forge
msrest                    0.6.21                   pypi_0    pypi
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
mypy_extensions           0.4.3            py39hf3d152e_4    conda-forge
ncurses                   6.3                  h7f8727e_2  
nest-asyncio              1.5.4              pyhd8ed1ab_0    conda-forge
netcdf4                   1.5.7            py39ha0f2276_1  
numpy                     1.20.3           py39hdbf815f_1    conda-forge
oauthlib                  3.2.0                    pypi_0    pypi
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openssl                   1.1.1m               h7f8727e_0  
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.2.3            py39hde0f152_0    conda-forge
parso                     0.8.3              pyhd8ed1ab_0    conda-forge
pathspec                  0.9.0              pyhd8ed1ab_0    conda-forge
patsy                     0.5.2              pyhd8ed1ab_0    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    7.2.0            py39h6f3857e_2    conda-forge
pip                       21.2.4           py39h06a4308_0  
platformdirs              2.4.1              pyhd8ed1ab_1    conda-forge
prompt-toolkit            3.0.26             pyha770c72_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
pycparser                 2.21                     pypi_0    pypi
pydantic                  1.9.0                    pypi_0    pypi
pygments                  2.11.2             pyhd8ed1ab_0    conda-forge
pymc3                     3.11.4           py39hb070fc8_0  
pyparsing                 3.0.7              pyhd8ed1ab_0    conda-forge
pyqt                      5.9.2            py39h2531618_6  
python                    3.9.7                h12debd9_1  
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.9                      2_cp39    conda-forge
pytz                      2021.3             pyhd8ed1ab_0    conda-forge
pyzmq                     19.0.2           py39hb69f2a1_2    conda-forge
qt                        5.9.7                h5867ecd_1  
readline                  8.1.2                h7f8727e_1  
requests                  2.27.1                   pypi_0    pypi
requests-oauthlib         1.3.1                    pypi_0    pypi
scikit-learn              1.0.2                    pypi_0    pypi
scipy                     1.5.3            py39hee8e79c_0    conda-forge
seaborn                   0.11.2               hd8ed1ab_0    conda-forge
seaborn-base              0.11.2             pyhd8ed1ab_0    conda-forge
semver                    2.13.0             pyh9f0ad1d_0    conda-forge
setuptools                58.0.4           py39h06a4308_0  
sip                       4.19.13          py39h295c915_0  
six                       1.16.0             pyh6c4a22f_0    conda-forge
smartreturntools          0.1.4                     dev_0    <develop>
sqlite                    3.37.2               hc218d9a_0  
stack_data                0.1.4              pyhd8ed1ab_0    conda-forge
statsmodels               0.13.0                   pypi_0    pypi
theano-pymc               1.1.2            py39h51133e4_0  
threadpoolctl             3.1.0              pyh8a188c0_0    conda-forge
tk                        8.6.11               h1ccaba5_0  
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
tornado                   6.1              py39h3811e60_1    conda-forge
traitlets                 5.1.1              pyhd8ed1ab_0    conda-forge
typed-ast                 1.4.3            py39h3811e60_0    conda-forge
typing-extensions         3.10.0.2             hd8ed1ab_0    conda-forge
typing_extensions         3.10.0.2           pyha770c72_0    conda-forge
tzdata                    2021e                hda174b7_0  
urllib3                   1.26.8                   pypi_0    pypi
wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
wheel                     0.37.1             pyhd3eb1b0_0  
xarray                    0.21.1             pyhd8ed1ab_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xz                        5.2.5                h7b6447c_0  
zeromq                    4.3.4                h9c3ff4c_0    conda-forge
zipp                      3.7.0              pyhd8ed1ab_1    conda-forge
zlib                      1.2.11               h7f8727e_4  
zstd                      1.4.9                ha95c52a_0    conda-forge

These examples were executed in a Python script (not in a notebook), in a tmux session. I am fairly new to pymc3, so the model below is not very specific and doesn't yet have explicit priors. The objective is Bayesian multiple regression, with a model defined as y ~ a + w0*x0 + w1*x1 + ... + wn*xn, where the wi are a set of n coefficients for predicting the continuous scalar quantity y.

The script was encapsulated as follows:


import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import cm
from pathlib import Path
import os

from sklearn.model_selection import train_test_split
import patsy
import pymc3 as pm
import pickle

#os.environ['MKL_THREADING_LAYER'] = 'sequential'
os.environ['OMP_NUM_THREADS'] = '1'

if __name__ == '__main__':

    # Bayesian Multiple Regression
    sample_params = {
        'draws' : 10000, 
        'chains' : 4, 
        'tune' : 4000, 
        'target_accept' : 0.87, 
        'random_seed' : 123456,
        'init' : 'auto', 
        'cores' : 4
    }

    # Model inference
    run_posterior_sampling = True
    n_inference_samples = 5000

    ## DATA_CONDITIONING
    # This is a simplification of my data processing steps; the data arrives as a
    # numpy array of regressors (cols) and samples (rows). The target vector `y`,
    # `pkl_file`, and `output_results_file` are defined elsewhere in the full script.
    X_full_norm_dwsmp = get_data_from_pkl(pkl_file)

    ## BAYESIAN MODELLING

    # Construct the wavelength column names and the PATSY string model formula
    model_formula_str="y ~ "
    columns = []
    for i in np.arange(0,X_full_norm_dwsmp.shape[1],1):
        columns.append("w"+str(i))
        if i == X_full_norm_dwsmp.shape[1] - 1:
            model_formula_str = model_formula_str + "w{}".format(i)
        else:
            model_formula_str = model_formula_str + "w{} + ".format(i)

    # Make the data dataframe with the wavelengths and target value.
    X_df = pd.DataFrame(X_full_norm_dwsmp, columns=[columns])
    Y_df = pd.DataFrame(y, columns=['y'])
    data_df = Y_df.join(X_df, how='left')
    data_df.columns = ['y'] + columns

    print('Model formula:: {}'.format(model_formula_str))

    print('[INFO] - PATSY model configuration')
    # Define model formula.
    formula = model_formula_str
    # Create features.
    y, x = patsy.dmatrices(formula_like=formula, data=data_df)
    y = np.asarray(y).flatten()
    labels = x.design_info.column_names
    x = np.asarray(x)

    print('[INFO] - Train Test Splitting Data')
    x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.9, random_state=123456)

    print('[INFO] - Starting the modelling...', end='')
    with pm.Model() as model:
        # Set data container.
        print('creating data container...', end='')
        data = pm.Data("data", x_train)
        # Define GLM family.
        family = pm.glm.families.Normal()
        # Set priors.
        #priors = {
        #    "Intercept": pm.Normal.dist(mu=0, sd=10),
        #    "x1": pm.Normal.dist(mu=0, sd=10),
        #    "x2": pm.Normal.dist(mu=0, sd=10),
        #    "x1:x2": pm.Normal.dist(mu=0, sd=10),
        #}
        # Specify model.
        print('Building the model...', end='')
        pm.glm.GLM(y=y_train, x=data, family=family, intercept=False, labels=labels)# , priors=priors)
        print('Complete.')

        print('[INFO] - Sampling the model...')
        # Configure sampler.
        trace = pm.sample(**sample_params)

        trained_model = {
            'data' : data,
            'model' : model,
            'trace' : trace
        }

        if run_posterior_sampling:
            print('[INFO] - Running inference')
            # Update data reference.
            pm.set_data({"data": x_test}, model=model)
            # Generate posterior samples.
            trained_model['ppc_test'] = pm.sample_posterior_predictive(trace, model=model, samples=n_inference_samples)
            # Update to pickle file reflect ppc results included
            ppc_pkl_file = os.path.join(str(Path(output_results_file).absolute().parent), Path(output_results_file).parts[1].split('.')[0]+'_ppc.pkl')

        with open(output_results_file, 'wb') as outfile:
            pickle.dump(trained_model, outfile)

ecm200 commented 2 years ago

@twiecki, I also have another question.

What's the rationale behind limiting the number of cores (n_cores) to a maximum of 4?

I am not very well acquainted with the code, but if each chain is sampled by a separate, independent MCMC process, then running more than 4 chains in parallel should still be reasonable?

twiecki commented 2 years ago

@ecm200 You can certainly run more, but usually 4×1000 posterior samples are enough for most convergence diagnostics.
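
For what it's worth, chains and cores can be set independently; a sketch:

import pymc3 as pm

with pm.Model():
    mu = pm.Normal('mu', mu=0, sd=1)
    # 8 chains on 4 worker processes; chains beyond the core
    # count are queued and started as earlier ones finish.
    trace = pm.sample(draws=1000, tune=1000, chains=8, cores=4)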

ecm200 commented 2 years ago

@twiecki thanks very much for feedback, very much appreciated.

Any idea how the sampler scales with threads?

My machine defaults to 4 threads per sampling process (whether that is a variable set in the Theano backend, I don't know). So, to take advantage of the compute power we have, I was wondering whether a sequential-chain setup with more threads per chain would be more efficient?

Ideally, of course, it would be nice if the multiprocessing sampler played nicely with multi-threaded processes in parallel, with the obvious caveat of making sure that one doesn't oversubscribe the discrete compute cores of the system.

covertg commented 2 years ago

I would like to further motivate the above question. Suppose I would like to apply MCMC methods to an existing type of model (that is, I use some black-box likelihood function as a theano/aesara op). Suppose further that the existing code for this model can itself be parallelized in certain cases. Then: is there any best practice for allocating the parallel compute resources between sampling, OpenMP computations, and this black-box likelihood?
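
For concreteness, the black-box pattern meant here is roughly the following sketch (my_loglike and my_data are placeholders; no gradient is provided, so a gradient-free step method such as Slice or Metropolis is assumed):

import numpy as np
import aesara.tensor as at
from aesara.graph.op import Op

class BlackBoxLogLike(Op):
    # Wrap an external log-likelihood function as an Aesara Op.
    itypes = [at.dvector]  # parameter vector theta
    otypes = [at.dscalar]  # scalar log-likelihood

    def __init__(self, loglike, data):
        self.loglike = loglike
        self.data = data

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        outputs[0][0] = np.array(self.loglike(theta, self.data))

Such an op would typically enter the model through something like pm.Potential('loglike', BlackBoxLogLike(my_loglike, my_data)(theta)).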

I understand that the likelihood complication is somewhat outside the scope of the pymc devs' influence, and some empirical testing might be the way to go. Perhaps it's also a pretty rare case. I just mean to say, though, that in general it could be very nice to have in the documentation some deeper explanation/exploration of how parallelization occurs in pymc, and best practices for tuning it.

twiecki commented 2 years ago

The general logic is that you only need to parallelize at the highest level if you can max out at that level.

But I agree that some more info would be helpful. Can you help draft something?

Iqigai commented 2 years ago

I have a similar issue. I have tried os.environ['OMP_NUM_THREADS'] = '1' with no luck. When I try to run the line idata = model.fit(random_seed=SEED) from this notebook, I get this error message:


(from C:\Program Files\Python39\lib\site-packages\theano\link\c\cmodule.py)

    292 with warnings.catch_warnings():
    293     warnings.filterwarnings("ignore", message="numpy.ndarray size changed")
--> 294     rval = __import__(module_name, {}, {}, [module_name])
    295 t1 = time.time()
    296 import_time += t1 - t0

twiecki commented 2 years ago

Can you try upgrading to pymc 4.0.0b3?

Iqigai commented 2 years ago

My initial script was an attempt to use bambi, which is built on top of PyMC3. So I tried another example, and not even the import statement import pymc as pm works on my system:


WARNING (aesara.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Traceback (most recent call last):
  File "C:\Program Files\Python39\lib\site-packages\IPython\core\interactiveshell.py", line 3251, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-e46b348a9125>", line 1, in <module>
    import pymc as pm
  File "C:\Program Files\JetBrains\PyCharm 2021.1.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\pymc\__init__.py", line 39, in <module>
    __set_compiler_flags()
  File "C:\Program Files\Python39\lib\site-packages\pymc\__init__.py", line 33, in __set_compiler_flags
    import aesara
  File "C:\Program Files\JetBrains\PyCharm 2021.1.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\aesara\__init__.py", line 79, in <module>
    from aesara import scalar, tensor
  File "C:\Program Files\JetBrains\PyCharm 2021.1.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\aesara\tensor\__init__.py", line 96, in <module>
    from aesara.tensor import (  # noqa
  File "C:\Program Files\JetBrains\PyCharm 2021.1.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\aesara\tensor\nnet\__init__.py", line 3, in <module>
    import aesara.tensor.nnet.opt
  File "C:\Program Files\JetBrains\PyCharm 2021.1.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\aesara\tensor\nnet\opt.py", line 17, in <module>
    from aesara.tensor.nnet.abstract_conv import (
  File "C:\Program Files\JetBrains\PyCharm 2021.1.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\aesara\tensor\nnet\abstract_conv.py", line 18, in <module>
    from scipy.signal.signaltools import _bvalfromboundary, _valfrommode, convolve
ImportError: cannot import name '_bvalfromboundary' from 'scipy.signal.signaltools' (C:\Program Files\Python39\lib\site-packages\scipy\signal\signaltools.py)

twiecki commented 2 years ago

@JackCaster You can try again with pymc 4.0.0b3.

ivanmkc commented 2 years ago

I also reproduced the same issue: it works in a Jupyter notebook but not in a script. This is on macOS.

michaelosthege commented 2 years ago

Looks like this thread has accumulated a bunch of different issues that are unrelated to the original one from four years ago. Closing... More specific issues can be opened if needed.