Open Garve opened 4 years ago
This might be a problem with multiprocessing on Windows. I can't reproduce the issue on Linux.
@Garve could you add the following lines at the top of your script to see if the issue persists:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
I don't have a Windows environment, so it's hard to guess the cause. My best guess is that it is related to this issue.
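For the environment-variable suggestion above to have any effect, it has to be set before CUDA is initialized, which in practice means before `torch` or `pyro` is imported. A minimal sketch of that ordering, where `hide_gpus` is a hypothetical helper name and not part of Pyro or PyTorch:

```python
import os


def hide_gpus() -> None:
    """Hide all CUDA devices from libraries that initialize CUDA later.

    Hypothetical helper for illustration; it only sets an environment variable.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


# Call this at the very top of the script, before importing torch or pyro.
hide_gpus()

# import torch   # torch.cuda.is_available() is then expected to return False
# import pyro
```

If the variable is set after the first CUDA call, the GPUs will most likely stay visible to the process, so the position of these lines matters more than their content.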
Hi! Sorry for the late answer. No, it still doesn't work, sadly.
Hi, I am also having this same problem, but I am on Ubuntu 18.04, not Windows. When I run with num_chains=1, everything is fine and the progress bar fills up. When I run with num_chains=2, it displays two empty progress bars (at least in JupyterLab). After a little while, a third progress bar pops up (using stderr) and immediately crashes the kernel:
Sometimes the third progress bar does not pop up at all and the script just hangs forever. I tried @fehiepsi's CUDA_VISIBLE_DEVICES fix, but that did not change anything.
@ecotner Could you paste part of the error message from the console? It might give us some hints. Also, did you run mcmc two times? IIRC there is a limitation (see also this topic) when using PyTorch multiprocessing in JupyterLab.
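Since the limitation mentioned above concerns notebooks specifically, one possible workaround is to fall back to a single chain when the code runs inside a Jupyter kernel. The sketch below uses a common heuristic (checking whether `ipykernel` has been imported). `running_in_notebook` and `safe_num_chains` are hypothetical helper names, not Pyro functions, and the heuristic is an assumption that may not cover every frontend.

```python
import sys


def running_in_notebook(modules=None) -> bool:
    """Heuristic: Jupyter kernels import `ipykernel` before running user code."""
    modules = sys.modules if modules is None else modules
    return "ipykernel" in modules


def safe_num_chains(requested: int, modules=None) -> int:
    """Return 1 inside a notebook, otherwise the requested number of chains."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    return 1 if running_in_notebook(modules) else requested


# Decide the chain count once, then pass it on to the sampler,
# e.g. MCMC(kernel, ..., num_chains=num_chains).
num_chains = safe_num_chains(2)
```

This trades parallelism for reliability. Running the same code as a plain script keeps the requested number of chains.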
I'm getting similar behavior on Mac, Linux and Windows. The sampler just hangs when the progress bar appears for multiple chains. Single chains work fine. I have tried using CPU on all platforms and GPU on Linux.
Hi @fonnesbeck, I just installed a fresh pyro in a new conda environment on Linux. The topic model works for me in JupyterLab and Jupyter Notebook. But if I make a second mcmc run, I get [ERROR LOG CHAIN:0]Unable to handle autograd's threading in combination with fork-based multiprocessing. See https://github.com/pytorch/pytorch/wiki/Autograd-and-Fork. This could be a hint, I guess.
OK, thanks. Interestingly, I can get it going on Linux with GPU if I remove the mp_context="spawn" flag that is recommended in the docstring. However, after 10 or so iterations on each chain, the MCMC run slows down so much that it's actually much faster to run a single chain. You can see this in the screen capture below, which shows 2-chain and single-chain sampling rates being drastically different:
I guess that is expected if moving tensors in, out of, and across processes is costly. In our experience, making multiprocessing work with PyTorch is quite tricky, and sadly we don't know what the best practice is for MCMC (it is probably just a matter of changing a few lines of code to make MCMC run more efficiently). :(
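The fork-related autograd error and the `mp_context="spawn"` flag discussed above both come down to which multiprocessing start method is used. As a rough sketch using only the standard library (`pick_start_method` is a hypothetical helper, not part of Pyro), one can check which start methods the platform offers and prefer `spawn`, which starts a fresh interpreter instead of forking one that may already have autograd threads running:

```python
import multiprocessing as mp


def pick_start_method(preferred=("spawn", "forkserver", "fork")) -> str:
    """Return the first preferred start method that this platform supports."""
    available = mp.get_all_start_methods()
    for method in preferred:
        if method in available:
            return method
    raise RuntimeError(f"none of {preferred} is available; got {available}")


method = pick_start_method()
# get_context returns a context object and leaves the global default untouched.
ctx = mp.get_context(method)
print(method, ctx.get_start_method())
```

The chosen name could then be passed as `mp_context=method` when constructing `MCMC`. With `spawn`, a script also needs its entry point wrapped in `if __name__ == "__main__":`, because child processes re-import the main module.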
Hi!
I tried to implement a very simple Bayesian regression via NUTS/MCMC. It works well if I use a single Markov chain. However, when I increase the number of chains, the program never finishes (and also doesn't yield any error message).
If you set num_chains to 1, it will work.
Pyro 1.2.1
PyTorch 1.4.0+cpu
Python 3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)]
Windows 10
Thanks for the help!