Open tjayasinghe opened 1 year ago
This looks like it could be related to a known issue with the process map in Windows. The recommendation is to try freeze_support(). See https://github.com/uqfoundation/multiprocess/issues/56#issuecomment-727576884 and https://github.com/uqfoundation/pathos/blob/d4c126bba354eea3c0ed4a8cc5148768cf616a85/examples/nested.py#L25
@tjayasinghe can you try to add this line and see if it fixes the Windows issue? We will try to build Windows OS into the testing suite.
@kmzzhang you can see how to add Windows OS testing in GitHub Actions here: https://github.com/cesium-ml/cesium/blob/dfe3d21eacfc1a242803cc2acf11a776e0a1526a/.github/workflows/wheel_tests_and_release.yml#L105
@profjsb Thanks! Initially, this did not work when I tried running it through the Jupyter notebook.
@kmzzhang I copied the code and created a wrapper for the NBI code for sequential data, and added in freeze_support()
def seq_data():
    # nbi has pre-defined neural networks for sequential data
    resnet = get_featurizer('resnetrnn', 1, 32, depth=3)
    engine = NBI(
        resnet,
        dim_param=3,
        physics=sine,
        instrumental=noise,
        prior_sampler=prior,
        log_like=log_like,
        log_prior=log_prior,
        flow_config=flow_config,
        labels=labels,
        directory='test',
        n_jobs=10,  # for generating training set
        parallel=True,  # only useful if GPU available
        tqdm_notebook=True
    )
    engine.run(
        x_obs,
        y_true=y_true,
        n_rounds=2,
        n_per_round=5120,
        n_epochs=10,
        train_batch=64,
        val_batch=64,
        lr=0.0005,
        min_lr=0.0002,  # learning rate decays from lr to min_lr at the last epoch
        f_val=0.1,  # fraction used as validation set
        early_stop_patience=5  # stop training if val loss not improve by 5 epochs
    )

if __name__ == '__main__':
    from pathos.helpers import freeze_support, shutdown
    freeze_support()
    seq_data()
This works perfectly if I run the code as a script from the terminal. It only fails in Jupyter!
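A possible explanation, offered as an assumption rather than a confirmed diagnosis: on Windows, worker processes are started fresh instead of being forked, so a function defined in a notebook cell may reach the worker without the notebook's module-level imports. That would be consistent with the `NameError: name 'np' is not defined` raised inside `sine` in this thread. A commonly suggested workaround is to import inside the simulator's body. The sketch below uses the standard-library `math` module so it runs anywhere; the signature of `sine`, the `n_points` argument, and the parameter layout are hypothetical and not taken from the nbi example.

```python
def sine(params, n_points=50):
    """Hypothetical stand-in for the notebook's `sine` simulator."""
    # The import lives inside the function body, so the name is bound
    # every time the function runs, in whichever process runs it,
    # rather than being looked up in the notebook's globals.
    import math

    # Assumed parameter layout: (amplitude, frequency, phase).
    amplitude, frequency, phase = params
    return [
        amplitude * math.sin(frequency * (i / n_points) + phase)
        for i in range(n_points)
    ]


# Direct call, no worker processes involved.
curve = sine((1.0, 2.0, 0.0))
```

With NumPy the same pattern would be `import numpy as np` as the first statement of the function. Moving the simulator into an importable `.py` module and importing it from the notebook is another option that may work, since the worker can then import the module itself.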
@tjayasinghe will you make a PR?
@profjsb This code will need to be added as an example for a user to follow if they want multiprocessing in Windows. I don't think I need to make changes to @kmzzhang's code.
If this sounds good, I will make a PR!
I ran into this issue while trying to run the example notebook on Windows 10 + Anaconda. The code block that failed was the example on sequential inference.
numpy works perfectly fine if I run it elsewhere.
The notebook has no issues running on Debian through WSL, so I presume this has something to do with multiprocessing on Windows.
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\thari\Anaconda3\lib\site-packages\multiprocess\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\thari\Anaconda3\lib\site-packages\multiprocess\pool.py", line 44, in mapstar
    return list(map(*args))
  File "c:\users\thari\documents\research\nbi\win10\nbi\src\nbi\utils.py", line 22, in parallel_simulate
    simulation = simulator(params)
  File "", line 4, in sine
NameError: name 'np' is not defined
"""
The above exception was the direct cause of the following exception:
NameError Traceback (most recent call last)