kmzzhang / nbi

Package for Neural Posterior Estimation and Importance Sampling focused on Astronomical Applications

Issue with numpy on Win10 #7

Open tjayasinghe opened 1 year ago

tjayasinghe commented 1 year ago

Ran into this issue trying to run the example notebook on Windows 10 + Anaconda. The code block that failed was the example on sequential inference.

NumPy works perfectly fine when I use it elsewhere.

The notebook runs without issues on Debian through WSL, so I presume this has something to do with multiprocessing on Windows.


RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\thari\Anaconda3\lib\site-packages\multiprocess\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\thari\Anaconda3\lib\site-packages\multiprocess\pool.py", line 44, in mapstar
    return list(map(*args))
  File "c:\users\thari\documents\research\nbi\win10\nbi\src\nbi\utils.py", line 22, in parallel_simulate
    simulation = simulator(params)
  File "", line 4, in sine
NameError: name 'np' is not defined
"""

The above exception was the direct cause of the following exception:

NameError                                 Traceback (most recent call last)
in
     40     min_lr=0.0002,        # learning rate decays from lr to min_lr at the last epoch
     41     f_val=0.1,            # fraction used as validation set
---> 42     early_stop_patience=5 # stop training if val loss not improve by 5 epochs
     43 )

c:\users\thari\documents\research\nbi\win10\nbi\src\nbi\engine.py in run(self, obs, n_per_round, n_rounds, n_epochs, n_reuse, y_true, train_batch, val_batch, project, wandb_enabled, neff_stop, early_stop_train, early_stop_patience, f_val, lr, min_lr, x_file, y_file, decay_type, debug)
    181         """
    182         if len(self.x_all) == self.round:
--> 183             self.prepare_data(obs, n_per_round, y_true=y_true)
    184
    185         for i in range(n_rounds):

c:\users\thari\documents\research\nbi\win10\nbi\src\nbi\engine.py in prepare_data(self, obs, n_per_round, y_true)
    276             self.corner(obs, ys, y_true=y_true)
    277
--> 278         x_path, good = self.simulate(ys)
    279         np.save(os.path.join(self.directory, str(self.round)) + '_x.npy', x_path[good])
    280         np.save(os.path.join(self.directory, str(self.round)) + '_y.npy', ys[good])

c:\users\thari\documents\research\nbi\win10\nbi\src\nbi\engine.py in simulate(self, thetas)
    508
    509         with Pool(self.n_jobs) as p:
--> 510             masks = p.map(parallel_simulate, jobs)
    511         masks = np.concatenate(masks)
    512

~\Anaconda3\lib\site-packages\multiprocess\pool.py in map(self, func, iterable, chunksize)
    266         in a list that is returned.
    267         '''
--> 268         return self._map_async(func, iterable, mapstar, chunksize).get()
    269
    270     def starmap(self, func, iterable, chunksize=None):

~\Anaconda3\lib\site-packages\multiprocess\pool.py in get(self, timeout)
    655             return self._value
    656         else:
--> 657             raise self._value
    658
    659     def _set(self, i, obj):

NameError: name 'np' is not defined

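
The `NameError` in the traceback suggests that the worker process ran `sine` in a namespace where the notebook's `import numpy as np` had never been executed. The sketch below imitates that situation using only the standard library: `math` stands in for `numpy`, and `rebuild_in_fresh_namespace` is a hypothetical helper that simulates a worker receiving a function without the caller's globals. It illustrates the suspected mechanism and one common workaround (importing inside the function); it is not a confirmed diagnosis of nbi or multiprocess.

```python
import builtins
import math  # module-level import, like `import numpy as np` in a notebook cell
import types


def uses_global(x):
    # Relies on the module-level `import math` being present in the globals.
    return math.sqrt(x)


def uses_local_import(x):
    # The import runs when the function is called, so it does not depend
    # on whatever globals the function happens to be attached to.
    import math
    return math.sqrt(x)


def rebuild_in_fresh_namespace(func):
    """Hypothetical helper: rebuild `func` with empty globals, imitating a
    worker that received the function's code but not the caller's imports."""
    fresh_globals = {"__builtins__": builtins}
    return types.FunctionType(func.__code__, fresh_globals, func.__name__)


worker_global = rebuild_in_fresh_namespace(uses_global)
worker_local = rebuild_in_fresh_namespace(uses_local_import)

try:
    worker_global(4.0)
except NameError as exc:
    print("module-level import:", exc)

print("function-local import:", worker_local(4.0))
```

If this is indeed the cause, placing `import numpy as np` as the first line inside `sine` may be enough for the notebook to run on Windows; that has not been verified in this thread.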
profjsb commented 1 year ago

This looks like it could be related to a known issue with the process map in Windows. The recommendation is to try freeze_support(). See https://github.com/uqfoundation/multiprocess/issues/56#issuecomment-727576884 and https://github.com/uqfoundation/pathos/blob/d4c126bba354eea3c0ed4a8cc5148768cf616a85/examples/nested.py#L25

@tjayasinghe can you try to add this line and see if it fixes the Windows issue? We will try to build Windows OS into the testing suite.

profjsb commented 1 year ago

@kmzzhang you can see how to add Windows OS testing in GitHub Actions here: https://github.com/cesium-ml/cesium/blob/dfe3d21eacfc1a242803cc2acf11a776e0a1526a/.github/workflows/wheel_tests_and_release.yml#L105

tjayasinghe commented 1 year ago

@profjsb Thanks! Initially, this did not work when I tried running it through the Jupyter notebook.

@kmzzhang I copied the code into a script, wrapped the NBI code for sequential data in a function, and added freeze_support():

def seq_data():
    # nbi has pre-defined neural networks for sequential data
    resnet = get_featurizer('resnetrnn', 1, 32, depth=3)

    engine = NBI(
        resnet,
        dim_param=3,
        physics=sine,
        instrumental=noise,
        prior_sampler=prior,
        log_like=log_like,
        log_prior=log_prior,
        flow_config=flow_config,
        labels=labels,
        directory='test',
        n_jobs=10,         # for generating training set
        parallel=True,      # only useful if GPU available
        tqdm_notebook=True
    )

    engine.run(
        x_obs,
        y_true=y_true,
        n_rounds=2,
        n_per_round=5120,
        n_epochs=10,
        train_batch=64,
        val_batch=64,
        lr=0.0005,
        min_lr=0.0002,        # learning rate decays from lr to min_lr at the last epoch
        f_val=0.1,            # fraction used as validation set
        early_stop_patience=5 # stop training if val loss not improve by 5 epochs
    )

if __name__ == '__main__':
    # Call freeze_support() right after the main guard, before any worker processes start
    from pathos.helpers import freeze_support
    freeze_support()
    seq_data()

This works perfectly if I run the code as a script from the terminal. It only fails in Jupyter!
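
A frequently used pattern for this kind of terminal-versus-notebook difference is to define the simulator in an importable module instead of a notebook cell, so that worker processes can import it together with its own imports. The sketch below is only an assumption about what might help here: the module name `my_simulators`, the body of `sine`, and the use of `math` in place of `numpy` are all hypothetical, and the file is written to a temporary directory purely to keep the example self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents; in practice this would be a file saved
# next to the notebook, with `import numpy as np` at the top.
MODULE_SOURCE = '''
import math  # stand-in for `import numpy as np`


def sine(params, n=10):
    amplitude, frequency, phase = params
    return [
        amplitude * math.sin(2 * math.pi * frequency * (i / n) + phase)
        for i in range(n)
    ]
'''

# Write the module to a temporary directory and make it importable.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "my_simulators.py").write_text(MODULE_SOURCE)
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

my_simulators = importlib.import_module("my_simulators")
sine = my_simulators.sine

# The function now belongs to an importable module rather than `__main__`,
# so its module-level imports travel with it.
print(sine.__module__)
print(len(sine((1.0, 1.0, 0.0))))
```

In a notebook, the `%%writefile` cell magic is a common way to create such a file, after which `from my_simulators import sine` could be passed as `physics=sine`. Whether this resolves the Windows failure with nbi has not been tested in this thread.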

profjsb commented 1 year ago

@tjayasinghe will you make a PR?

tjayasinghe commented 1 year ago

@profjsb This code will need to be added as an example for users to follow if they want multiprocessing on Windows. I don't think I need to make changes to @kmzzhang's code.

If this sounds good, I will make a PR!