kearnz / autoimpute

Python package for Imputation Methods
MIT License

Process fails when multiprocessing #80

Status: Closed (closed by AnotherSamWilson 1 year ago)

AnotherSamWilson commented 1 year ago

Hi,

I'm a little at a loss as to how to diagnose this. A conda environment and a regular Python virtual environment both produce the same error.

```python
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np
import miceforest as mf
from autoimpute.imputations import SingleImputer

# Build the iris data set with a categorical target column
random_state = np.random.RandomState(5)
iris = pd.concat(load_iris(return_X_y=True, as_frame=True), axis=1)
iris["target"] = iris["target"].astype("category")
iris.columns = [c.replace(" ", "") for c in iris.columns]

# Remove 20% of the values at random with miceforest's amputation helper
iris_amp = mf.utils.ampute_data(iris, perc=0.20)

# Fit and apply autoimpute's default single imputation
si = SingleImputer()
si.fit(iris_amp)
si.transform(iris_amp)
```

This produces the following output and, eventually, an error:

```
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, beta, σ]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\swilson\anaconda3\envs\impcomp\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\swilson\anaconda3\envs\impcomp\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\swilson\anaconda3\envs\impcomp\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\swilson\anaconda3\envs\impcomp\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\swilson\anaconda3\envs\impcomp\lib\runpy.py", line 264, in run_path
    code, fname = _get_code_from_file(run_name, path_name)
  File "C:\Users\swilson\anaconda3\envs\impcomp\lib\runpy.py", line 234, in _get_code_from_file
    with io.open_code(decoded_path) as f:
OSError: [Errno 22] Invalid argument: 'C:\\Users\\swilson\\Projects\\impute_comparison\\<input>'
```

It looks like a problem with the multiprocessing module. Do you have any quick hints about what I could be doing wrong?

kearnz commented 1 year ago

It looks like you're on Windows. I haven't kept up with multiprocessing on Windows, but historically you needed a few workarounds to get it working, because Windows doesn't support the fork() start method (hence multiprocessing's spawn module showing up in your traceback).
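The start methods available on a given machine can be checked with the standard library. A minimal sketch (the helper name `describe_start_methods` is hypothetical): on Windows, `multiprocessing.get_all_start_methods()` returns only `['spawn']`, whereas POSIX platforms also offer `fork`.

```python
import multiprocessing as mp
import sys


def describe_start_methods() -> dict:
    """Return the platform, the available start methods, and the default one."""
    return {
        "platform": sys.platform,
        # ['spawn'] on Windows; POSIX platforms also offer 'fork'
        "available": mp.get_all_start_methods(),
        # The method multiprocessing uses when none has been set explicitly
        "default": mp.get_start_method(),
    }


if __name__ == "__main__":
    info = describe_start_methods()
    print(info)
    if "fork" not in info["available"]:
        print("No fork(): child processes start fresh and re-import __main__ (spawn).")
```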

Specifically, make sure you run your code from the main module of a script, i.e. put whatever you're trying to run inside an `if __name__ == '__main__':` block.
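A minimal, standard-library-only sketch of that pattern, assuming the code is saved as a script and run as a file rather than pasted into a console. The names `square` and `run_parallel` are hypothetical; in the reporter's case the body of the `__main__` block would hold the `SingleImputer` fit/transform calls, since the worker processes are presumably started by the PyMC NUTS sampler whose "Multiprocess sampling (4 chains in 4 jobs)" message appears in the output.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def square(x: int) -> int:
    # Worker functions must live at module top level: under spawn, each child
    # re-imports the main module and looks the function up by name.
    return x * x


def run_parallel(values, workers: int = 2) -> list:
    # Use spawn explicitly (the only start method on Windows) so the script
    # behaves the same way on every platform.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        return list(pool.map(square, values))


if __name__ == "__main__":
    # Only the parent runs this block. A spawned child re-imports the file
    # under the name "__mp_main__", so it skips this block instead of
    # starting workers of its own.
    print(run_parallel(range(5)))  # [0, 1, 4, 9, 16]
```

Using `concurrent.futures.ProcessPoolExecutor` instead of `multiprocessing.Pool` is a design choice for the sketch: if a worker process dies during start-up, the executor typically raises `BrokenProcessPool` rather than waiting indefinitely.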

Let me know if that helps. Note that I haven't really tested the latest version of autoimpute on Windows.

AnotherSamWilson commented 1 year ago

Ahhh, that's probably it: I've been running the code interactively. I'll try this when I get home. Thanks for the help!
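Running interactively explains the odd path at the end of the traceback. The spawned child tried to re-run the parent's `__main__` from a file ending in `<input>`, a pseudo-filename the interactive console had stored in `__main__.__file__` instead of a real script path, and on Windows a name containing `<` and `>` makes opening it fail with `[Errno 22] Invalid argument`. The sketch below is a hypothetical helper (not part of autoimpute, PyMC, or multiprocessing) that roughly mirrors that check, so a session can be tested before starting a long sampling run.

```python
import os
import sys
import types
from typing import Optional


def spawn_would_fail_on_main(main: Optional[types.ModuleType] = None) -> bool:
    """Hypothetical helper: would spawn-started workers fail to re-run __main__?

    Approximates multiprocessing's spawn preparation: children re-import
    __main__ by module name if it was run with `python -m`, otherwise they
    re-run the file named by __main__.__file__ (spawn resolves relative paths
    against the start-up directory; this sketch uses the current one).
    """
    if main is None:
        main = sys.modules["__main__"]
    spec = getattr(main, "__spec__", None)
    if getattr(spec, "name", None):
        # Started with `python -m ...`: the child re-imports by module name.
        return False
    path = getattr(main, "__file__", None)
    if path is None:
        # No __file__ (e.g. a plain REPL): spawn skips re-running __main__,
        # although functions defined in __main__ still cannot reach workers.
        return False
    # A pseudo-filename such as '<input>' is not a real file, so the child
    # fails while opening it (the OSError in the reported traceback).
    return not os.path.isfile(os.path.abspath(path))


if __name__ == "__main__":
    if spawn_would_fail_on_main():
        print("__main__.__file__ is not a real file: run the code as a .py script.")
    else:
        print("spawn workers should be able to re-create __main__ in this session.")
```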

kearnz commented 1 year ago

Great, feel free to open another issue if something new comes up. I'll leave this one open for today and then close it.