Fitting ExplainableBoostingClassifier in new process causes long delay and zombie processes

interpretml / interpret

Fit interpretable models. Explain blackbox machine learning.

https://interpret.ml/docs

MIT License


Fitting ExplainableBoostingClassifier in new process causes long delay and zombie processes #557

Open jfleh opened 6 days ago

jfleh commented 6 days ago

import multiprocessing

from interpret.glassbox import ExplainableBoostingClassifier

X = [[1, 5], [2, 4], [3, 3], [4, 2], [5, 1]]
y = [0, 1, 0, 1, 0]

def fit_ebm(x, y):
    ebm = ExplainableBoostingClassifier()
    ebm.fit(x, y)

fit_process = multiprocessing.Process(target=fit_ebm, args=(X, y))
fit_process.start()
print("Waiting for join.")
fit_process.join()

Running the code above causes fit_process to run for about 5 minutes before it joins, and afterwards I can observe zombie processes. This only happens with n_jobs other than 1, so it appears to be related to the execution with joblib.Parallel.
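To put a number on the delay described above, the join can be timed. The following is only a minimal sketch using the standard library: `placeholder_fit` and `timed_run` are hypothetical names, and `placeholder_fit` is a stand-in that sleeps briefly instead of fitting a model, so the real `fit_ebm` would have to be swapped in to reproduce the issue.

```python
import multiprocessing
import time


def placeholder_fit():
    # Hypothetical stand-in for fit_ebm; replace with the real fit to reproduce.
    time.sleep(0.1)


def timed_run(target, args=(), timeout=600.0):
    """Run target in a child process and measure how long the join takes."""
    proc = multiprocessing.Process(target=target, args=args)
    start = time.perf_counter()
    proc.start()
    proc.join(timeout)  # returns as soon as the child exits, or after timeout
    elapsed = time.perf_counter() - start
    timed_out = proc.is_alive()  # True means the child outlived the timeout
    if timed_out:
        proc.terminate()
        proc.join()
    return elapsed, proc.exitcode, timed_out


if __name__ == "__main__":
    elapsed, exitcode, timed_out = timed_run(placeholder_fit)
    print(f"join took {elapsed:.2f}s, exitcode={exitcode}, timed_out={timed_out}")
    # Direct multiprocessing children of this process that are still running.
    print("active children:", multiprocessing.active_children())
```

Note that `multiprocessing.active_children()` only lists direct children of the calling process. Worker processes started by the child itself would not show up there, so an OS tool such as `ps` is still needed to spot zombies.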

paulbkoch commented 6 days ago

I'm not deeply familiar with the internals of joblib, but I thought it created a pool of child processes that it keeps alive after your job completes, so this might be expected behavior?

jfleh commented 6 days ago

I am also not very familiar with joblib. This might be expected behavior for joblib.Parallel, which the EBM uses for parallelisation. However, I did not expect that fitting an EBM in a new process would cause this delay and, above all, the zombie processes. If you run the code I posted and swap the EBM for an sklearn model that also has the n_jobs parameter, the fit finishes without delays or zombie processes. Sklearn also uses joblib for parallelisation, but it seems to take some additional steps using joblib.effective_n_jobs (see for example here). Maybe this is the key difference?
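For context on what joblib.effective_n_jobs does, here is a rough standard-library approximation of the usual n_jobs convention (negative values count back from the number of CPUs). The function name `approx_effective_n_jobs` is hypothetical, and this is not joblib's actual code, which also depends on the active backend. Whether this step explains the difference between sklearn and the EBM is not confirmed in this thread.

```python
import os


def approx_effective_n_jobs(n_jobs, cpu_count=None):
    """Approximate how a joblib-style n_jobs value maps to a worker count."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    if n_jobs is None:
        return 1  # no explicit request: run sequentially
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 means all CPUs, -2 means all but one, and so on (never below 1).
        return max(cpu_count + 1 + n_jobs, 1)
    return n_jobs


if __name__ == "__main__":
    for requested in (None, 1, -1, -2):
        print(requested, "->", approx_effective_n_jobs(requested))
```

With this convention, a result of 1 means the work can run in the calling process without starting any workers, which matches the observation that the problem does not occur with n_jobs set to 1.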