trevorstephens / gplearn

Genetic Programming in Python, with a scikit-learn inspired API
http://gplearn.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1 #215

Open manuel-masiello opened 3 years ago

manuel-masiello commented 3 years ago

Hello, thank you for your very good library :-) Used together with the Celery task queue and MongoDB, it is pure happiness!

Describe the bug

When I use the parameter n_jobs=10, I get a warning message from joblib and the job runs in only one process. I think it is related to using Celery, but I cannot figure out how to fix the problem.

Expected behavior

I would like to be able to parallelize the calculation on my 12-core processor.

Actual behavior

[2020-12-31 13:22:33,736: WARNING/ForkPoolWorker-1] |   Population Average    |             Best Individual              |
[2020-12-31 13:22:33,736: WARNING/ForkPoolWorker-1] ---- ------------------------- ------------------------------------------ ----------
[2020-12-31 13:22:33,736: WARNING/ForkPoolWorker-1] Gen   Length          Fitness   Length          Fitness      OOB Fitness  Time Left
[2020-12-31 13:22:33,737: WARNING/ForkPoolWorker-1] /home/user/works/project/venv/lib/python3.8/site-packages/joblib/parallel.py:733: UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1
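
What seems to be happening here: Celery's default prefork pool runs each task in a daemonic child process, and daemonic processes are not allowed to spawn children of their own, so joblib's loky backend detects this and silently falls back to n_jobs=1. A minimal check (standard library only) that should print True inside a prefork worker:

import multiprocessing

# True inside a Celery prefork worker; daemonic processes cannot spawn
# child processes, which is why loky refuses to start its workers.
print(multiprocessing.current_process().daemon)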

Steps to reproduce the behavior

from gplearn.genetic import SymbolicRegressor
from celery import Celery

import pickle
import codecs

CELERY_APP = 'process'
CELERY_BACKEND = 'mongodb://localhost:27017/tasks-results'
CELERY_BROKER = 'mongodb://localhost:27017/tasks-broker'

appCelery = Celery(CELERY_APP, backend=CELERY_BACKEND, broker=CELERY_BROKER)

def getCeleryBackend():
    return appCelery.backend

def encodeObjLearn(objLearn):
    # Pickle the fitted estimator and base64-encode it so it can be
    # stored as a string in the MongoDB result backend.
    return codecs.encode(pickle.dumps(objLearn), "base64").decode()

def decodeObjLearn(sLearn):
    # Inverse of encodeObjLearn: base64-decode the string and unpickle.
    return pickle.loads(codecs.decode(sLearn.encode(), "base64"))

@appCelery.task(name='capture.tasks.TaskSymbolicRegressor')
def TaskSymbolicRegressor(X_train, y_train):

    est_gp = SymbolicRegressor(population_size=10000, n_jobs=10,
                               generations=100, stopping_criteria=0.01,
                               p_crossover=0.7, p_subtree_mutation=0.1,
                               p_hoist_mutation=0.05, p_point_mutation=0.1,
                               max_samples=0.9, verbose=1,
                               parsimony_coefficient=0.01, random_state=0)
    est_gp.fit(X_train, y_train)

    # Drop the per-generation program history to shrink the pickled payload.
    delattr(est_gp, '_programs')
    return encodeObjLearn(est_gp)
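
For completeness, a minimal sketch of how such a task might be queued from the client side (.delay and .get are standard Celery API; converting the arrays to lists is an assumption, to keep the payload serialisable with Celery's default JSON serialiser):

# Hypothetical caller: queue the training task and block for the result.
result = TaskSymbolicRegressor.delay(X_train.tolist(), y_train.tolist())
est_gp = decodeObjLearn(result.get())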

System information

Linux-5.4.0-58-generic-x86_64-with-glibc2.29
Python 3.8.5 (default, Jul 28 2020, 12:59:40) [GCC 9.3.0]
NumPy 1.19.2
SciPy 1.5.4
Scikit-Learn 0.24.0
Joblib 1.0.0
gplearn 0.4.1

trevorstephens commented 3 years ago

I'm not familiar with how Celery works, but joblib does all the parallelisation under the hood; you just need to set n_jobs when initialising the estimator. Is this something you would expect to work with, say, a random forest in scikit-learn?

manuel-masiello commented 3 years ago

Hello and happy new year :-)

Thank you for the quick response. I just ran a test with RandomForestRegressor with n_jobs=10, and it seems to work without problems:

from sklearn.ensemble import RandomForestRegressor

@appCelery.task(name='capture.tasks.TaskRandomForestRegressor')
def TaskRandomForestRegressor(X_train, y_train):
    est_rf = RandomForestRegressor(n_jobs=10)
    est_rf.fit(X_train, y_train)

    return encodeObjLearn(est_rf)

Log output:

[2021-01-04 08:47:42,206: INFO/MainProcess] Received task: capture.tasks.TaskRandomForestRegressor[011a5d09-6a51-45b4-9ef0-27f5277fe932]  
[2021-01-04 08:47:42,386: INFO/ForkPoolWorker-1] Task capture.tasks.TaskRandomForestRegressor[011a5d09-6a51-45b4-9ef0-27f5277fe932] succeeded in 0.17795764410402626s:

I found an answer to this error, but it requires a change to the library and I'm not sure it works:

https://github.com/joblib/joblib/issues/978
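
One commonly suggested workaround for this situation is to avoid loky inside the daemonised worker: either start the Celery worker with a non-forking pool (for example --pool=solo or --pool=threads), or force joblib's thread-based backend around the fit call. This is also likely why the RandomForestRegressor test above runs cleanly: scikit-learn's forests hint joblib to prefer threads during fit. A minimal sketch of the second option, assuming gplearn's internal joblib.Parallel call respects the active backend (threads trade away some parallel speedup to the GIL):

from joblib import parallel_backend

# Use threads, which are allowed inside Celery's daemonic prefork
# workers, instead of loky processes.
with parallel_backend('threading', n_jobs=10):
    est_gp.fit(X_train, y_train)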

JeffQuantFin commented 1 year ago

How can I apply multiprocessing to gplearn's SymbolicTransformer?

It seems that gplearn supports multithreading by setting n_jobs=10.

Can we run it with multiple processes, which would be even faster? How can we do that?

Thanks!

https://github.com/joblib/joblib/issues/978
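
For what it's worth, gplearn parallelises via joblib, and joblib's default loky backend is already process-based whenever it is not blocked by a daemonic parent process (the Celery situation above). A minimal sketch of forcing the process backend explicitly, assuming gplearn's internal joblib.Parallel call picks up the active backend and that X_train / y_train are your training data:

from gplearn.genetic import SymbolicTransformer
from joblib import parallel_backend

st = SymbolicTransformer(population_size=2000, n_jobs=10, random_state=0)

# Request the process-based loky backend explicitly; outside a daemonic
# worker this spreads fitness evaluation across 10 worker processes.
with parallel_backend('loky', n_jobs=10):
    st.fit(X_train, y_train)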