ray-project / tune-sklearn

A drop-in replacement for Scikit-Learn’s GridSearchCV / RandomizedSearchCV -- but with cutting edge hyperparameter tuning techniques.
https://docs.ray.io/en/master/tune/api_docs/sklearn.html
Apache License 2.0

ValueError: Tune-sklearn no longer supports nested parallelism with new versions of joblib/sklearn. Don't set 'sk_n_jobs'. #221

Closed. RNarayan73 closed this issue 2 years ago.

RNarayan73 commented 3 years ago

Hello,

I'm trying to use TuneSearchCV as the last step of a pipeline nested within sklearn's cross_validate function. As per sklearn's recommendation, this enables a more robust evaluation of the generalisation error. Note that this isn't the cross-validation used for hyperparameter optimisation within TuneSearchCV, but an outer cross-validation of the overall pipeline, which includes TuneSearchCV as its last step.
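
Roughly, the structure is as in the sketch below (the data and estimator are placeholders, not my actual pipeline):

    from ray import tune
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import cross_validate
    from tune_sklearn import TuneSearchCV

    X, y = make_classification(n_samples=200, random_state=0)

    # Inner search: tunes hyperparameters with its own internal CV
    inner_search = TuneSearchCV(
        SGDClassifier(),
        param_distributions={'alpha': tune.loguniform(1e-4, 1e0)},
        n_trials=5,
    )

    # Outer CV: each fold fits a fresh clone of inner_search, tuning included,
    # so the score estimates the generalisation error of the whole procedure
    results = cross_validate(inner_search, X, y, cv=3)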

I get the above error when I execute cross_validate, even though sk_n_jobs is left at its default value (None). Furthermore, when I explicitly set sk_n_jobs to 1 (in the hope of avoiding parallelism), I get the same error earlier, when instantiating TuneSearchCV.

How can I avoid this and run cross_validate? I have managed to test models this way using GridSearchCV and the SKOpt and Optuna optimisers with their sklearn wrappers, but am stuck when trying it with TuneSearchCV.

Thanks for your help.

Narayan

Yard1 commented 3 years ago

Are you using the latest release of tune-sklearn (0.4.1)? Can you show how you initialise TuneSearchCV?

RNarayan73 commented 3 years ago

Yes, on 0.4.1. Here's the code:

hpt_pipe = TuneSearchCV(
    pipe,
    param_distributions=get_params_space(algo, hpt),
    cv=cv_skf,              # CV splitter object
    scoring=score_metrics,  # dict of scoring metrics
    refit=eval_metric,
    name=hpt + '-' + run,
    #early_stopping=False,
    pipeline_auto_early_stop=True,  # If True, early stopping is performed on the last stage
                                    # of the pipeline (which must support early stopping).
                                    # If False, early stopping is determined by the estimator's
                                    # 'warm_start' or 'partial_fit' capabilities, which standard
                                    # sklearn estimators do not support by default.
    local_dir=TUNE_FOLDER,
    n_trials=ITERS, random_state=SEED,
    #sk_n_jobs=None,  # Tune-sklearn no longer supports nested parallelism with new
                      # versions of joblib/sklearn. Don't set 'sk_n_jobs'.
    n_jobs=JOBS, verbose=1,
    **kwargs,
)

The above pipeline works fine when I run it through a standard train-test-split validation.

Hope this helps

Narayan

Yard1 commented 3 years ago

Is it possible that sk_n_jobs is somehow passed through kwargs?

RNarayan73 commented 3 years ago

Unlikely. Here's the call to a helper function get_pipe, which returns a pipeline for a given algorithm and HP tuning (*SearchCV) method and contains the code above:

    tune_pipe, _ = get_pipe(algo=ALGO, hpt=HPT, val=VAL,
                            score_metrics=score_metrics, eval_metric=eval_metric,
                            mem=mem, run=RUN, persist=PERSIST,
                            search_optimization=search,
                            early_stopping=early)  # TUNE_SEARCH[search_algo]

Here are the values of the HPs that I get by running tune_pipe.get_params():

{'cv': StratifiedKFold(n_splits=5, random_state=0, shuffle=True),
 'early_stopping': False,
 'error_score': nan,
 *** estimator parameters truncated here ***
  'local_dir': './_tune/',
 'loggers': [ray.tune.logger.JsonLogger, ray.tune.logger.CSVLogger],
 'max_iters': 1,
 'mode': 'max',
 'n_jobs': -1,
 'n_trials': 50,
 'name': 'tune-setup',
 'param_distributions': {'clf__alpha': <ray.tune.sample.Float at 0x200ad96dd00>},
 'pipeline_auto_early_stop': True,
 'random_state': 0,
 'refit': 'avg_prec',
 'return_train_score': False,
 'scoring': {'profit_ratio': make_scorer(profit_ratio_score),
  'precision': make_scorer(precision_score, average=binary),
  'f_0.5': make_scorer(fbeta_score, beta=0.5, pos_label=1),
  'avg_prec': make_scorer(average_precision_score, needs_proba=True, pos_label=1),
  'brier_rel': make_scorer(brier_rel_score, needs_proba=True, pos_label=1)},
 'search_optimization': 'bayesian',
 'sk_n_jobs': 1,
 'stopper': None,
 'time_budget_s': None,
 'use_gpu': False,
 'verbose': 1}

Yard1 commented 3 years ago

'sk_n_jobs': 1 is there. Can you remove it?

RNarayan73 commented 3 years ago

No, I can't. It seems the sk_n_jobs default value of None is being translated to 1 during initialisation, even though I don't explicitly assign it any value; notice it is commented out when initialising TuneSearchCV. The error comes up when I run cross_validate. If I explicitly set it to 1, the error occurs earlier, when initialising TuneSearchCV.

Yard1 commented 3 years ago

Could you run the following and try again?

    pip uninstall -y tune-sklearn && pip install -U git+https://github.com/Yard1/tune-sklearn.git@fix_sk_n_jobs

RNarayan73 commented 3 years ago

Good! That resolved the sk_n_jobs issue.

However, now I get another error:

RuntimeError: Cannot clone object TuneSearchCV(cv=StratifiedKFold(n_splits=5, random_state=0, shuffle=True), early_stopping=False, estimator=Pipeline(steps=[('enc', ColumnTransformer(sparse_threshold=0, transformers=[('drop', 'drop', ['WeekBegin', 'AutoML_split']), ('numeric', 'passthrough', ['RSI', 'h1RSI', 'h2RSI']), ('target', MEstimateEncoder(random_state=0), ['SignalTime', 'Symbol', 'CandleType'... random_state=0, refit='avg_prec', scoring={'avg_prec': make_scorer(average_precision_score, needs_proba=True, pos_label=1), 'brier_rel': make_scorer(brier_rel_score, needs_proba=True, pos_label=1), 'f_0.5': make_scorer(fbeta_score, beta=0.5, pos_label=1), 'precision': make_scorer(precision_score, average=binary), 'profit_ratio': make_scorer(profit_ratio_score)}, sk_n_jobs=1, verbose=1), as the constructor either does not set or modifies parameter loggers

Should I raise another issue?

Thanks,
Narayan

Yard1 commented 3 years ago

I will look into it, thanks! This issue is fine.

RNarayan73 commented 3 years ago

Thank you very much, @Yard1. I look forward to your fix.

Narayan

RNarayan73 commented 3 years ago

@Yard1 please let me know if you have something that you'd like me to test. I'd be happy to assist.

Yard1 commented 3 years ago

Sorry, I have been occupied with other matters. I'll get back to tune-sklearn issues this week. Will keep you updated!

Yard1 commented 3 years ago

Hey @RNarayan73, I've pushed an update to my branch. Can you install from it and try again? You may get an error regarding kwargs; in that case, just do what it says (move them to a dict and pass them as the search_kwargs argument).
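
Something like this (a sketch; some_option is a placeholder name, not a real argument):

    # Before: extra search options passed directly as keyword arguments
    #   tune_search = TuneSearchCV(pipe, param_distributions=params, some_option=1)

    # After: collect them into a dict and pass it as search_kwargs
    tune_search = TuneSearchCV(pipe, param_distributions=params,
                               search_kwargs={'some_option': 1})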

RNarayan73 commented 3 years ago

Thanks @Yard1. I'm new to git, so I just wanted to make sure: do I install as follows?

    pip uninstall -y tune-sklearn && pip install -U git+https://github.com/Yard1/tune-sklearn.git

Narayan

Yard1 commented 3 years ago

This will do the trick (note the branch name; it is different from master):

    pip uninstall -y tune-sklearn && pip install -U git+https://github.com/Yard1/tune-sklearn.git@fix_sk_n_jobs

RNarayan73 commented 3 years ago

Hi @Yard1, I tried this new version and the same error persists:

"RuntimeError: Cannot clone object TuneSearchCV(...), as the constructor either does not set or modifies parameter loggers"

Narayan

Yard1 commented 3 years ago

Are you not getting an error when initializing?

RNarayan73 commented 3 years ago

No, I never had an error when initializing; it was always a runtime error when the object is being cloned. In fact, the simplest way to replicate the error is to clone an instance of TuneSearchCV, e.g.:

    test_tune = TuneSearchCV(SGDClassifier(), {'alpha': tune.loguniform(1e-4, 1e0, 4)})
    clone(test_tune)

Yard1 commented 3 years ago

Yes, but the changes I introduced should cause it to error on initialisation if you provide keyword arguments. Thanks, I'll check that out.

RNarayan73 commented 3 years ago

OK. When I ran the above statements, the first statement initialising test_tune ran fine, and it failed on clone with the above runtime error. My versions are:

    ray-core, ray-tune == 1.6.0
    tune-sklearn == your fix branch

Hope this helps.
Narayan

Yard1 commented 3 years ago

Thanks, I just pushed a new commit. Can you try now? Same command as before. @RNarayan73

RNarayan73 commented 3 years ago

Thanks! The error has now moved from the loggers parameter to scoring:

"RuntimeError: Cannot clone object TuneSearchCV(estimator=SGDClassifier(), mode='max', n_jobs=-1, param_distributions={'alpha': <ray.tune.sample.Float object at 0x0000027A21AD69A0>}, scoring={'score': <function _passthrough_scorer at 0x0000027A1B4F65E0>}, sk_n_jobs=1), as the constructor either does not set or modifies parameter scoring"

Narayan

Yard1 commented 3 years ago

Haha, thanks! I pushed one last commit; this should take care of everything. @RNarayan73

RNarayan73 commented 3 years ago

That worked! Well done, @Yard1. Are you intending to include this in the next release?

Thanks,
Narayan

Yard1 commented 3 years ago

Yes! Glad to hear it worked.

RNarayan73 commented 3 years ago

@Yard1 Having got search_optimization='bohb' to work again, I discovered that the cloning error occurs when running TuneSearchCV with search_optimization='bohb' and early_stopping=True through cross_validate. However, bohb works fine if I set early_stopping='HyperBandForBOHB', though it takes much longer to run.
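
In other words (a sketch; est and params stand in for my real estimator and search space):

    # Fails with the cloning error when run through cross_validate:
    ts = TuneSearchCV(est, params, search_optimization='bohb',
                      early_stopping=True)

    # Works through cross_validate, but runs much slower:
    ts = TuneSearchCV(est, params, search_optimization='bohb',
                      early_stopping='HyperBandForBOHB')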

When running it this way, it also generates loads of errors as below:

    Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::_PipelineTrainable.save_to_object() (pid=9696, ip=192.168.0.16, repr=<tune_sklearn._trainable._PipelineTrainable object at 0x000001F0458E86D0>)
      File "python\ray\_raylet.pyx", line 536, in ray._raylet.execute_task
      File "python\ray\_raylet.pyx", line 486, in ray._raylet.execute_task.function_executor
      File "C:\Anaconda3\envs\Scikit-Learn\lib\site-packages\ray\_private\function_manager.py", line 563, in actor_method_executor
        return method(__ray_actor, *args, **kwargs)
      File "C:\Anaconda3\envs\Scikit-Learn\lib\site-packages\ray\tune\trainable.py", line 349, in save_to_object
        checkpoint_path = self.save(tmpdir)
      File "C:\Anaconda3\envs\Scikit-Learn\lib\site-packages\ray\tune\trainable.py", line 330, in save
        checkpoint_dir = TrainableUtil.make_checkpoint_dir(
      File "C:\Anaconda3\envs\Scikit-Learn\lib\site-packages\ray\tune\utils\trainable.py", line 124, in make_checkpoint_dir
        open(os.path.join(checkpoint_dir, ".is_checkpoint"), "a").close()
    FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\naray\Documents\Jupyter\_raytune\ray_setup\_PipelineTrainable_ae18f3e5_46_X_id=ObjectRef(ffffffffffffffffffffffffffffffffffffffff010000000f000000),clf__alpha=0.38384,cv=Stra_2021-10-18_01-24-09\tmpyq_s5wj3save_to_object\checkpoint_000001\.is_checkpoint'

However, when I run TuneSearchCV with search_optimization='bohb' and early_stopping=True using a normal train-test split and .fit(X_train, y_train), it works absolutely fine, without either of the above errors.

Is this a related issue?

Narayan

Yard1 commented 3 years ago

Will look into it, thanks!

RNarayan73 commented 3 years ago

@Yard1 Has this been fixed? How can I get the update?

Narayan

Yard1 commented 3 years ago

Sorry, closed it by mistake. Hadn't looked into it just yet.

Yard1 commented 3 years ago

Hey @RNarayan73 are you running this on Windows?

RNarayan73 commented 3 years ago

Yes, Windows 10.

The file naming issue seems to be Windows-specific: the folder names contain a long parameter list, and Windows limits path lengths.

Yard1 commented 3 years ago

Yeah, that would be it. I think we have fixed it on master, but I will double check. In the meantime, can you install the nightly version of Ray and try again?

Yard1 commented 3 years ago

There is an environment variable responsible for the directory name length, TUNE_MAX_LEN_IDENTIFIER. By default it is set to 130, but the entire path may still be too long. Can you set that env var to a smaller number and see if that helps?
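
For example (50 here is just an arbitrary smaller value; it needs to be set before any trials start):

    # Shorten Tune's trial directory names to stay under Windows path-length limits
    import os
    os.environ['TUNE_MAX_LEN_IDENTIFIER'] = '50'  # default is 130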

RNarayan73 commented 2 years ago

Sure, I'll try these.

@Yard1 But I think the cloning error with search_optimization='bohb' when early_stopping=True is not directly related to this. The directory issues only come up when I try to work around the bohb + True issue by using early_stopping='HyperBandForBOHB', which also takes much longer to run than normal.

Narayan

RNarayan73 commented 2 years ago

@Yard1 Have you got rid of the fix_sk_n_jobs branch? How can I install it now?

Narayan

Yard1 commented 2 years ago

@RNarayan73 the changes from that branch have been merged to master, just replace the branch name in the installation command with master.

RNarayan73 commented 2 years ago

I see. Thanks @Yard1.

On installing master using pip install -U git+https://github.com/Yard1/tune-sklearn.git@master, I noticed that the installed version is tune-sklearn-0.0.6.

When trying to instantiate a TuneSearchCV object, I get the following errors:

  1. I used to be able to pass one of the strings 'random', 'bayesian', 'hyperopt', 'optuna' or 'bohb' to use Ray Tune's standard search algorithms. Alternatively, I could pass a search object such as HyperOptSearch(), BayesOptSearch(), NevergradSearch() etc. with additional parameters such as seed to ensure reproducibility. I had managed to get them all to work (apart from the outstanding issue with cloning bohb described earlier in this thread). Now I get the error:

    ValueError: Search optimization must be random or bayesian

  2. Also, TuneSearchCV no longer seems to accept the native Ray Tune search space API when used with 'bayesian'. It was earlier able to seamlessly convert the Ray distributions to the appropriate skopt distributions (the two styles are contrasted in the sketch after this list). The error is:

    ValueError: distribution must be a tuple, list, or skopt.space.Dimension instance when using bayesian search

  3. After correcting for this by defining the distributions with the skopt API, when I run the .fit() method on the TuneSearchCV object, it throws the following error, suggesting that the multi-metric scoring feature that had worked perfectly well earlier is now broken:

    ValueError: For evaluating multiple scores, use sklearn.model_selection.cross_validate instead. {'profit_ratio': make_scorer(profit_ratio_score), 'precision': 'precision', 'f_0.5': make_scorer(fbeta_score, beta=0.5, pos_label=1), 'avg_prec': make_scorer(average_precision_score, needs_proba=True, pos_label=1), 'log_loss': make_scorer(log_loss, greater_is_better=False, needs_proba=True), 'brier_rel': make_scorer(brier_rel_score, needs_proba=True, pos_label=1)} was passed.

  4. Finally, when I change my scoring parameter to a single metric and run it, I get the following error:

    RuntimeError: Trying to sample a configuration from SkOptSearch, but the metric (average_test_score) or mode (None) parameters have not been set. Either pass these arguments when instantiating the search algorithm, or pass them to tune.run().
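
To illustrate point 2, here is the same parameter expressed both ways (a sketch; the bounds are arbitrary):

    from ray import tune
    from skopt.space import Real

    # Ray Tune style, which earlier versions converted automatically
    # when search_optimization='bayesian':
    params_ray = {'clf__alpha': tune.loguniform(1e-4, 1e0)}

    # skopt style, which the installed version now insists on:
    params_skopt = {'clf__alpha': Real(1e-4, 1e0, prior='log-uniform')}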

Would you be able to reinstate the fix_sk_n_jobs branch with the previous functionality, which worked fairly well except for the bohb issue described earlier in this thread, please?

Regards,
Narayan

Yard1 commented 2 years ago

@RNarayan73 Oh, so sorry, it should have been:

    pip install -U git+https://github.com/ray-project/tune-sklearn.git@master

The master branch on my fork is very outdated. Can you try that?

RNarayan73 commented 2 years ago

@Yard1, thanks. I tried it, and it seems to simply install the official release 0.4.1, which behaves the same, with the error:

ValueError: Tune-sklearn no longer supports nested parallelism with new versions of joblib/sklearn. Don't set 'sk_n_jobs'.

I also tried uninstalling the official version and reinstalling using the above command and got the same result.

Narayan

Yard1 commented 2 years ago

Hey @RNarayan73, would it be possible to schedule a quick meeting between ourselves? I'd be happy to take a look.

RNarayan73 commented 2 years ago

Actually, @Yard1, I tried again and it works! It must have been something in my environment. So, is this now in the official release 0.4.1?

The 'bohb' + True scenario I described earlier still doesn't work. I'm happy to have a call to walk you through it if you like. I'm somewhat flexible with my time during work hours, so let me know if you have a time in mind.

Regards,
Narayan

Yard1 commented 2 years ago

No, not yet, but it will be in 0.4.2 (or whatever the next version after 0.4.1 turns out to be). I will be investigating the second scenario soon and will let you know if we still need to meet after that. Thanks!

RNarayan73 commented 2 years ago

@Yard1 That's fine. Let me know when you have an update; I'd be happy to help with testing it.

Regards,
Narayan

Yard1 commented 2 years ago

Hey, sorry for the delay. I am hoping to take a look at this this week.

Yard1 commented 2 years ago

@RNarayan73 Sorry for taking so long. I have opened a PR to fix this. You can test it with:

    pip uninstall -y tune-sklearn && pip install -U git+https://github.com/Yard1/tune-sklearn.git@fix_early_stopping_cloning

RNarayan73 commented 2 years ago

@Yard1 Thanks for the update. I'm getting back into this after a break.

I saw #229 and noted that fix_early_stopping_cloning has been deleted and merged into ray-project:master.

So, I'm guessing that the install command is now:

    pip uninstall -y tune-sklearn && pip install -U git+https://github.com/ray-project/tune-sklearn.git@master

Regards Narayan

Yard1 commented 2 years ago

Yes, that would be it :) @RNarayan73