optuna / optuna

A hyperparameter optimization framework
https://optuna.org

Cache split in TPE for high-dimensional optimization #5464

Closed nabenabe0928 closed 3 weeks ago

nabenabe0928 commented 1 month ago

Motivation

As TPE significantly slows down for high-dimensional optimization, this PR introduces a caching mechanism for the TPE split.

Description of the changes

For `n_trials=1000` and `dim=10`, the runtimes are the following:

| This PR | Master |
|:-:|:-:|
| 29 | 47 |
not522 commented 1 month ago

What is the relationship between this PR and #5454? Should we review it after #5454?

nabenabe0928 commented 1 month ago

@not522 These two PRs are orthogonal, so we can work on them separately! This PR aims to share the split information within a trial. The other PR aims to share information across multiple trials that pass the same set of arguments to the HSSP solver.

not522 commented 1 month ago

@eukaryo @gen740 Could you review this PR?

eukaryo commented 3 weeks ago

Sorry, I am temporarily busy because my primary computer is not working, and I suppose @HideakiImamura-san is the appropriate reviewer. @HideakiImamura, could you review this PR?

codecov[bot] commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 89.74%. Comparing base (181d65f) to head (3b61edc). Report is 158 commits behind head on master.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #5464      +/-   ##
==========================================
+ Coverage   89.52%   89.74%   +0.22%
==========================================
  Files         194      195       +1
  Lines       12626    12592      -34
==========================================
- Hits        11303    11301       -2
+ Misses       1323     1291      -32
```

:umbrella: View full report in Codecov by Sentry.

nabenabe0928 commented 3 weeks ago

We discussed internally and decided to close this PR. The slowdown can be largely avoided by specifying `multivariate=True`.

nabenabe0928 commented 3 weeks ago

Just as a future reminder, I will leave some comments:

This PR does not cause any issues between processes

As each trial is sampled in a single thread, and the sampling of a given trial is never scattered across multiple processes or threads, we do not need to worry about cached data being unavailable in another process.

However, when using multiple threads, we need to guard against another thread overwriting the cache before a trial completes its sampling. For this reason, I added a buffer that stores the split data of up to the latest 64 trials per thread.

In any case, a cache miss does not cause any correctness issue: we simply recompute the split, which only incurs some extra computation.
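The buffering described above could be sketched as follows. This is an illustrative design, not Optuna's actual code: the class name, key type, and lock are assumptions, and the placeholder split value stands in for the real TPE split result.

```python
import threading
from collections import OrderedDict

class SplitCache:
    """Illustrative bounded cache for per-trial split results.

    Entries are keyed by trial number; only the most recent `maxsize`
    entries are kept so concurrent trials cannot grow the cache without
    bound, and a lock guards access from multiple sampling threads.
    """

    def __init__(self, maxsize=64):
        self._maxsize = maxsize
        self._lock = threading.Lock()
        self._data = OrderedDict()  # trial_number -> split result

    def get(self, trial_number):
        # Returns None on a cache miss; the caller then recomputes the
        # split, which is correct but costs some extra computation.
        with self._lock:
            return self._data.get(trial_number)

    def put(self, trial_number, split):
        with self._lock:
            self._data[trial_number] = split
            self._data.move_to_end(trial_number)
            while len(self._data) > self._maxsize:
                self._data.popitem(last=False)  # evict the oldest trial

# Usage: look up the split for trial 7, recomputing on a miss.
cache = SplitCache()
split = cache.get(7)
if split is None:
    split = ("below", "above")  # placeholder for the actual split computation
    cache.put(7, split)
```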