ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Train][Tune] Optuna integration significantly reduces performance and limits number of parallel trials in Ray Tune #46965

Open Aricept094 opened 2 months ago

Aricept094 commented 2 months ago

What happened + What you expected to happen

Bug: When using Optuna as the search algorithm in Ray Tune, performance degrades significantly, CPU utilization drops, and only one trial runs at a time instead of the configured number of parallel trials.

Expected behavior: Optuna integration should maintain performance comparable to other search algorithms, utilize CPU resources efficiently, and allow for multiple trials as specified.

Versions / Dependencies

Ray version: 2.32.0
Python version: 3.11.8
Operating System: Linux-6.6.36.3-microsoft-standard-WSL2-x86_64-with-glibc2.35
Optuna version: 3.6.1

Detailed OS information:
OS: Linux
OS Version: #1 SMP PREEMPT_DYNAMIC Sat Jun 29 07:01:04 UTC 2024
OS Release: 6.6.36.3-microsoft-standard-WSL2
Machine: x86_64
Processor: x86_64

Reproduction script

The execution time for 100 trials shows a dramatic difference:

With Optuna: 41 seconds
Without Optuna: 9 seconds

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import ray
from ray import train, tune
from ray.tune.search.optuna import OptunaSearch

def generate_data(n_samples=1000):
    # Synthetic regression data: linear combination of 5 features plus Gaussian noise.
    X = np.random.rand(n_samples, 5)
    y = 2 * X[:, 0] + 3 * X[:, 1] - X[:, 2] + 0.5 * X[:, 3] - 1.5 * X[:, 4] + np.random.normal(0, 0.1, n_samples)
    return X, y

def train_random_forest(config):
    # Trainable: fit a random forest and report the test MSE back to Tune.
    X, y = generate_data()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    rf = RandomForestRegressor(
        n_estimators=config["n_estimators"],
    )

    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)

    train.report({"mean_squared_error": mse})

def main():
    ray.init()

    config = {
        "n_estimators": tune.randint(10, 200),
    }

    analysis = tune.run(
        train_random_forest,
        config=config,
        num_samples=100, 
        search_alg=OptunaSearch(),
        metric="mean_squared_error",
        mode="min",
        reuse_actors=True
    )

    print("Best config:", analysis.best_config)
    print("Best MSE:", analysis.best_result["mean_squared_error"])

if __name__ == "__main__":
    main()

Issue Severity

High: It blocks me from completing my task.

Aricept094 commented 2 months ago

One of the Optuna contributors raised a possible explanation, which might require some changes from the Ray development team.