Open JasperHG90 opened 1 year ago
What happened + What you expected to happen
I’m using Ray on a Databricks cluster and trying to run Tune with the built-in MLflow tracking server. I’m following the instructions here to achieve this (using the ‘setup_mlflow’ approach).
I expect Ray to be able to connect to the MLflow tracking server. Instead, I get the following error:
raise InvalidConfigurationError.for_profile(None)
databricks_cli.utils.InvalidConfigurationError: You haven't configured the CLI yet! Please configure by entering `/local_disk0/.ephemeral_nfs/envs/pythonEnv-c299d8bb-1817-4e6c-9b78-edc9ceb94e01/lib/python3.9/site-packages/ray/_private/workers/default_worker.py configure`
I’ve tried to add the ‘tracking_token’ as an input as well, but to no avail.
Using the callback method works, but is not satisfactory as I cannot control what is logged.
Versions / Dependencies
Databricks: DBR 12.2 LTS ML; Spark 3.3.2; Scala 2.12
Ray: 2.4.0
Python: 3.9
MLflow: 2.1.1
Reproduction script
I cannot give a reproducible example right now, but will prepare one later when I’m back from holiday.

def objective(config: dict, X: pd.DataFrame, y: pd.DataFrame):
    mlflow = setup_mlflow(config)
    mlflow.sklearn.autolog(log_models=True)
    model = HistGradientBoostingRegressor(**config)
    pipeline = _create_pipeline(model)
    cross_validated = cross_val_score(
        pipeline, X, y, scoring="neg_mean_squared_error", cv=10
    )
    return {"rmse": np.mean(np.sqrt(cross_validated * -1))}

search_space = {
    "learning_rate": tune.loguniform(1e-4, 0.3),
    "max_depth": tune.choice([3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 18, 20]),
    "l2_regularization": tune.uniform(0, 1),
    "warm_start": tune.choice([True, False]),
    "max_iter": tune.choice([100, 200, 300, 400]),
    "mlflow": {
        "experiment_name": "/Users/EMAIL/regressor",
        "tracking_uri": mlflow.get_tracking_uri(),
    },
}

algo = HyperOptSearch(metric="rmse", mode="min")

tuner = tune.Tuner(
    trainable=tune.with_parameters(
        objective,
        X=df_X.loc[:, NUM_FEATURES + CAT_FEATURES],
        y=df_y.loc[:, [TARGET]],
    ),
    param_space=search_space,
    tune_config=tune.TuneConfig(
        num_samples=1,
        search_alg=algo,
        metric="rmse",
    ),
    run_config=RunConfig(
        name="Regressor",
    ),
)
Issue Severity
Medium: It is a significant difficulty but I can work around it.
I am having the same issue. Can you share your workaround please?
I fixed this by following the "Set up credentials" section of this page: https://docs.ray.io/en/latest/train/user-guides/experiment-tracking.html#set-up-credentials
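For anyone landing here, a minimal sketch of what that can look like. This is an assumption about one way to apply the credentials step, not code taken from the linked page. The error above suggests the Ray worker process has no Databricks credentials of its own, so the idea is to export them in the worker before MLflow is set up. The helper name `export_databricks_credentials` and the way host and token reach the trial are hypothetical; `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are the environment variables the Databricks client libraries read.

```python
import os
from typing import MutableMapping, Optional


def export_databricks_credentials(
    host: str,
    token: str,
    env: Optional[MutableMapping[str, str]] = None,
) -> MutableMapping[str, str]:
    """Write Databricks credentials into the environment of this process.

    Hypothetical helper: call it at the top of the trainable so that it
    runs inside the Ray worker, before any MLflow setup happens.
    """
    if not host or not token:
        raise ValueError("Both a Databricks host and a token are required.")
    # Default to the real process environment; a plain dict can be passed for testing.
    target = os.environ if env is None else env
    target["DATABRICKS_HOST"] = host.rstrip("/")
    target["DATABRICKS_TOKEN"] = token
    return target


# Assumed usage inside the trainable from the issue (names are placeholders):
#
# def objective(config, X, y, databricks_host, databricks_token):
#     export_databricks_credentials(databricks_host, databricks_token)
#     mlflow = setup_mlflow(config)
#     ...
```

In this sketch the host and token would be passed through `tune.with_parameters` rather than through `config`, since `config` is unpacked straight into the model in the original script. Whether this is enough may depend on the cluster setup, so treat it as a starting point.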