mlflow / mlflow

Open source platform for the machine learning lifecycle
https://mlflow.org
Apache License 2.0

[BUG] Unity Catalog registered models do not work with named profiles #10760

Open rshanker779 opened 10 months ago

rshanker779 commented 10 months ago

Issues Policy acknowledgement

Where did you encounter this bug?

Local machine

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.

MLflow version

poetry show mlflow-skinny output:

 name         : mlflow-skinny                                               
 version      : 2.8.0                                                       
 description  : MLflow: A Platform for ML Development and Productionization 

dependencies
 - click >=7.0,<9
 - cloudpickle <3
 - databricks-cli >=0.8.7,<1
 - entrypoints <1
 - gitpython >=2.1.0,<4
 - importlib-metadata >=3.7.0,<4.7.0 || >4.7.0,<7
 - packaging <24
 - protobuf >=3.12.0,<5
 - pytz <2024
 - pyyaml >=5.1,<7
 - requests >=2.17.3,<3
 - sqlparse >=0.4.0,<1

Describe the problem

When an MLflow client is created with a tracking URI that points to a named profile and with the Unity Catalog registry URI (databricks-uc), every call that connects to the workspace fails with an authentication error.

I have the environment variable DATABRICKS_CONFIG_FILE set, but not DATABRICKS_HOST or DATABRICKS_TOKEN.

import mlflow
client = mlflow.MlflowClient(tracking_uri="databricks://<my workspace>", registry_uri="databricks-uc")
client.search_registered_models()  # Every workspace call fails here; this is just one example.

This produces the following traceback:

Traceback (most recent call last):
  File "/home/rohan/Documents/dk_datasci_model_pipeline/scripts/check_mlflow_connection.py", line 11, in <module>
    client.search_registered_models()
  File "/home/rohan/.cache/pypoetry/virtualenvs/dk-datasci-model-pipeline-fXF56sVU-py3.10/lib/python3.10/site-packages/mlflow/tracking/client.py", line 2372, in search_registered_models
    return self._get_registry_client().search_registered_models(
  File "/home/rohan/.cache/pypoetry/virtualenvs/dk-datasci-model-pipeline-fXF56sVU-py3.10/lib/python3.10/site-packages/mlflow/tracking/_model_registry/client.py", line 115, in search_registered_models
    return self.store.search_registered_models(filter_string, max_results, order_by, page_token)
  File "/home/rohan/.cache/pypoetry/virtualenvs/dk-datasci-model-pipeline-fXF56sVU-py3.10/lib/python3.10/site-packages/mlflow/store/_unity_catalog/registry/rest_store.py", line 300, in search_registered_models
    response_proto = self._call_endpoint(SearchRegisteredModelsRequest, req_body)
  File "/home/rohan/.cache/pypoetry/virtualenvs/dk-datasci-model-pipeline-fXF56sVU-py3.10/lib/python3.10/site-packages/mlflow/store/model_registry/base_rest_store.py", line 43, in _call_endpoint
    self.get_host_creds(), endpoint, method, json_body, response_proto, extra_headers
  File "/home/rohan/.cache/pypoetry/virtualenvs/dk-datasci-model-pipeline-fXF56sVU-py3.10/lib/python3.10/site-packages/mlflow/utils/databricks_utils.py", line 439, in get_databricks_host_creds
    config = provider.get_config()
  File "/home/rohan/.cache/pypoetry/virtualenvs/dk-datasci-model-pipeline-fXF56sVU-py3.10/lib/python3.10/site-packages/databricks_cli/configure/provider.py", line 134, in get_config
    raise InvalidConfigurationError.for_profile(None)
databricks_cli.utils.InvalidConfigurationError: You haven't configured the CLI yet! Please configure by entering `/home/rohan/Documents/dk_datasci_model_pipeline/scripts/check_mlflow_connection.py configure`
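The last frames show the Unity Catalog store resolving credentials through get_databricks_host_creds, which ends in provider.get_config() raising InvalidConfigurationError.for_profile(None). That suggests the profile named in the tracking URI is not forwarded when credentials are looked up for databricks-uc. As a minimal sketch of the information that would need to be carried over, the hypothetical helper below (not part of MLflow) extracts the profile name from a plain databricks://<profile> URI; it assumes that form only and does not handle other URI variants MLflow may accept.

```python
from urllib.parse import urlparse


def profile_from_tracking_uri(tracking_uri):
    """Return the profile name from a databricks://<profile> URI, or None.

    Hypothetical helper for illustration only; it is not part of MLflow.
    """
    # The bare "databricks" URI means "use the default profile".
    if tracking_uri == "databricks":
        return None
    parsed = urlparse(tracking_uri)
    if parsed.scheme != "databricks":
        raise ValueError(f"not a Databricks tracking URI: {tracking_uri!r}")
    # For databricks://<profile>, urlparse places the profile name in netloc.
    return parsed.netloc or None
```

For the URI used in the workaround below, profile_from_tracking_uri("databricks://data-science-dev") returns "data-science-dev".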

My config file has a section for the given workspace, and the credentials in it are valid. If I instead authenticate by setting DATABRICKS_HOST and DATABRICKS_TOKEN, everything works:

import configparser
import os

import mlflow

# Workaround: copy the profile's host and token from the config file into env vars
parser = configparser.ConfigParser()
parser.read(os.getenv("DATABRICKS_CONFIG_FILE"))
os.environ["DATABRICKS_HOST"] = parser["data-science-dev"]["host"]
os.environ["DATABRICKS_TOKEN"] = parser["data-science-dev"]["token"]

client = mlflow.MlflowClient(tracking_uri="databricks://data-science-dev", registry_uri="databricks-uc")
client.search_registered_models()
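Until the profile is honoured by the Unity Catalog store, the workaround can be wrapped in a reusable function. This is a minimal sketch: export_profile_credentials is a hypothetical name, and it assumes a token-based profile that defines both host and token. It falls back to ~/.databrickscfg (the Databricks CLI default location) when DATABRICKS_CONFIG_FILE is unset.

```python
import configparser
import os


def export_profile_credentials(profile, config_file=None):
    """Copy a named profile's host and token into DATABRICKS_HOST / DATABRICKS_TOKEN.

    Hypothetical helper generalizing the workaround above; it assumes a
    token-based profile that defines both "host" and "token".
    """
    # Use the explicit path, then DATABRICKS_CONFIG_FILE, then the CLI default.
    path = (
        config_file
        or os.getenv("DATABRICKS_CONFIG_FILE")
        or os.path.expanduser("~/.databrickscfg")
    )
    parser = configparser.ConfigParser()
    if not parser.read(path):
        raise FileNotFoundError(f"could not read Databricks config file: {path}")
    # configparser treats [DEFAULT] specially, so has_section() is False for it.
    if profile != configparser.DEFAULTSECT and not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found in {path}")
    section = parser[profile]
    os.environ["DATABRICKS_HOST"] = section["host"]
    os.environ["DATABRICKS_TOKEN"] = section["token"]
```

Calling export_profile_credentials("data-science-dev") before creating the MlflowClient should have the same effect as the snippet above. Because it only sets process-wide environment variables, it affects every Databricks call in the process, not just the registry client.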

Tracking information

System information: Linux #1 SMP Thu Oct 5 21:02:42 UTC 2023
Python version: 3.10.12
MLflow version: 2.9.0
MLflow module location: /home/rohan/.cache/pypoetry/virtualenvs/dk-datasci-model-pipeline-fXF56sVU-py3.10/lib/python3.10/site-packages/mlflow/__init__.py
Tracking URI: file:///home/rohan/Documents/dk_datasci_model_pipeline/scripts/mlruns
Registry URI: file:///home/rohan/Documents/dk_datasci_model_pipeline/scripts/mlruns
MLflow dependencies: 
  Flask: 2.3.3
  Jinja2: 3.1.2
  aiohttp: 3.9.1
  alembic: 1.12.1
  azure-storage-file-datalake: 12.14.0
  boto3: 1.33.8
  click: 8.1.7
  cloudpickle: 3.0.0
  databricks-cli: 0.17.8
  docker: 6.1.3
  entrypoints: 0.4
  gitpython: 3.1.40
  google-cloud-storage: 2.13.0
  gunicorn: 21.2.0
  importlib-metadata: 7.0.0
  markdown: 3.5.1
  matplotlib: 3.8.1
  numpy: 1.26.2
  packaging: 23.2
  pandas: 1.5.3
  protobuf: 4.25.1
  psutil: 5.9.6
  pyarrow: 13.0.0
  pydantic: 1.10.13
  pytz: 2023.3.post1
  pyyaml: 6.0.1
  querystring-parser: 1.2.4
  requests: 2.31.0
  scikit-learn: 1.3.2
  scipy: 1.11.4
  sqlalchemy: 2.0.23
  sqlparse: 0.4.4
  virtualenv: 20.25.0

Code to reproduce issue

import mlflow
client = mlflow.MlflowClient(tracking_uri="databricks://<my workspace>", registry_uri="databricks-uc")
client.search_registered_models()  # Every workspace call fails here; this is just one example.

Stack trace

Identical to the traceback shown under "Describe the problem" above.

BenWilson2 commented 10 months ago

@jerrylian-db @smurching could either of you take a look at this?

github-actions[bot] commented 10 months ago

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.