[BUG] MLflow is very slow with authentication enabled.

n3011 commented 1 year ago

Issues Policy acknowledgement

[X] I have read and agree to submit bug reports in accordance with the issues policy

Willingness to contribute

No. I cannot contribute a bug fix at this time.

MLflow version

Client: 2.7.1
Tracking server: 2.7.1

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
Python version: 3.10
yarn version, if running the dev UI:

Describe the problem

When `basic_auth` is enabled, MLflow is very slow: 40x to 90x slower than without authentication.

Here is a script to reproduce the timings (assuming an MLflow server is running on port 5000):

```python
import os
import time

# Credentials picked up by the MLflow client for HTTP basic auth
os.environ["MLFLOW_TRACKING_USERNAME"] = "admin"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "password"

from mlflow import MlflowClient
from mlflow.entities import ViewType
from mlflow.store.tracking import SEARCH_MAX_RESULTS_THRESHOLD

tracking_uri = "http://localhost:5000/"
client = MlflowClient(tracking_uri=tracking_uri)

# Use a unique experiment name so the script can be re-run without a name clash
t_start = time.time()
experiment_id = client.create_experiment(name=f"experiment-{int(t_start)}")
print("Experiment creation took: ", time.time() - t_start)

for idx in range(3):
    t_start = time.time()
    tags = {"random": "random_tag", "mlflow.user": "test-user"}
    client.create_run(experiment_id, tags=tags, run_name=f"run_{idx}")
    print("Run creation took: ", time.time() - t_start)

t_start = time.time()
runs = client.search_runs(
    [experiment_id],
    run_view_type=ViewType.ALL,
    max_results=SEARCH_MAX_RESULTS_THRESHOLD,
)
print("Run retrieval took: ", time.time() - t_start)
```

Timings when the server is started with `mlflow server`:

```
Experiment creation took:  0.018462657928466797
Run creation took:  0.006806612014770508
Run creation took:  0.006391763687133789
Run creation took:  0.0066242218017578125
Run retrieval took:  0.005159139633178711
```

Timings when the server is started with `mlflow server --app-name basic-auth`:

```
Run creation took:  0.4615945816040039
Run creation took:  0.46465229988098145
Run creation took:  0.45903968811035156
Run retrieval took:  0.45479750633239746
```
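To put these numbers in perspective, the per-operation slowdown can be computed directly from the two reported runs. A minimal sketch, with the values copied from the outputs (experiment creation is left out because the basic-auth run does not print it); `slowdown` is a hypothetical helper, not part of MLflow:

```python
# Timings in seconds, copied from the two runs reported in this issue
without_auth = {
    "run_creation_1": 0.006806612014770508,
    "run_creation_2": 0.006391763687133789,
    "run_creation_3": 0.0066242218017578125,
    "run_retrieval": 0.005159139633178711,
}
with_auth = {
    "run_creation_1": 0.4615945816040039,
    "run_creation_2": 0.46465229988098145,
    "run_creation_3": 0.45903968811035156,
    "run_retrieval": 0.45479750633239746,
}


def slowdown(baseline: dict[str, float], slow: dict[str, float]) -> dict[str, float]:
    """Ratio of authenticated to unauthenticated time for each operation."""
    return {name: slow[name] / baseline[name] for name in baseline}


if __name__ == "__main__":
    for name, factor in slowdown(without_auth, with_auth).items():
        print(f"{name}: {factor:.1f}x slower")
```

For these four operations the factor comes out at roughly 68x to 88x. The absolute difference is close to 0.45 s in every case, which suggests a fixed per-request cost rather than work that scales with the operation itself.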

What component(s) does this bug affect?

[X] area/artifacts: Artifact stores and artifact logging
[ ] area/build: Build and test infrastructure for MLflow
[ ] area/docs: MLflow documentation pages
[ ] area/examples: Example code
[ ] area/gateway: AI Gateway service, Gateway client APIs, third-party Gateway integrations
[ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
[ ] area/models: MLmodel format, model serialization/deserialization, flavors
[ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
[ ] area/projects: MLproject format, project running backends
[ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
[ ] area/server-infra: MLflow Tracking server backend
[X] area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

[ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
[ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
[ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
[ ] area/windows: Windows support

What language(s) does this bug affect?

[ ] language/r: R APIs and clients
[ ] language/java: Java APIs and clients
[ ] language/new: Proposals for new client languages

What integration(s) does this bug affect?

[ ] integrations/azure: Azure and Azure ML integrations
[ ] integrations/sagemaker: SageMaker integrations
[ ] integrations/databricks: Databricks integrations

harupy commented 1 year ago

Pre- and post-request processing for authentication might be causing the overhead. I'll check.
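One way to check where the extra time goes is to time requests at the WSGI layer and compare the same endpoints with and without basic auth. The sketch below is a generic, standard-library-only timing middleware. `TimingMiddleware` and `demo_app` are hypothetical helpers. Wrapping MLflow's Flask app with it (e.g. `app.wsgi_app = TimingMiddleware(app.wsgi_app)`, the usual Flask middleware pattern) is an assumption about how one might instrument the server, not an MLflow feature.

```python
import time
from typing import Any, Callable, Iterable


class TimingMiddleware:
    """Wrap a WSGI app and record (path, seconds) for every request."""

    def __init__(self, app: Callable[..., Iterable[bytes]]):
        self.app = app
        self.timings: list[tuple[str, float]] = []

    def __call__(self, environ: dict[str, Any], start_response: Callable) -> list[bytes]:
        start = time.perf_counter()
        result = self.app(environ, start_response)
        try:
            # Consume the body so the measurement covers the app's full work
            body = list(result)
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()
            self.timings.append((environ.get("PATH_INFO", ""), time.perf_counter() - start))
        return body


def demo_app(environ, start_response):
    """Tiny stand-in WSGI app used only for this example."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


if __name__ == "__main__":
    timed = TimingMiddleware(demo_app)
    timed({"PATH_INFO": "/api/2.0/mlflow/runs/create"}, lambda status, headers: None)
    for path, seconds in timed.timings:
        print(f"{path}: {seconds * 1000:.2f} ms")
```

Because it buffers the whole response body, this is only suitable for short diagnostic sessions; streaming responses such as large artifact downloads would be held in memory.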

github-actions[bot] commented 1 year ago

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.

lightnessofbein commented 1 year ago

Hey @n3011 @harupy! I came across this issue and spent some time debugging the server's request handling. It turns out that checking the password hash accounts for most of the time it takes to process a request. So this seems to be working as designed rather than being an unintended bug?
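Password hashes are deliberately expensive to verify, so a hash check dominating request time is plausible. As an illustration, assume the server stores passwords with an iterated key-derivation function. My understanding is that MLflow's auth store uses werkzeug's `generate_password_hash`/`check_password_hash`, whose default is PBKDF2 or scrypt depending on the werkzeug version; treat that as an assumption. The sketch below times one PBKDF2-HMAC-SHA256 verification at several iteration counts using only the standard library. `hash_password`, `check_password`, and `time_check` are hypothetical helpers, not MLflow APIs.

```python
import hashlib
import hmac
import os
import time


def hash_password(password: str, iterations: int) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, key


def check_password(password: str, salt: bytes, key: bytes, iterations: int) -> bool:
    """Re-derive the key from the candidate password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, key)


def time_check(iterations: int) -> float:
    """Seconds spent on a single successful verification."""
    salt, key = hash_password("password", iterations)
    start = time.perf_counter()
    assert check_password("password", salt, key, iterations)
    return time.perf_counter() - start


if __name__ == "__main__":
    for iterations in (1_000, 100_000, 600_000):
        print(f"{iterations:>7} iterations: {time_check(iterations):.3f} s per check")
```

The verification time grows linearly with the iteration count. Since every authenticated request triggers one such check, that cost is added to each call no matter how cheap the underlying operation is.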

vcim commented 11 months ago

Thanks for the response, @lightnessofbein. I have faced the same issue since enabling basic HTTP authentication: the MLflow client takes much longer to call, e.g., `mlflow.models.get_model_info()`. Does MLflow need to authenticate every time something is fetched from the database?
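For context on the question: HTTP basic authentication (RFC 7617) is stateless. The client sends an `Authorization` header with every request, and the server has to decode and verify those credentials each time. As far as I can tell, MLflow's basic-auth app does not issue a session token to the client that would skip that step, but treat that as an assumption. The sketch shows what travels on each request. `basic_auth_header` and `parse_basic_auth` are hypothetical helpers.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Client side: build the Authorization header value for HTTP basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def parse_basic_auth(header: str) -> tuple[str, str]:
    """Server side: recover (username, password) from an Authorization header."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a basic auth header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


if __name__ == "__main__":
    header = basic_auth_header("admin", "password")
    print(header)  # Basic YWRtaW46cGFzc3dvcmQ=
    print(parse_basic_auth(header))  # ('admin', 'password')
```

The header carries the plaintext password, only base64-encoded, so the server cannot reuse an earlier verification unless it adds its own caching; each request therefore pays the full password-hash check again.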

marcinkosztolowicz commented 4 days ago

I have the same issue with MLflow 2.14.1 and a remote PostgreSQL database. I debugged it a bit, and it looks like every click in the server UI (for example, on an experiment) triggers several requests to the tracking server (around 10). For every request, the hash function is executed and a new session is also created, and creating the session seems to take even longer. When I tested this locally, the hash function took around 0.07 s and opening the session took around 0.16 s. Multiplied by about 10 requests, that is roughly 2.3 seconds spent on authentication alone.
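The estimate is simple per-request arithmetic. Each UI click fans out into a number of API requests, and each request pays for one hash check plus one session creation. A tiny sketch using the numbers reported in the comment; `auth_overhead_seconds` is a hypothetical helper:

```python
def auth_overhead_seconds(n_requests: int, hash_s: float, session_s: float) -> float:
    """Total authentication time for a page that issues n_requests API calls."""
    return n_requests * (hash_s + session_s)


if __name__ == "__main__":
    # Reported figures: ~10 requests per click, ~0.07 s hashing, ~0.16 s session setup
    total = auth_overhead_seconds(10, 0.07, 0.16)
    print(f"Authentication overhead per click: {total:.2f} s")
```

The overhead scales linearly with the number of requests a page issues. Reducing either the per-request hash cost or the per-request session setup would presumably shrink it proportionally.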
