Open franco-bocci opened 4 months ago
Thank you for proposing this feature, @franco-bocci! Adding an index sounds valuable for large-scale use of the model registry. However, one concern is the migration effort, as it is basically a schema change. Since many users are running the model registry in production, the migration needs to be done carefully, with an easy and safe migration script/tool.
Hey! Yes, I think we could add this to the alembic migrations so that the new version includes it. What do you think?
Yes, I think that works! Before actually starting work on it, would you mind running a quick benchmark in your env with and without the index, so we can get an overall sense of the impact of indexing? Updating the schema is a non-trivial effort, so we just want to make sure we get a clear performance improvement for common use cases🙂 If I can be greedy, it would be super nice if the benchmark included small-to-medium database sizes, as the registry DB is not so large in common use cases. You can also refer to the thread where we added an index to the tracking database: https://github.com/mlflow/mlflow/issues/3785
Hey! Sure. From my side, these are the next steps: 1) add this index to our DB 2) compare CPU usage before and after adding the index 3) share the results here
After that, if it's okay for you, I can work on adding this index through Alembic. Probably will be done during this week. Works for you?
@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.
@franco-bocci Yes, that sounds like a plan! Please let us know when you have the benchmark (no time pressure!). Thank you so much for your willingness to contribute🙂
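For anyone who wants to try the index/no-index comparison at small-to-medium sizes before touching a real registry, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in, so the absolute numbers will not carry over to MySQL or PostgreSQL, and it measures wall-clock time rather than the DB-side CPU usage discussed in this thread. The column types, the synthetic data, and the helper names (`build_db`, `time_lookups`, `benchmark`) are assumptions made for illustration; only the table, column, and index names come from this issue.

```python
import sqlite3
import time

# Simplified stand-in for the registry table. The composite primary key
# (key, name, version) follows the issue; the column types are assumptions.
SCHEMA_SQL = """
CREATE TABLE model_version_tags (
    key     TEXT    NOT NULL,
    value   TEXT,
    name    TEXT    NOT NULL,
    version INTEGER NOT NULL,
    PRIMARY KEY (key, name, version)
)
"""
INDEX_SQL = (
    "CREATE INDEX IF NOT EXISTS model_version_tags_name_and_version "
    "ON model_version_tags (name, version)"
)
LOOKUP_SQL = (
    "SELECT key, value FROM model_version_tags WHERE name = ? AND version = ?"
)


def build_db(n_models, n_versions, n_tags):
    """Create an in-memory table with n_models * n_versions * n_tags rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA_SQL)
    rows = (
        (f"tag-{t}", "x", f"model-{m}", v)
        for m in range(n_models)
        for v in range(1, n_versions + 1)
        for t in range(n_tags)
    )
    conn.executemany(
        "INSERT INTO model_version_tags (key, value, name, version) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def time_lookups(conn, n_models, n_versions, repeats=200):
    """Run `repeats` lookups by (name, version); return (seconds, rows fetched)."""
    fetched = 0
    start = time.perf_counter()
    for i in range(repeats):
        params = (f"model-{i % n_models}", 1 + i % n_versions)
        fetched += len(conn.execute(LOOKUP_SQL, params).fetchall())
    return time.perf_counter() - start, fetched


def benchmark(n_models, n_versions, n_tags):
    """Time the same lookups before and after creating the index."""
    conn = build_db(n_models, n_versions, n_tags)
    no_index_s, fetched_before = time_lookups(conn, n_models, n_versions)
    conn.execute(INDEX_SQL)
    index_s, fetched_after = time_lookups(conn, n_models, n_versions)
    conn.close()
    return {
        "rows": n_models * n_versions * n_tags,
        "no_index_s": no_index_s,
        "index_s": index_s,
        "rows_fetched": fetched_before,
        "same_results": fetched_before == fetched_after,
    }


if __name__ == "__main__":
    # Two arbitrary "small" and "medium" sizes; adjust to match a real registry.
    for size in [(10, 5, 3), (200, 10, 5)]:
        print(benchmark(*size))
```

Running the same lookups before and after the index on one connection keeps the data identical between the two measurements, and the `same_results` flag is a cheap sanity check that the index changed only the access path, not the rows returned.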
Hey! Apologies for the long delay. It was a simple change, but as DB migrations can go wrong, this went stale as I had to focus on other things.
We were having a CPU spike on the DB side every day at 2 a.m. from a query performing this SELECT, filtering by model_version_tags.name AND model_version_tags.version. One client of the service was, I think, performing updates for multiple models at that point in time (not sure exactly whether they were updating models).
The CPU usage went from > 90% to 30% after applying the index. No other changes done. For the index creation, we executed:
CREATE INDEX IF NOT EXISTS model_version_tags_name_and_version
ON model_version_tags (name, version);
Hope this helps as a reference.
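To illustrate why the existing primary key does not cover this filter, here is a small sketch that inspects the query plan before and after creating the index. It runs against an in-memory SQLite database as a stand-in, so the planner output and its wording are SQLite's and may differ from what the production database reports; the column types are assumptions, while the table, column, and index names are the ones used above. The likely explanation (stated here as an assumption, not something measured in this thread) is that the primary key leads with key, so a filter on name and version alone cannot seek into it.

```python
import sqlite3

INDEX_NAME = "model_version_tags_name_and_version"

# Simplified stand-in for the registry table. The composite primary key
# (key, name, version) follows the issue; the column types are assumptions.
SCHEMA_SQL = """
CREATE TABLE model_version_tags (
    key     TEXT    NOT NULL,
    value   TEXT,
    name    TEXT    NOT NULL,
    version INTEGER NOT NULL,
    PRIMARY KEY (key, name, version)
)
"""
INDEX_SQL = (
    f"CREATE INDEX IF NOT EXISTS {INDEX_NAME} "
    "ON model_version_tags (name, version)"
)
LOOKUP_SQL = (
    "SELECT key, value FROM model_version_tags WHERE name = ? AND version = ?"
)


def query_plan(conn):
    """Return SQLite's plan for the lookup as one human-readable string."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + LOOKUP_SQL, ("model-a", 1)
    ).fetchall()
    # The last column of each plan row holds the textual description.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA_SQL)

plan_before = query_plan(conn)  # expected to be a full scan of the table

conn.execute(INDEX_SQL)
conn.execute(INDEX_SQL)  # IF NOT EXISTS makes a repeated run a no-op

plan_after = query_plan(conn)  # expected to search via the new index

print("before:", plan_before)
print("after: ", plan_after)
```

The second `CREATE INDEX IF NOT EXISTS` call is there only to show that the statement is safe to re-run, which matters if the index was already added by hand before a migration is applied.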
Willingness to contribute
Yes. I can contribute this feature independently.
Proposal Summary
The table model_version_tags could benefit from an index for both the name and version fields. Filtering this table for name and version increases CPU usage by a lot. Currently, the table has a PK for key, name and version, but not an index for those fields.
https://github.com/mlflow/mlflow/blob/master/mlflow/store/model_registry/dbmodels/models.py#L144
Motivation
Details
No response
What component(s) does this bug affect?
area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support
What language(s) does this bug affect?
language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages
What integration(s) does this bug affect?
integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations