mlflow / mlflow

Open source platform for the machine learning lifecycle
https://mlflow.org
Apache License 2.0

[FR] Spark Model Cache Replacement Policy #6256

Open · JohnFirth opened 2 years ago

JohnFirth commented 2 years ago

Willingness to contribute

Yes. I would be willing to contribute this feature with guidance from the MLflow community.

Proposal Summary

I'd like the ability to set a cache replacement policy for SparkModelCache, which currently has no policy. https://github.com/mlflow/mlflow/blob/9b83b355fc9c64ad1b51c66b1187eaab40d40d61/mlflow/pyfunc/spark_model_cache.py#L15

Motivation

What is the use case for this feature?

Performing batch inference with multiple models whose combined size would exhaust memory if they were all loaded at the same time.

Why is this use case valuable to support for MLflow users in general?

Others may wish to perform such an operation. I'm not sure how common the need is.

Why is this use case valuable to support for your project(s) or organization?

I'm currently performing batch inference with hundreds of models per Spark cluster, each of which can be up to 1 GB in size.

Why is it currently difficult to achieve this use case?

SparkModelCache has no replacement policy, so attempting the use case above could cause an out-of-memory error. https://github.com/mlflow/mlflow/blob/9b83b355fc9c64ad1b51c66b1187eaab40d40d61/mlflow/pyfunc/spark_model_cache.py#L15

Details

Perhaps this could be configured with an environment variable, but I'm not sure of the best approach. Happy to try to implement this feature with some guidance :)


WeichenXu123 commented 2 years ago

Do you mean a policy such as LRU?

WeichenXu123 commented 2 years ago

One issue is: how do we check how much memory a model uses? Once the memory threshold is exceeded, we can evict the model from the cache.

JohnFirth commented 2 years ago

Hey @WeichenXu123

> Do you mean a policy such as LRU?

Yeah, I think LRU would be suitable at least for my use case of multiple models, each being used one after the other.

> One issue is: how do we check how much memory a model uses? Once the memory threshold is exceeded, we can evict the model from the cache.

I think a simple upper limit on the number of models would be adequate, at least for me. (For my use case in fact, the limit could be 1.)
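
To illustrate, here's a rough sketch of the kind of thing I have in mind. The names (LRUModelCache, max_size, the loader callback) are placeholders, not anything that exists in MLflow today:

```python
from collections import OrderedDict
from threading import Lock


class LRUModelCache:
    """Sketch of a count-limited LRU cache; not the actual SparkModelCache."""

    def __init__(self, max_size=1):
        self._max_size = max_size
        self._lock = Lock()
        self._models = OrderedDict()  # archive_path -> loaded model

    def get_or_load(self, archive_path, loader):
        with self._lock:
            if archive_path in self._models:
                # Cache hit: mark as most recently used and return.
                self._models.move_to_end(archive_path)
                return self._models[archive_path]
        # Cache miss: load outside the lock, since loading can be slow.
        model = loader(archive_path)
        with self._lock:
            self._models[archive_path] = model
            self._models.move_to_end(archive_path)
            # Evict least-recently-used entries beyond the limit.
            while len(self._models) > self._max_size:
                self._models.popitem(last=False)
        return model
```

With max_size=1 the cache keeps only the most recently used model, which would be enough for my case.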

mlflow-automation commented 2 years ago

@BenWilson2 @dbczumar @harupy @WeichenXu123 Please assign a maintainer and start triaging this issue.

dbczumar commented 2 years ago

Hi @JohnFirth, apologies for the delay here. I think a configurable LRU cache would be great here, and we would be very excited about reviewing a PR with this feature, if you're still interested in contributing one. Please let me know if you have any questions.

JohnFirth commented 2 years ago

No worries @dbczumar :)

Happy to help, but I'm not quite sure how to set the cache size limit, tbh.

Perhaps SparkModelCache.get_or_load could receive a max_cache_size argument from spark_udf, which get_or_load would then use to enforce the limit?

WeichenXu123 commented 2 years ago

What about reading max_cache_size from an environment variable? You could define it in the mlflow/environment_variables.py module.
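
For example, something like this could sit alongside the other definitions (the variable name and helper are placeholders, not existing MLflow code):

```python
import os

# Placeholder name; nothing like this exists in MLflow yet. It could be declared
# in mlflow/environment_variables.py alongside the other variables.
_MAX_CACHE_SIZE_VAR = "MLFLOW_SPARK_MODEL_CACHE_MAX_SIZE"


def _get_max_cache_size(default=None):
    """Read the cache limit from the environment; None means no limit."""
    raw = os.environ.get(_MAX_CACHE_SIZE_VAR)
    return int(raw) if raw is not None else default
```

SparkModelCache.get_or_load could then consult this helper when deciding whether to evict, so spark_udf wouldn't need a new argument at all.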

JohnFirth commented 2 years ago

@WeichenXu123 yeah, ok — I'll see what I can do :)

mlflow-automation commented 2 years ago

@WeichenXu123 Please reply to comments.