raxod502-plaid opened 3 months ago
@raxod502-plaid Thank you for the proposal! Support for private/authenticated repositories makes a lot of sense to me.
One question about the solution: when and where do we expect users to configure authentication? As I understand it, to make AWS CodeArtifact work as a proxy index that the `pip install` command can use, users need to run either `aws codeartifact login` or `pip config` to set it up. Does it make sense to assume users have already run this outside MLflow (so we only need to switch the destination), or should we also handle it within the dependency installation logic?
If it is the latter and requires some design decisions, it would be a good idea to write a quick two-pager using the OSS design template.
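For context, `aws codeartifact login --tool pip` essentially fetches a short-lived token and points pip's global `index-url` at the repository. The sketch below shows roughly what handling this inside MLflow could look like, assuming boto3's CodeArtifact client (`get_authorization_token`, `get_repository_endpoint`) is available. Instead of rewriting global pip configuration, it scopes the credentials to a single `pip install` through pip's `PIP_INDEX_URL` environment variable. The helper names `codeartifact_index_url`, `fetch_index_url`, and `pip_install_with_index` are hypothetical, not existing MLflow functions.

```python
import os
import subprocess
import sys
from urllib.parse import quote, urlsplit, urlunsplit


def codeartifact_index_url(repository_endpoint: str, token: str) -> str:
    """Turn a CodeArtifact PyPI endpoint plus a token into a pip index URL.

    `repository_endpoint` is what get_repository_endpoint(..., format="pypi")
    returns, e.g. https://my-domain-111122223333.d.codeartifact.us-east-1.amazonaws.com/pypi/my-repo/
    """
    parts = urlsplit(repository_endpoint)
    # CodeArtifact expects the literal user name "aws" with the token as the password.
    netloc = f"aws:{quote(token, safe='')}@{parts.netloc}"
    # pip talks to the "simple" index API underneath the repository endpoint.
    path = parts.path.rstrip("/") + "/simple/"
    return urlunsplit((parts.scheme, netloc, path, "", ""))


def fetch_index_url(domain: str, owner: str, repository: str, region: str) -> str:
    """Fetch a fresh token and endpoint with boto3 (needs AWS credentials)."""
    import boto3  # imported lazily so the URL helper works without boto3 installed

    client = boto3.client("codeartifact", region_name=region)
    token = client.get_authorization_token(domain=domain, domainOwner=owner)[
        "authorizationToken"
    ]
    endpoint = client.get_repository_endpoint(
        domain=domain, domainOwner=owner, repository=repository, format="pypi"
    )["repositoryEndpoint"]
    return codeartifact_index_url(endpoint, token)


def pip_install_with_index(requirements_path: str, index_url: str) -> None:
    """Install requirements, passing the index URL only via the child's environment."""
    env = dict(os.environ, PIP_INDEX_URL=index_url)
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-r", requirements_path],
        env=env,
        check=True,
    )
```

Keeping the token out of both `requirements.txt` and the global pip config means nothing long-lived is written to disk, which matches the 12-hour token lifetime.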
Unfortunately, I think it is necessary to allow users to run the setup command within MLflow if we want to support use cases like AWS SageMaker, where (to my knowledge) you can only provide a Docker image, which is run as-is with the model configuration mounted in; you have no control over the rest of the environment other than setting environment variables.
I'll write a document following that template.
@raxod502-plaid Makes sense, thank you for the clarification. Please let us know once the draft is ready, much appreciated.
Here's a design proposal: https://docs.google.com/document/d/1M47mxxkDO7tkol9hVoxd3SMeMnfYtlyOda2BGQXVxnE/edit
@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.
Willingness to contribute
Yes. I can contribute this feature independently.
Proposal Summary
Currently, it is only possible to install Python dependencies from unauthenticated package registries, because there is no support for supplying ephemeral authentication credentials in the `requirements.txt` format supported by MLflow.

Several containers provided by AWS support a `CA_REPOSITORY_ARN` environment variable which, if set, makes the dependency installer authenticate to the specified CodeArtifact repository and set it as the index URL before installing dependencies. Adopting the same convention in MLflow would be one option; other authenticated repositories could then be supported through differently named environment variables. This would be the easiest option to use, but explicit support would be needed for each package registry that somebody wants to use.

An alternative would be to run a user-provided hook before package installation, for example a shell script at a specific location on the filesystem. Such a hook could run whatever installation and pip setup commands the user wants. The advantage is that it would be vendor-agnostic; the drawback is that the user would need to do more work regardless of which custom package registry they use.
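A minimal sketch of the first option, assuming MLflow reads the same `CA_REPOSITORY_ARN` variable as the AWS containers and that its value is a CodeArtifact repository ARN of the form `arn:aws:codeartifact:<region>:<account-id>:repository/<domain>/<repository>`. The parsed fields are the inputs that CodeArtifact's token and endpoint calls need; fetching the token itself is left out. `parse_repository_arn`, `repo_from_environment`, and `CodeArtifactRepo` are hypothetical names for illustration only.

```python
import os
import re
from typing import Mapping, NamedTuple, Optional

# arn:<partition>:codeartifact:<region>:<account-id>:repository/<domain>/<repository>
_REPO_ARN = re.compile(
    r"^arn:(?P<partition>[^:]+):codeartifact:(?P<region>[^:]+):(?P<owner>\d{12}):"
    r"repository/(?P<domain>[^/]+)/(?P<repository>[^/]+)$"
)


class CodeArtifactRepo(NamedTuple):
    region: str
    owner: str  # AWS account ID that owns the CodeArtifact domain
    domain: str
    repository: str


def parse_repository_arn(arn: str) -> CodeArtifactRepo:
    """Split a CodeArtifact repository ARN into its components."""
    match = _REPO_ARN.match(arn.strip())
    if match is None:
        raise ValueError(f"not a CodeArtifact repository ARN: {arn!r}")
    return CodeArtifactRepo(
        region=match["region"],
        owner=match["owner"],
        domain=match["domain"],
        repository=match["repository"],
    )


def repo_from_environment(env: Mapping[str, str] = os.environ) -> Optional[CodeArtifactRepo]:
    """Return the repository named by CA_REPOSITORY_ARN, or None if it is unset."""
    arn = env.get("CA_REPOSITORY_ARN")
    return parse_repository_arn(arn) if arn else None
```

Since only an environment variable is involved, this works in settings like SageMaker where the container environment is otherwise fixed.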
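The hook-based alternative could be as small as the sketch below. It assumes a hypothetical `MLFLOW_PRE_INSTALL_HOOK` environment variable, falling back to a hypothetical fixed path `/opt/mlflow/pre-install.sh`, naming a shell script that would be run with `sh` before `pip install`; neither name exists in MLflow today. Because the hook runs as a child process, changes it makes on disk (for example the global pip configuration written by `aws codeartifact login --tool pip`) are visible to the later `pip install`, but environment variables it exports are not.

```python
import os
import subprocess
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical names, used only for this sketch.
HOOK_ENV_VAR = "MLFLOW_PRE_INSTALL_HOOK"
DEFAULT_HOOK_PATH = "/opt/mlflow/pre-install.sh"


def find_pre_install_hook(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the hook script to run, or None if no hook is configured."""
    configured = env.get(HOOK_ENV_VAR)
    if configured:
        path = Path(configured)
        if not path.is_file():
            # An explicitly configured but missing hook is an error, not a no-op.
            raise FileNotFoundError(f"{HOOK_ENV_VAR} points to missing file {path}")
        return path
    default = Path(DEFAULT_HOOK_PATH)
    return default if default.is_file() else None


def run_pre_install_hook(env: Mapping[str, str] = os.environ) -> bool:
    """Run the hook with sh before dependency installation; return True if one ran."""
    hook = find_pre_install_hook(env)
    if hook is None:
        return False
    # check=True aborts installation if the hook fails, e.g. because login failed.
    subprocess.run(["sh", str(hook)], env=dict(env), check=True)
    return True
```

Failing loudly when the hook exits non-zero seems preferable to silently falling back to the public index, which could pull a different package than intended.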
Motivation
Installing model dependencies that are not available from a public package registry, or that should be installed from an internal proxy that requires authentication.
Currently, MLflow is realistically only compatible with unauthenticated Python package registries (or ones that allow long-lived authentication tokens), which impedes the adoption of stronger authentication postures for supply-chain security.
We're moving our internal Python package hosting from an unauthenticated registry to AWS CodeArtifact, which does require authentication.
There is currently no way to provide credentials for MLflow to use while installing packages, other than hardcoding them into `requirements.txt`, which does not work because it is not possible to obtain long-lived credentials for AWS CodeArtifact (the maximum token lifetime is 12 hours).
Details
No response
What component(s) does this bug affect?
area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support
What language(s) does this bug affect?
language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages
What integration(s) does this bug affect?
integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations