zenml-io / zenml

ZenML 🐙: The bridge between ML and Ops. https://zenml.io.
Apache License 2.0
3.97k stars 431 forks

Restrict AWS Sagemaker Instance Type Selection to Orchestrator Configuration #2214

Open strickvl opened 9 months ago

strickvl commented 9 months ago

Open Source Contributors Welcomed!

Please comment below if you would like to work on this issue!

Contact Details [Optional]

support@zenml.io

What happened?

The configuration of the instance_type for AWS Sagemaker Orchestrator is currently determined by the developer/data scientist/ML engineer at the time of running the pipeline via the SagemakerOrchestratorSettings in code. This setup does not allow a DevOps Engineer or ML Engineer with an admin role to control or restrict the choice of instance types. This could lead to potential misuse, such as selecting excessively high-resource instances for trivial tasks or intentionally creating resource-intensive loops.

Task Description

Move the instance_type attribute from the SagemakerOrchestratorSettings in the code to the SagemakerOrchestrator config, which is set up during the component registration. This change will allow better control and governance over the resources used for running pipelines in AWS Sagemaker.
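A minimal sketch of the proposed split, using plain dataclasses rather than ZenML's actual classes. `OrchestratorConfig`, `OrchestratorSettings`, `resolve_instance_type`, and the `max_runtime_in_seconds` field are hypothetical names chosen for illustration, and the default instance type is an assumption, not the value ZenML uses:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrchestratorConfig:
    """Hypothetical stand-in for the config set at component registration."""

    # Controlled by whoever registers the stack component (e.g. an admin).
    instance_type: str = "ml.t3.medium"


@dataclass(frozen=True)
class OrchestratorSettings:
    """Hypothetical stand-in for the run-time settings passed in code."""

    # Deliberately has no instance_type field, so pipeline code cannot set it.
    max_runtime_in_seconds: int = 86400


def resolve_instance_type(
    config: OrchestratorConfig, settings: OrchestratorSettings
) -> str:
    """Return the instance type to launch; only the config is consulted."""
    return config.instance_type


if __name__ == "__main__":
    config = OrchestratorConfig(instance_type="ml.m5.large")
    print(resolve_instance_type(config, OrchestratorSettings()))
```

With this shape, passing `instance_type` to the run-time settings fails with a `TypeError` at construction time, so the choice can only be changed by updating the registered component.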

Expected Outcome

Steps to Implement

Additional Context

This change is prompted by the need to enhance governance and control over resource utilization in cloud environments, particularly in team settings where multiple individuals have access to deploy pipelines.


christianversloot commented 9 months ago

Is there room for discussion regarding this idea?

We currently benefit greatly from being able to provide instance_type via SagemakerOrchestratorSettings. Internally, we've defined a procedure where any AWS-based training run is discussed first and at least four eyes approve that it can actually run in the cloud. Any other run uses local resources. We use a variety of instance types. In fact, if I'm not mistaken, with this change we would need to define multiple stacks with multiple SageMaker Orchestrator components in order to use multiple instance types, which is quite cumbersome for us.

I do, however, understand the rationale for this issue very well, especially for larger organizations, but could there be some middle ground? For example, something like this:

WDYT?

strickvl commented 9 months ago

I think I like the suggestion! It's a nice middle ground between flexibility + control over resource usage. It would be a new approach we haven't taken so far in how we allow components to be configured, and I'd be interested in @schustmi's thoughts on the approach particularly in the light of RBAC / permissions work he's been doing recently. I'm wondering if we should / should not consider this scenario with that in mind?

schustmi commented 9 months ago

I also like the suggestion, seems like a good compromise 👍

RBAC will control which users have permissions to update the stack component configuration but will not affect the Settings right now, so nothing important to consider there.

AryaMoghaddam commented 6 months ago

I would like to work on this issue, if possible

AryaMoghaddam commented 6 months ago

@strickvl Sorry, I picked up the other 2 issues and won't have time for this one.

aiakide commented 4 weeks ago

This sounds like a nice first issue to participate in the development of ZenML. I would like to participate and implement the changes suggested by @christianversloot. 🙂

Also, am I correct in assuming that the change does not need to be made in the SagemakerStepOperatorConfig since the instance type configuration can be traced back to a step there anyway, or does the customization need to be made there as well?

schustmi commented 4 weeks ago

@aiakide I'm currently working on PR #2984, which affects the settings of all stack components that are related to infrastructure resources (like the instance_type mentioned in this ticket). Is it okay if you wait a few more days until that is merged before starting to implement this suggestion?

aiakide commented 4 weeks ago

@schustmi Sure, no problem. No hurry.

schustmi commented 4 weeks ago

Thanks! I'll let you know once it's merged, and I'll also add some more details on how this should be implemented with the new settings structure.