Open strickvl opened 9 months ago
Is there room for discussion regarding this idea?
We are currently benefiting greatly from the fact that we can provide `instance_type` via `SagemakerOrchestratorSettings`. Internally, we've defined a procedure where any AWS-based training run is discussed first, where at least four eyes give the approval that it can actually run in the cloud. Any other run is done with local resources. We're using a variety of instance types. In fact, if I'm not mistaken, we would now need to define multiple stacks with multiple SageMaker Orchestrator components in order to use multiple instance types, which is quite cumbersome for us.
I do, however, understand the rationale for this issue very well, especially for larger organizations, but could there be some middle ground? For example, something like this:
- Extend the `SagemakerOrchestrator` config to define a default `instance_type`.
- Add an `other_instance_type_allowed: bool` or similarly named option to the same configuration, which allows DevOps engineers / admins to decide whether people running ZenML pipelines can manually provide `instance_type` in `SagemakerOrchestratorSettings`.
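- Use the default `instance_type` if none is provided; use the pipeline-specified one if another one is provided.
- Reject a pipeline-specified `instance_type` that differs from the default when overrides are not allowed (i.e. the case where `SagemakerOrchestratorSettings.instance_type != SagemakerOrchestratorConfig.instance_type`).

WDYT?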
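To make the proposed middle ground concrete, here is a minimal sketch of how the resolution logic could look. It uses plain dataclasses rather than ZenML's actual classes: `OrchestratorConfig`, `OrchestratorSettings` and `resolve_instance_type` are hypothetical names introduced only for illustration, and raising an error on a disallowed override is an assumption about the intended behavior, not something ZenML does today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrchestratorConfig:
    """Hypothetical stand-in for the config set at component registration."""

    instance_type: str  # default chosen by an admin
    other_instance_type_allowed: bool = False  # flag name from the proposal


@dataclass(frozen=True)
class OrchestratorSettings:
    """Hypothetical stand-in for the per-pipeline settings."""

    instance_type: Optional[str] = None  # None means "use the default"


def resolve_instance_type(
    config: OrchestratorConfig, settings: OrchestratorSettings
) -> str:
    """Pick the instance type for a run according to the proposed rules."""
    requested = settings.instance_type
    # Nothing requested, or the request matches the default: use the default.
    if requested is None or requested == config.instance_type:
        return config.instance_type
    # A different type was requested: only honor it if the admin allowed it.
    # Raising here is an assumption; a warning plus fallback is another option.
    if not config.other_instance_type_allowed:
        raise ValueError(
            f"Instance type '{requested}' is not allowed; this orchestrator "
            f"is restricted to '{config.instance_type}'."
        )
    return requested
```

With this shape, a single registered orchestrator could serve teams that need several instance types (flag set to `True`), while stricter organizations keep the flag at `False` and get the locked-down behavior the issue asks for.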
I think I like the suggestion! It's a nice middle ground between flexibility + control over resource usage. It would be a new approach we haven't taken so far in how we allow components to be configured, and I'd be interested in @schustmi's thoughts on the approach particularly in the light of RBAC / permissions work he's been doing recently. I'm wondering if we should / should not consider this scenario with that in mind?
I also like the suggestion, seems like a good compromise.
RBAC will control which users have permissions to update the stack component configuration but will not affect the `Settings` right now, so nothing important to consider there.
I would like to work on this issue, if possible
@strickvl Sorry, I picked up the other 2 issues and won't have time for this one.
This sounds like a nice first issue to participate in the development of ZenML. I would like to participate and implement the changes suggested by @christianversloot.
Also, am I correct in assuming that the change does not need to be made in the `SagemakerStepOperatorConfig`, since the instance type configuration can be traced back to a step there anyway, or does the customization need to be made there as well?
@aiakide I'm currently working on a PR #2984 that affects the settings of all stack components which are related to infrastructure resources (like the `instance_type` mentioned in this ticket). Is it okay if you wait a few more days until that is merged before starting to implement this suggestion?
@schustmi Sure, no problem. No hurry.
Thanks! I'll let you know once it's merged and also add some more details on how this should be implemented with the new settings structure
Open Source Contributors Welcomed!
Please comment below if you would like to work on this issue!
Contact Details [Optional]
support@zenml.io
What happened?
The configuration of the `instance_type` for the AWS Sagemaker Orchestrator is currently determined by the developer/data scientist/ML engineer at the time of running the pipeline via the `SagemakerOrchestratorSettings` in code. This setup does not allow a DevOps Engineer or ML Engineer with an admin role to control or restrict the choice of instance types. This could lead to potential misuse, such as selecting excessively high-resource instances for trivial tasks or intentionally creating resource-intensive loops.
Task Description
Move the `instance_type` attribute from the `SagemakerOrchestratorSettings` in the code to the `SagemakerOrchestrator` config, which is set up during the component registration. This change will allow better control and governance over the resources used for running pipelines in AWS Sagemaker.
Expected Outcome
`instance_type` should be configurable at the component registration level by an admin or a DevOps engineer.
Steps to Implement
- Modify the `SagemakerOrchestrator` configuration to include the `instance_type` attribute.
- Remove the `instance_type` option from the `SagemakerOrchestratorSettings`.
- Ensure that pipeline execution uses the `instance_type` set during the component registration and does not allow overrides at runtime.
Additional Context
This change is prompted by the need to enhance governance and control over resource utilization in cloud environments, particularly in team settings where multiple individuals have access to deploy pipelines.
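As a rough illustration of the strict behavior described in this task, the sketch below always takes the instance type from the registered config and refuses any runtime override. `RegisteredConfig` and `build_run_args` are hypothetical names, the runtime settings are modeled as a plain dictionary, and none of this mirrors ZenML's actual internals.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class RegisteredConfig:
    """Hypothetical stand-in for the registered orchestrator config."""

    instance_type: str  # fixed by an admin at component registration


def build_run_args(
    config: RegisteredConfig, runtime_settings: Dict[str, Any]
) -> Dict[str, Any]:
    """Merge runtime settings with the admin-controlled instance type."""
    # Runtime code may no longer choose the instance type at all.
    if "instance_type" in runtime_settings:
        raise ValueError(
            "instance_type is fixed at component registration and "
            "cannot be overridden at runtime."
        )
    # Copy the remaining settings and add the registered instance type.
    return {**runtime_settings, "instance_type": config.instance_type}
```

Failing loudly instead of silently ignoring the override is a design assumption here; it makes the restriction visible to whoever runs the pipeline.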