iterative / terraform-provider-iterative

☁️ Terraform plugin for machine learning workloads: spot instance recovery & auto-termination | AWS, GCP, Azure, Kubernetes
https://registry.terraform.io/providers/iterative/iterative/latest/docs
Apache License 2.0

Support setting completions to a different value than parallelism for k8s tasks #659

Open sjawhar opened 2 years ago

sjawhar commented 2 years ago

Requested Functionality

When running a large parallel job on a k8s cluster, I might have many more tasks that need to be run than I have pods to run them. In that case, it would be very useful to use k8s indexed jobs. A key part of this is the ability to set parallelism and completions to different values—specifically, parallelism would have a lower value than completions. See the discussion in https://github.com/iterative/terraform-provider-iterative/pull/597 for background and context.
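To illustrate the request, here is a minimal sketch of how a task script might split work when `completions` exceeds `parallelism`. Kubernetes sets the `JOB_COMPLETION_INDEX` environment variable in each pod of an Indexed Job. Everything else below is an assumption for illustration only: the `TASK_COMPLETIONS` variable, the `shard_for_index` helper, and the work item names are hypothetical, and the issue does not say whether the provider would expose the index to the task script this way.

```python
import os


def shard_for_index(items, index, completions):
    """Return the items that the completion with the given index should process.

    Items are dealt out round-robin, so every item is handled by exactly
    one completion index in the range [0, completions).
    """
    if completions < 1:
        raise ValueError("completions must be at least 1")
    if not 0 <= index < completions:
        raise ValueError("index must be in the range [0, completions)")
    return items[index::completions]


def main():
    # Set by Kubernetes for pods that belong to an Indexed Job.
    index = int(os.environ.get("JOB_COMPLETION_INDEX", "0"))
    # Hypothetical variable: the script has to learn the total number of
    # completions somehow; how the provider would pass it is an assumption.
    completions = int(os.environ.get("TASK_COMPLETIONS", "5"))

    # Hypothetical work items standing in for the real tasks.
    work_items = [f"chunk-{n:02d}" for n in range(12)]

    for item in shard_for_index(work_items, index, completions):
        print(f"completion {index}: processing {item}")


if __name__ == "__main__":
    main()
```

With `parallelism = 2` and `completions = 5`, at most two such pods would run at once, and the job would finish once all five indexes have completed successfully.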

Example TF file:

resource "iterative_task" "example" {
  cloud     = "k8s"
  machine   = "1-1024"
  image     = "python:3.8.12"
  disk_size = 1

  parallelism = 2
  completions = 5
}

0x2b3bfa0 commented 2 years ago

Related to #585, useful but harder to implement in backends other than k8s

omesser commented 2 years ago

> Related to #585, useful but harder to implement in backends other than k8s

@0x2b3bfa0 let's start with k8s support only then

sjawhar commented 2 years ago

You can see here how I implemented it. Should I open a PR, or would you prefer a different implementation?