spotinst / terraform-provider-spotinst

Terraform Spotinst provider.
https://registry.terraform.io/providers/spotinst/spotinst/latest/docs
Mozilla Public License 2.0

Cluster roll failure when 2 or more VNGs are updated at once #447

Open dmitrykruglov opened 1 year ago

dmitrykruglov commented 1 year ago

Description

Hello,

We have 2 VNGs (spotinst_ocean_aws_launch_spec) with the should_roll feature enabled, so that the cluster/VNG rolls automatically when the configuration changes. When both VNGs are updated in a single terraform apply (for example, an AMI ID change), Terraform fails with the error "Can't have 2 Rolls at the same time. Please stop the previous one". This is one of the reasons we have stopped using VNGs for now and use only the default VNG.

Terraform Version

1.3.9

Affected Resource(s)

spotinst_ocean_aws_launch_spec

Terraform Configuration Files

module "ocean-aws-k8s-vng_stateless" {
  source = "spotinst/ocean-aws-k8s-vng/spotinst"

  name     = "stateless-group" # Name of VNG in Ocean
  ocean_id = local.ocean_id

  image_id        = "ami-07bccaac087171156"
  labels          = [{ key = "type", value = "stateless" }]
  spot_percentage = 100 # Change the spot %

  should_roll = true
}

## Create additional Ocean Virtual Node Group (launchspec) ##
module "ocean-aws-k8s-vng_stateful" {
  source = "spotinst/ocean-aws-k8s-vng/spotinst"

  name     = "stateful-group" # Name of VNG in Ocean
  ocean_id = local.ocean_id

  image_id        = "ami-07bccaac087171156"
  labels          = [{ key = "type", value = "stateful" }]
  taints          = [{ key = "type", value = "stateful", effect = "NoSchedule" }]
  spot_percentage = 0
  #instance_types = ["g4dn.xlarge","g4dn.2xlarge"] # Limit VNG to specific instance types

  should_roll = true
}
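
For context, here is a rough sketch of what should_roll = true on each module presumably corresponds to at the resource level. The update_policy block is an argument of spotinst_ocean_aws_launch_spec, but how the module wires its should_roll input into it is an assumption (the module source is not shown here), and the batch_size_percentage value is purely illustrative.

```hcl
# Assumed resource-level equivalent of one VNG module with should_roll = true.
# The mapping from the module input to this block is an assumption.
resource "spotinst_ocean_aws_launch_spec" "nodegroup" {
  name     = "stateless-group"
  ocean_id = local.ocean_id
  image_id = "ami-07bccaac087171156"

  update_policy {
    should_roll = true # request a roll whenever this VNG is updated

    roll_config {
      batch_size_percentage = 33 # illustrative value, not from the issue
    }
  }
}
```

If both VNGs carry a block like this and both are modified in one apply, each update would issue its own roll request against the same cluster, which is consistent with the CLUSTER_ROLL_ALREADY_IN_PROGRESS error reported in this issue.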

Debug Output

deployment/191/default/spotio": exit status 1
Dynamic environment variables added:
_PASS

module.ocean-aws-k8s-vng_stateless.spotinst_ocean_aws_launch_spec.nodegroup: Modifying... [id=ols-*******1]
module.ocean-aws-k8s-vng_stateful.spotinst_ocean_aws_launch_spec.nodegroup: Modifying... [id=ols-*******2]
module.ocean-aws-k8s-vng_stateful.spotinst_ocean_aws_launch_spec.nodegroup: Modifications complete after 1s [id=ols-*******2]
╷
│ Error: onRoll() -> Roll failed for cluster [ols-*******1], error: POST https://api.spotinst.io/ocean/aws/k8s/cluster/ols-*******1/roll?accountId=act-******: 400 (request: "32217267-9bdb-463a-ad6b-fc1440a6018a") CLUSTER_ROLL_ALREADY_IN_PROGRESS: Can't have 2 Rolls at the same time. Please stop the previous one.
│ 
│ 
│   with module.ocean-aws-k8s-vng_stateless.spotinst_ocean_aws_launch_spec.nodegroup,
│   on .terraform/modules/ocean-aws-k8s-vng_stateless/main.tf line 2, in resource "spotinst_ocean_aws_launch_spec" "nodegroup":
│    2: resource "spotinst_ocean_aws_launch_spec" "nodegroup" {
│ 

Expected Behavior

Terraform shouldn't crash with an error. Cluster roll either needs to complete just once, applying changes to both VNGs, or VNGs need to roll independently at the same time.

Actual Behavior

Terraform crashes with the error "Can't have 2 Rolls at the same time" and fails to roll/apply changes to one of the VNGs.

Steps to Reproduce

ilijad1 commented 1 year ago

@dmitrykruglov I hit the same issue when trying to upgrade multiple VNGs at once, and I believe it needs to be either fixed or clearly documented in the provider's Terraform docs.

If you want to roll more than one VNG at the same time, you should do that at the Ocean cluster level (example below):

resource "spotinst_ocean_aws" "ocean_cluster" {
  count                = ..........
  name                 = ..........
  controller_id        = ..........
  region               = ..........
  image_id             = ..........
  iam_instance_profile = ..........
  desired_capacity     = ..........
  min_size             = ..........
  max_size             = ..........
  security_groups      = []
  subnet_ids           = ..........
  key_name             = ..........

  update_policy {
    should_roll      = true
    conditioned_roll = true # or false
    auto_apply_tags  = true

    roll_config {
      batch_size_percentage        = 33
      launch_spec_ids              = ["ols-a0b****1", "ols-a0b****1"]
      batch_min_healthy_percentage = 20
      respect_pdb                  = true
    }
  }

  autoscaler {}
}

I managed to test this and it works perfectly fine for a list of VNGs.

The ocean_cluster documentation has the details for the configuration: https://registry.terraform.io/providers/spotinst/spotinst/latest/docs/resources/ocean_aws#update-policy

sharadkesarwani commented 7 months ago

Hi @dmitrykruglov, the error you encountered while updating 2 VNGs is intended behavior. To update 2 or more VNGs, configure "update_policy" in the cluster config and pass the list of VNG IDs in launch_spec_ids, as shown in the snippet below.

update_policy {
  should_roll = true

  roll_config {
    batch_size_percentage        = 33
    launch_spec_ids              = ["ols-a0b1", "ols-a0b1"]
    batch_min_healthy_percentage = 20
    respect_pdb                  = true
  }
}
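
Combining the two suggestions in this thread, one possible layout is to disable should_roll on each VNG module and let only the cluster-level update_policy request the roll. The following is an untested sketch under stated assumptions: the variable vng_ids_to_roll is hypothetical, the cluster resource is shown as a fragment with its other required arguments omitted, and the thread does not say what triggers the cluster-level roll when only a VNG changes.

```hcl
# Hypothetical variable: IDs of the VNGs to include in the cluster-level roll.
# Passed in as plain strings; if ocean_id on the VNGs is derived from the
# cluster resource, referencing the VNGs from the cluster would create a
# dependency cycle.
variable "vng_ids_to_roll" {
  type    = list(string)
  default = []
}

# Fragment only: the remaining cluster arguments are omitted here.
resource "spotinst_ocean_aws" "ocean_cluster" {
  # ...

  update_policy {
    should_roll = true

    roll_config {
      batch_size_percentage = 33
      launch_spec_ids       = var.vng_ids_to_roll
    }
  }
}

module "ocean-aws-k8s-vng_stateless" {
  source = "spotinst/ocean-aws-k8s-vng/spotinst"

  name     = "stateless-group"
  ocean_id = local.ocean_id
  image_id = "ami-07bccaac087171156"

  should_roll = false # no per-VNG roll; the cluster-level policy handles it
}
```

The second VNG module would get the same should_roll = false setting, so that at most one roll request is issued per apply.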