pulumi / pulumi-eks

A Pulumi component for easily creating and managing an Amazon EKS Cluster
https://www.pulumi.com/registry/packages/eks/
Apache License 2.0

[Feature Request] Ability to specify the AZ when creating NodeGroups #95

Open lbogdan opened 5 years ago

lbogdan commented 5 years ago

TL;DR Proposal

It would be nice to have an availabilityZone?: pulumi.Input<string> option in ClusterNodeGroupOptions that does the same thing as the workaround below.

Use case

Because EBS volumes can only be mounted by an instance in the same AZ, it makes sense to be able to place all nodes of an EKS cluster in a specific AZ when using EBS volumes: you expect a pod to be able to mount its EBS volume regardless of the node it's scheduled on.

Current Behavior

When creating a cluster without specifying a VPC, the default VPC is used, together with its default subnets, and we end up with nodes scattered across the whole region, placed in as many AZs as there are default subnets.

Workaround

An EC2 instance is placed in the same AZ as its subnet, so the way to place a NodeGroup's nodes in a specific AZ is to set nodeSubnetIds in ClusterNodeGroupOptions to a subnet in that AZ. To be able to specify the AZ by name (e.g. eu-central-1c), I've come up with a function that, given an eks.Cluster and an AZ name, returns the ID of the cluster subnet in that AZ:

import * as aws from '@pulumi/aws';
import * as eks from '@pulumi/eks';
import { Output } from '@pulumi/pulumi';

export function getSubnetIdInAZ(cluster: eks.Cluster, az: string): Output<string> {
  // The subnet IDs the EKS cluster was created with.
  const { subnetIds } = cluster.eksCluster.vpcConfig;
  return subnetIds.apply(async ids => {
    // Look up each subnet and pick the one in the requested AZ.
    const subnets = await Promise.all(ids.map(id => aws.ec2.getSubnet({ id })));
    const subnet = subnets.find(s => s.availabilityZone === az);
    if (!subnet) {
      throw new Error(`No subnet found in AZ ${az}`);
    }
    return subnet.id;
  });
}

I then used it like this:

const cluster = new eks.Cluster('cluster', {
  skipDefaultNodeGroup: true,
});
cluster.createNodeGroup('worker', {
  /* ... */
  nodeSubnetIds: [getSubnetIdInAZ(cluster, 'eu-central-1c')],
});
lbogdan commented 5 years ago

After talking to @lukehoban on Slack, my proposal is to add availabilityZone?: pulumi.Input<string>; to ClusterNodeGroupOptions (and, consequently, to eks.Cluster); if it is set, the node group's instances will only be placed in that AZ, by mapping the AZ to the ID of the subnet located in it.

Questions:

@lukehoban @metral Please let me know what you think.

lukehoban commented 5 years ago

Looking at this a little more deeply - it's a little unfortunate that we would have to allow specifying both nodeSubnetIds and availabilityZone. These are largely overlapping concepts for this purpose.

It does feel like allowing subnets to be specified ought to be sufficient here - and that is in general how most AWS APIs I'm familiar with are designed. I wonder if there's some other approach we could take to make discovering the subnets for a specific AZ easier.

For example - getting the subnet for an AZ feels like it should be simpler than the workaround above - just:

async function getSubnetForAz(vpcId: string, availabilityZone: string) {
    // Look up the subnet in the given VPC and AZ (errors unless exactly one matches).
    const subnet = await aws.ec2.getSubnet({ vpcId, availabilityZone });
    return subnet.id;
}

I think, given that doing this lookup is actually reasonably "easy", it may not be necessary to also allow explicit availabilityZone options for the Cluster? The subnetIds are sufficiently (actually even more) expressive?
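
For reference, a hedged sketch of wiring that helper into a node group like the one in the original post; the AZ name is just an example, and pulling the VPC ID from the cluster via apply is only one possible way to do it:

// Resolve the cluster's VPC ID, look up its subnet in the desired AZ, and pin
// the worker node group to that subnet (the AZ name is only an example).
const workerSubnetId = cluster.eksCluster.vpcConfig.vpcId.apply(vpcId =>
  getSubnetForAz(vpcId, 'eu-central-1c'));

cluster.createNodeGroup('worker', {
  /* ... */
  nodeSubnetIds: [workerSubnetId],
});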

lbogdan commented 5 years ago

This proposal was for expressiveness and convenience, more than anything.

  1. How it works now:

cluster.createNodeGroup('worker', {
  nodeSubnetIds: ['subnet-538c761e'], // eu-central-1c subnet
  /* ... */
});

  2. How it could work:

cluster.createNodeGroup('worker', {
  availabilityZone: 'eu-central-1c',
  /* ... */
});

As I said, I'm not that familiar with the AWS infrastructure, so that might be the only reason I'd prefer 2 over 1.

ledor473 commented 5 years ago

Another reason to provide this ability is that sometimes AWS returns this error:

    error: Plan apply failed: error creating EKS Cluster (my-cluster-eksCluster-4e64260): UnsupportedAvailabilityZoneException: Cannot create cluster 'my-cluster-eksCluster-4e64260' because us-east-1e, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f
        status code: 400, request id: <...>

Since it's covered in the troubleshooting docs, I suspect it's quite frequent: https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting.html

eksctl eventually added support for that reason: https://github.com/weaveworks/eksctl/issues/118

It would be unfortunate if this wrapper automatically created the needed VPC and subnets (when not provided with one) but did not handle such an error.
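
In the meantime, a hedged TypeScript sketch of the manual workaround, assuming a recent @pulumi/awsx (1.x or later) where ec2.Vpc accepts availabilityZoneNames; the AZ names below are only examples taken from the zones the error message lists as supported:

import * as awsx from '@pulumi/awsx';
import * as eks from '@pulumi/eks';

// Create the VPC only in AZs known to work for EKS in this region, and hand
// its subnets to the cluster explicitly instead of relying on the default VPC.
const vpc = new awsx.ec2.Vpc('vpc', {
  availabilityZoneNames: ['us-east-1a', 'us-east-1b', 'us-east-1c'],
});

const cluster = new eks.Cluster('cluster', {
  vpcId: vpc.vpcId,
  publicSubnetIds: vpc.publicSubnetIds,
  privateSubnetIds: vpc.privateSubnetIds,
});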

r0fls commented 4 years ago

I'm facing the above error (UnsupportedAvailabilityZoneException). I've tried adding an availabilityZones argument to the VPC like so:

module.exports = function run(config, resources) {
  const vpc = new awsx.ec2.Vpc("vpc", { availabilityZones: [
    'us-east-1a',
    'us-east-1b',
    'us-east-1c',
  ]});
  const cluster = new eks.Cluster("k8s",
    {
      desiredCapacity: 2,
      maxSize: 2,
      vpc: vpc.id,
      subnetIds: vpc.subnetIds,
    }
  )
}

but still receive:

 Message_: "Cannot create cluster 'k8s-eksCluster-7d9724a' because us-east-1e, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f",

Is there a workaround?

metral commented 4 years ago

Recently it seems like more folks are encountering the targeted availability zone does not currently have sufficient capacity to support the cluster error in us-east-1. It appears that AWS is struggling here.

The OP and recent comments seem to be about self-managed node groups in pulumi/eks - that is, node groups backed by CloudFormation templates, since node groups were not a first-class resource in EKS until recently.

Though the ask is for the option to specify AZs for self-managed node groups, managed node groups have since been released to fill this management gap and to let AWS handle it.

To use managed node groups, you'd first disable the default node group in the cluster (where this error is occurring for folks) and then create a new managed node group. An example can be found here.
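
A condensed sketch along the lines of that example, assuming the TypeScript API (eks.createManagedNodeGroup with a nodeRole and scalingConfig); the role setup, names, and sizes are illustrative:

import * as aws from '@pulumi/aws';
import * as eks from '@pulumi/eks';

// An IAM role for the worker nodes with the standard EKS worker policies.
const nodeRole = new aws.iam.Role('managed-ng-role', {
  assumeRolePolicy: aws.iam.assumeRolePolicyForPrincipal({ Service: 'ec2.amazonaws.com' }),
});
const workerPolicyArns = [
  'arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy',
  'arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy',
  'arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly',
];
workerPolicyArns.forEach((policyArn, i) =>
  new aws.iam.RolePolicyAttachment(`managed-ng-policy-${i}`, { role: nodeRole, policyArn }));

// Disable the default (self-managed) node group and register the node role.
const cluster = new eks.Cluster('cluster', {
  skipDefaultNodeGroup: true,
  instanceRoles: [nodeRole],
});

// Create a managed node group in place of the default one and let AWS manage placement.
eks.createManagedNodeGroup('managed-ng', {
  cluster: cluster,
  nodeRole: nodeRole,
  scalingConfig: { desiredSize: 2, minSize: 1, maxSize: 3 },
  // subnetIds: [...],  // optionally pin the node group to specific subnets/AZs
});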

Given the release of managed node groups, are folks still interested in this capability for self-managed / CloudFormation node groups?

andrewdibiasio6 commented 2 years ago

I had to use EKS with Fargate. Thankfully that should be okay for my use case.

pmantica3 commented 2 years ago

I encountered the targeted availability zone does not currently have sufficient capacity to support the cluster error and don't know how to fix it in Pulumi. Has this been addressed by any chance?

omidraha commented 1 year ago

The default Python example code in the docs raises the UnsupportedAvailabilityZoneException error.

It was fixed for me by using the availability_zone_names option, as in this sample Python code:

# Create a VPC for our cluster.
vpc = awsx.ec2.Vpc(
    "eks_vpc",
    availability_zone_names=[
        'us-west-2a',
        'us-west-2b',
        'us-west-2c'
    ],
)

# Cluster
# Create an EKS cluster with the default configuration.
cluster = eks.Cluster(
    "eks_cluster",
    skip_default_node_group=True,
    instance_roles=None,
    enabled_cluster_log_types=[
        "api",
        "audit",
        "authenticator",
    ],
    # Create an EKS cluster inside the VPC.
    vpc_id=vpc.vpc_id,
    public_subnet_ids=vpc.public_subnet_ids,
    private_subnet_ids=vpc.private_subnet_ids,
    node_associate_public_ip_address=False,
)