lbogdan opened this issue 5 years ago
After talking to @lukehoban on Slack, my proposal is to add `availabilityZone?: pulumi.Input<string>;` to `ClusterNodeGroupOptions` (and, consequently, to `eks.Cluster`); if it is set, the node group instances will only be placed inside that AZ, by linking them to the subnet id located in that AZ.
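For concreteness, a rough sketch of what the proposed option could look like (this is not an existing API; the surrounding fields are abbreviated and illustrative):

```typescript
import * as pulumi from "@pulumi/pulumi";

// Sketch only: the proposed addition to ClusterNodeGroupOptions.
interface ClusterNodeGroupOptions {
    // ...existing options, e.g.:
    nodeSubnetIds?: pulumi.Input<pulumi.Input<string>[]>;

    // Proposed: if set, place all node group instances in this AZ by
    // resolving it to the cluster subnet located in that AZ.
    availabilityZone?: pulumi.Input<string>;
}
```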
Questions:

- Should we also (or instead) add `availabilityZoneId?: pulumi.Input<string>;`, to be consistent with `SubnetArgs`?
- What should happen when both `availabilityZone` (/ `availabilityZoneId`) and `nodeSubnetIds` are specified? (I'd say error with "you can only specify one of X and Y".)

@lukehoban @metral Please let me know what you think.
Looking at this a little more deeply - it's a little unfortunate that we would have to allow specifying both `nodeSubnetIds` and `availabilityZone`. These are largely overlapping concepts for this purpose.

It does feel like allowing subnets to be specified ought to be sufficient here - and that is, in general, how most AWS APIs I'm familiar with are designed. I wonder if there's some other approach we could take to make the work needed to discover the subnets for a specific AZ easier?
For example - getting the subnet for an AZ feels like it should be simpler than the workaround above - just:

```typescript
import * as aws from "@pulumi/aws";

// Look up the subnet in the given VPC that lives in the given AZ.
async function getSubnetForAz(vpcId: string, availabilityZone: string) {
    const subnet = await aws.ec2.getSubnet({ vpcId, availabilityZone });
    return subnet.id;
}
```
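A hedged sketch of how that helper could feed `nodeSubnetIds` (assuming a `cluster` and a `vpcId` already exist in the program; the AZ name is just an example):

```typescript
import * as pulumi from "@pulumi/pulumi";

// Resolve the AZ to a subnet id, then pin the node group to it.
const workerSubnetId = pulumi.output(vpcId).apply(id =>
    getSubnetForAz(id, "eu-central-1c"));

cluster.createNodeGroup("worker", {
    nodeSubnetIds: [workerSubnetId],
    /* ... */
});
```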
I think, given that doing this lookup is actually reasonably easy, it may not be necessary to also allow an explicit `availabilityZone` option on the `Cluster`? The `subnetIds` are sufficiently (actually even more) expressive?
This proposal was for expressiveness and convenience, more than anything. Compare (1):
```typescript
cluster.createNodeGroup('worker', {
    nodeSubnetIds: ['subnet-538c761e'], // eu-central-1c subnet
    /* ... */
});
```
with (2):

```typescript
cluster.createNodeGroup('worker', {
    availabilityZone: 'eu-central-1c',
    /* ... */
});
```
As I said, I'm not that familiar with the AWS infrastructure, so that might be the only reason I'd prefer (2) over (1).
Another reason to provide this ability is that sometimes AWS returns this error:
```
error: Plan apply failed: error creating EKS Cluster (my-cluster-eksCluster-4e64260): UnsupportedAvailabilityZoneException: Cannot create cluster 'my-cluster-eksCluster-4e64260' because us-east-1e, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f
    status code: 400, request id: <...>
```
As it's documented here, I suspect it's quite frequent: https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting.html

`eksctl` eventually added support for this for the same reason: https://github.com/weaveworks/eksctl/issues/118

It would be unfortunate if this wrapper automatically created the needed VPC and subnets (when not provided with one) but did not handle such an error.
I'm facing the above error (`UnsupportedAvailabilityZoneException`). I've tried adding the `availabilityZones` argument to the VPC like so:
```javascript
const awsx = require("@pulumi/awsx");
const eks = require("@pulumi/eks");

module.exports = function run(config, resources) {
    const vpc = new awsx.ec2.Vpc("vpc", {
        availabilityZones: [
            'us-east-1a',
            'us-east-1b',
            'us-east-1c',
        ],
    });
    const cluster = new eks.Cluster("k8s", {
        desiredCapacity: 2,
        maxSize: 2,
        vpcId: vpc.id,
        subnetIds: vpc.subnetIds,
    });
};
```
but still receive:
Message_: "Cannot create cluster 'k8s-eksCluster-7d9724a' because us-east-1e, the targeted availability zone, does not currently have sufficient capacity to support t
he cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f",
Is there a workaround?
Recently it seems like more folks are encountering the `targeted availability zone does not currently have sufficient capacity to support the cluster` error in `us-east-1`. It appears that AWS is struggling here.

The OP and recent comments seem to be about self-managed node groups in pulumi/eks - that is, node groups backed by CloudFormation templates, as node groups were not a first-class resource in EKS until recently.

Though the ask is for the option to specify AZs for self-managed node groups, managed node groups have since been released to fill this management gap and to let AWS handle it.

To use managed node groups, you'd first disable the default node group in the cluster (where this error is occurring for folks), and then create a new managed node group. An example can be found here, and a rough sketch follows.
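A minimal sketch of that flow, assuming a version of `@pulumi/eks` with managed node group support and the `assumeRolePolicyForPrincipal` helper in `@pulumi/aws` (the resource names, policies, and sizes here are illustrative, not taken from the thread):

```typescript
import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";

// An instance role for the managed node group; the standard EKS worker
// policies are attached so nodes can join the cluster.
const nodeRole = new aws.iam.Role("managed-ng-role", {
    assumeRolePolicy: aws.iam.assumeRolePolicyForPrincipal({ Service: "ec2.amazonaws.com" }),
});
const policies = [
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
];
policies.forEach((policyArn, i) =>
    new aws.iam.RolePolicyAttachment(`managed-ng-policy-${i}`, { role: nodeRole, policyArn }));

// Create the cluster without the default (CloudFormation-backed) node group,
// registering the instance role so it is mapped into aws-auth.
const cluster = new eks.Cluster("my-cluster", {
    skipDefaultNodeGroup: true,
    instanceRoles: [nodeRole],
});

// An EKS managed node group; AWS schedules the nodes across the cluster's subnets/AZs.
const managedNodeGroup = new eks.ManagedNodeGroup("managed-ng", {
    cluster: cluster,
    nodeRole: nodeRole,
    scalingConfig: { desiredSize: 2, minSize: 1, maxSize: 3 },
});
```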
Given the release of managed node groups, are folks still interested in this capability for self-managed / CloudFormation node groups?
I had to use EKS with Fargate. Thankfully that should be okay for my use case.
I encountered the `targeted availability zone does not currently have sufficient capacity to support the cluster` error and do not know how to fix it in Pulumi. Has this been addressed by any chance?
The default Python example code in the docs will raise the `UnsupportedAvailabilityZoneException` error. It was fixed for me with the following sample Python code, using the `availability_zone_names` option:
```python
import pulumi_awsx as awsx
import pulumi_eks as eks

# Create a VPC for our cluster, pinned to AZs that have EKS capacity.
vpc = awsx.ec2.Vpc(
    "eks_vpc",
    availability_zone_names=[
        'us-west-2a',
        'us-west-2b',
        'us-west-2c',
    ],
)

# Create an EKS cluster inside the VPC, skipping the default node group.
cluster = eks.Cluster(
    "eks_cluster",
    skip_default_node_group=True,
    instance_roles=None,
    enabled_cluster_log_types=[
        "api",
        "audit",
        "authenticator",
    ],
    vpc_id=vpc.vpc_id,
    public_subnet_ids=vpc.public_subnet_ids,
    private_subnet_ids=vpc.private_subnet_ids,
    node_associate_public_ip_address=False,
)
```
### TL;DR Proposal

It would be nice to have an `availabilityZone?: pulumi.Input<string>` option in `ClusterNodeGroupOptions` that does the same thing I did below.

### Use case

Because EBS volumes can only be mounted by an instance in the same AZ, it makes sense to be able to place all nodes of an EKS cluster in a specific AZ when using EBS volumes, as you expect a pod to be able to mount an EBS volume regardless of the node it's scheduled on.
### Current Behavior

When creating a cluster without specifying a VPC, the default VPC is used, together with the default subnets in that VPC, and we end up with nodes scattered throughout the whole region, placed in as many AZs as there are default subnets.
Workaround
An ES2 instance is placed in the same AZ its subnet is placed in, so the way we can place a
NodeGroup
's nodes in a specific AZ is to setnodeSubnetIds
inClusterNodeGroupOptions
to a subnet in that AZ. To be able to specify the literal AZ name (e.g.eu-central-1c
), I've come up with a function that given aneks.Cluster
and an AZ name, returns the subnet id placed in that AZ:, which I then used like this:
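A rough sketch of what such a helper and its usage could look like (this is not the original snippet; the function name and the exact lookup are assumptions):

```typescript
import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";
import * as pulumi from "@pulumi/pulumi";

// Given an eks.Cluster and an AZ name, return the id of the cluster's
// VPC subnet that lives in that AZ.
function getClusterSubnetIdForAz(cluster: eks.Cluster, availabilityZone: string): pulumi.Output<string> {
    return cluster.eksCluster.vpcConfig.vpcId.apply(async vpcId => {
        const subnet = await aws.ec2.getSubnet({ vpcId, availabilityZone });
        return subnet.id;
    });
}

// Pin a node group's instances to eu-central-1c via its subnet.
const cluster = new eks.Cluster("my-cluster", { skipDefaultNodeGroup: true });
cluster.createNodeGroup("worker", {
    nodeSubnetIds: [getClusterSubnetIdForAz(cluster, "eu-central-1c")],
    /* ... */
});
```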