pulumi / pulumi-eks

A Pulumi component for easily creating and managing an Amazon EKS Cluster
https://www.pulumi.com/registry/packages/eks/
Apache License 2.0

Eks cluster with instance_profile specified throws unknown instance_profile error #898

Open MitchellGerdisch opened 1 year ago

MitchellGerdisch commented 1 year ago

What happened?

Deploying the example code provided below throws the error shown below about not being able to find an instance_profile. At first glance it seems to refer to the instance profile specified in the cluster's instance_profile_name property, but it is actually about an instance profile that the eks package automatically creates, in certain circumstances, for the default node group.

The error does not occur if skip_default_node_group is set to True, or if instance_profile_name is not specified; see the test cases in the code comments below.

----- ERROR (truncated a bit) -----

pulumi:pulumi:Stack (example-managed-nodegroups-py-dev):
    error: Program failed with an unhandled exception:
    Traceback (most recent call last):
      File "/Users/mitch/.pyenv/versions/3.9.6/lib/python3.9/site-packages/pulumi/runtime/resource.py", line 261, in do_invoke
        return monitor.Invoke(req)
      File "/Users/mitch/.pyenv/versions/3.9.6/lib/python3.9/site-packages/grpc/_channel.py", line 946, in __call__
        return _end_unary_response_blocking(state, call, False, None)
      File "/Users/mitch/.pyenv/versions/3.9.6/lib/python3.9/site-packag
.....
During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/Users/mitch/.pulumi/bin/pulumi-language-python-exec", line 197, in <module>
        loop.run_until_complete(coro)
.....
Exception: invocation of pulumi:pulumi:getResource returned an error: unknown resource urn:pulumi:dev::example-managed-nodegroups-py::eks:index:Cluster$aws:iam/instanceProfile:InstanceProfile::example-managed-nodegroups-instanceProfile

---- EXAMPLE CODE ----

import pulumi_aws as aws
import pulumi_eks as eks
import json

managed_policy_arns = [
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
]

# Creates a role and attaches the EKS worker node IAM managed policies
def create_role(name: str) -> aws.iam.Role:
    role = aws.iam.Role(name, assume_role_policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAssumeRole",
                "Effect": "Allow",
                "Principal": {
                    "Service": "ec2.amazonaws.com",
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }))

    for i, policy in enumerate(managed_policy_arns):
        # Create RolePolicyAttachment without returning it.
        rpa = aws.iam.RolePolicyAttachment(f"{name}-policy-{i}",
            policy_arn=policy,
            role=role.id)

    return role
### create_role ###

# IAM role and instance profile
role0 = create_role("example-role0")
instance_profile = aws.iam.InstanceProfile(
    resource_name='cluster-instance-profile',
    role=role0
)

# Test 1: Uncomment the instance_profile_name property and pulumi up
#   Result: Error on preview about a different, missing instance profile - this is a profile that the eks code generates.
#   I think I've also seen a case where the up was initially successful and then failed on preview for subsequent ups,
#   so try a couple of times if it doesn't reproduce.
# Test 2: Uncomment the skip_default_node_group property and pulumi up
#   Result: success
# Test 3: Comment out instance_profile_name and skip_default_node_group properties and pulumi up
#   Result: success

cluster = eks.Cluster("example-managed-nodegroups",
    instance_profile_name=instance_profile.id,
    # skip_default_node_group=True,
)

Expected Behavior

Either this is a bug in how the eks code handles the instance profile for the default node group when an instance profile name is specified, or it's an undocumented and unhandled interaction between the instance_profile_name property and the skip_default_node_group property (or something similar).

If it's the former, the package should allow a configuration where instance_profile_name is set and the cluster still creates a default node group. If it's the latter, the error should be a more helpful message explaining what to do.
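
For what it's worth, the pattern I'd expect to use as a workaround is to skip the default node group and create the node group explicitly, reusing role0 from the example above. This is only a sketch, not part of the repro - the resource names and arguments here are illustrative and may need adjusting for your pulumi-eks version:

cluster = eks.Cluster("example-managed-nodegroups",
    skip_default_node_group=True,
    instance_roles=[role0],
)

# Explicit managed node group in place of the default node group.
managed_node_group = eks.ManagedNodeGroup("example-managed-ng0",
    cluster=cluster,   # the cluster created above
    node_role=role0,   # worker-node role from create_role()
)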

Steps to reproduce

The steps are captured as comments in the provided code above the cluster declaration.

Output of pulumi about

CLI
Version      3.76.1
Go Version   go1.20.6
Go Compiler  gc

Plugins
NAME                   VERSION
aws                    5.8.0
azure-native           1.68.2
azuread                5.37.0
command                0.4.1
eks                    1.0.2
gcp                    6.50.0
k8s-servicedeployment  0.0.6
kubernetes             3.23.1
pulumiservice          0.6.1
python                 unknown
random                 4.8.2
snowflake              0.14.0
str                    1.0.0
terraform              5.6.4
tls                    4.10.0
vault                  5.13.0

Host
OS       darwin
Version  13.5
Arch     x86_64

This project is written in python

Found no pending operations associated with dev

Backend
Name pulumi.com

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

danielrbradley commented 1 year ago

I'm struggling to reproduce Test 1 locally - uncommenting only the instance_profile_name property and leaving skip_default_node_group commented out. This previews fine for me.

To confirm, you were seeing that the initial preview would error if only instance_profile_name was set with skip_default_node_group not set?

I do see some differences in package versions. Please could you try updating AWS and kubernetes and see if that resolves the issue?
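
For a Python project that typically means bumping the pins in requirements.txt and reinstalling into the venv. The specifiers below are only an illustration, using the working versions I list underneath as a floor:

pulumi-aws>=5.21.1
pulumi-eks>=1.0.2
pulumi-kubernetes>=3.30.2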

Here are the versions I was using, which work:

aws         5.21.1
eks         1.0.2
kubernetes  3.30.2

MitchellGerdisch commented 1 year ago

So, I made a bit of a mistake in my test description. In Test 1, don't just preview but actually deploy. (The preview error in Test 1 occurs if you first run Test 3.)

Also, my pulumi about output above was wrong (I think I wasn't in my venv at the time). I'm testing with:

Plugins
NAME        VERSION
aws         5.42.0
awsx        1.0.0
docker      3.6.1
eks         1.0.2
kubernetes  3.30.2
python      unknown

mikhailshilkov commented 1 year ago

I can repro the issue with the original example... It does sound like a bug that we should fix.