pulumi / pulumi-aws-native

AWS Native Provider for Pulumi

panic: runtime error: invalid memory address or nil pointer dereference #1755

Closed · abnud11 closed this 1 month ago

abnud11 commented 1 month ago

What happened?

I'm trying to deploy a stack using aws-native, as a result of pulumi up --yes I get this:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x214ce5d]

goroutine 42 [running]:
github.com/pulumi/pulumi-aws-native/provider/pkg/resources.CalcPatch(0x10000004fca9d?, 0x72f9d6a6f518?, {{0xc000b90de0, 0x17}, 0xc000847da0, 0xc000847dd0, 0xc000847e00, {0xc000b51740, 0x1, 0x1}, ...}, ...)
    /Users/runner/work/pulumi-aws-native/pulumi-aws-native/provider/pkg/resources/patching.go:28 +0x37d
github.com/pulumi/pulumi-aws-native/provider/pkg/provider.(*cfnProvider).Update(0xc00147db00, {0x3585b60, 0xc002db8ed0}, 0xc002da02c0)
    /Users/runner/work/pulumi-aws-native/pulumi-aws-native/provider/pkg/provider/provider.go:1073 +0x62f
github.com/pulumi/pulumi/sdk/v3/proto/go._ResourceProvider_Update_Handler.func1({0x3585b60, 0xc002db8ed0}, {0x2d3a060?, 0xc002da02c0})
    /Users/runner/go/pkg/mod/github.com/pulumi/pulumi/sdk/v3@v3.134.1/proto/go/provider_grpc.pb.go:699 +0x75
github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc.OpenTracingServerInterceptor.func1({0x3585b60, 0xc002db8840}, {0x2d3a060, 0xc002da02c0}, 0xc002d85020, 0xc002dad068)
    /Users/runner/go/pkg/mod/github.com/grpc-ecosystem/grpc-opentracing@v0.0.0-20180507213350-8e809c8a8645/go/otgrpc/server.go:57 +0x3d0
github.com/pulumi/pulumi/sdk/v3/proto/go._ResourceProvider_Update_Handler({0x2dc15e0?, 0xc00147db00}, {0x3585b60, 0xc002db8840}, 0xc000152b00, 0xc0006376a0)
    /Users/runner/go/pkg/mod/github.com/pulumi/pulumi/sdk/v3@v3.134.1/proto/go/provider_grpc.pb.go:701 +0x135
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0005d9200, {0x3585b60, 0xc002db87b0}, {0x3593a00, 0xc000a9d200}, 0xc002db3320, 0xc001f815f0, 0x4b0ab28, 0x0)
    /Users/runner/go/pkg/mod/google.golang.org/grpc@v1.63.2/server.go:1369 +0xe23
google.golang.org/grpc.(*Server).handleStream(0xc0005d9200, {0x3593a00, 0xc000a9d200}, 0xc002db3320)
    /Users/runner/go/pkg/mod/google.golang.org/grpc@v1.63.2/server.go:1780 +0x1016
google.golang.org/grpc.(*Server).serveStreams.func2.1()
    /Users/runner/go/pkg/mod/google.golang.org/grpc@v1.63.2/server.go:1019 +0x8b
created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 62
    /Users/runner/go/pkg/mod/google.golang.org/grpc@v1.63.2/server.go:1030 +0x135

I'm using Pulumi 3.134.1, but I also tried downgrading the Pulumi CLI to 3.134.0 and 3.133.0 without luck (the stack trace above still shows the SDK at 3.134.1).

I don't see any errors other than the panic above, and I can't pinpoint which resource is responsible.

Example

Not sure what resource caused the error.

Output of pulumi about

CLI
Version       3.133.0
Go Version    go1.23.1
Go Compiler   gc

Plugins
KIND      NAME          VERSION
resource  aws           6.54.1
resource  aws-native    1.0.1
resource  aws-native    1.0.0
resource  awsx          2.16.0
resource  cloudflare    5.39.1
resource  docker        4.5.5
resource  docker        3.6.1
resource  ec            0.10.1
resource  mongodbatlas  3.19.1
language  nodejs        unknown

Host
OS       arch
Version
Arch     x86_64

This project is written in nodejs: executable='/home/abd/.nvm/versions/node/v20.17.0/bin/node' version='v20.17.0'

Additional context

I'm using Arch Linux x64 and the latest version of aws-native.

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

flostadler commented 1 month ago

Hey @abnud11, sorry you're running into this. Can you please provide a repro so we can further analyze this? Thanks!

abnud11 commented 1 month ago

Hey @flostadler, the problem is that I don't know exactly which resource is causing this. The code I'm using is:

const bucket = new awsNative.s3.Bucket('MyBucket', {
  bucketName: 'backend-bucket',
});
const vpc = new awsx.ec2.Vpc('MyVPC', {
  natGateways: {
    strategy: awsx.ec2.NatGatewayStrategy.Single,
  },
  numberOfAvailabilityZones: 1,
});
const nestjsSecurityGroup = new awsNative.ec2.SecurityGroup(
  'MyNestjsSecurityGroup',
  {
    vpcId: vpc.vpcId,
    securityGroupEgress: [],
    securityGroupIngress: [
      {
        cidrIp: '0.0.0.0/0',
        ipProtocol: 'tcp',
        fromPort: 80,
        toPort: 80,
      },
    ],
    groupDescription: 'group for protecting nestjs backend',
  },
);
const nestjsLoadBalancerSecurityGroup = new awsNative.ec2.SecurityGroup(
  'MyNestLoadBalancerSecurityGroup',
  {
    vpcId: vpc.vpcId,
    securityGroupEgress: [
      {
        fromPort: 80,
        toPort: 80,
        ipProtocol: 'tcp',
        destinationSecurityGroupId: nestjsSecurityGroup.id,
      },
    ],
    securityGroupIngress: [
      {
        fromPort: 80,
        toPort: 80,
        ipProtocol: 'tcp',
        cidrIp: '0.0.0.0/0',
      },
    ],
    groupDescription: 'security group for nestjs load balancer',
  },
);
new awsNative.ec2.SecurityGroupIngress(
  'MyFromNestjsLoadBalancerToNestjs',
  {
    groupId: nestjsSecurityGroup.groupId,
    sourceSecurityGroupId: nestjsLoadBalancerSecurityGroup.id,
    fromPort: 80,
    toPort: 80,
    ipProtocol: 'tcp',
  },
);
const redisClusterSecurityGroup = new awsNative.ec2.SecurityGroup(
  'MyRedisClusterSecurityGroup',
  {
    vpcId: vpc.vpcId,
    securityGroupEgress: [],
    securityGroupIngress: [
      {
        fromPort: 6379,
        toPort: 6379,
        ipProtocol: 'tcp',
        sourceSecurityGroupId: nestjsSecurityGroup.id,
      },
    ],
    groupDescription: 'group for protecting redis cluster',
  },
);
const subnetGroup = new awsNative.elasticache.SubnetGroup(
  'MyRedisSubnetGroup',
  {
    subnetIds: privateSubnets,
    description: 'subnets to use to connect to redis',
    cacheSubnetGroupName: 'wowvir-redis-subnet-group',
  },
);
const redis = new aws.elasticache.Cluster('MyRedis', {
  engine: 'redis',
  engineVersion: '7.1',
  nodeType: 'cache.t4g.micro',
  numCacheNodes: 1,
  securityGroupIds: [redisClusterSecurityGroup.id],
  subnetGroupName: subnetGroup.cacheSubnetGroupName,
});
const nestjsCluster = new awsNative.ecs.Cluster('MyCluster', {
  clusterName: 'wowvir-nestjs-cluster',
});
const currentAccount = await awsNative.getAccountId();
const nestjsServiceRole = new awsNative.iam.Role('MyServiceRole', {
  assumeRolePolicyDocument: {
    Version: '2012-10-17',
    Statement: [
      {
        Action: 'sts:AssumeRole',
        Principal: {
          Service: 'ecs-tasks.amazonaws.com',
        },
        Effect: 'Allow',
        Condition: {
          StringEquals: {
            'aws:SourceAccount': currentAccount.accountId,
          },
        },
      },
    ],
  },
});
const ecsInstanceRole = new awsNative.iam.Role('MyEC2InstanceRole', {
  assumeRolePolicyDocument: {
    Version: '2012-10-17',
    Statement: [
      {
        Action: 'sts:AssumeRole',
        Principal: {
          Service: 'ec2.amazonaws.com',
        },
        Effect: 'Allow',
        Condition: {
          StringEquals: {
            'aws:SourceAccount': currentAccount.accountId,
          },
        },
      },
    ],
  },
  managedPolicyArns: [
    aws.iam.ManagedPolicy.AmazonEC2ContainerServiceforEC2Role,
    aws.iam.ManagedPolicy.AmazonSSMManagedInstanceCore,
  ],
});
const ami = await aws.ec2.getAmi({
  filters: [
    {
      name: 'name',
      values: ['al2023-ami-ecs-hvm-2023.0.20240820-kernel-6.1-arm64'],
    },
  ],
});
const ec2InstanceProfile = new awsNative.iam.InstanceProfile(
  'MyEC2InstanceProfile',
  {
    roles: [ecsInstanceRole.roleName],
  },
);
new awsNative.ec2.Instance('MyInstance', {
  instanceType: 't4g.micro',
  imageId: ami.id,
  subnetId: privateSubnets.apply((subnets) => subnets[0]),
  iamInstanceProfile: ec2InstanceProfile.arn,
  userData: pulumi.interpolate`#!/bin/bash
    echo ECS_CLUSTER=${nestjsCluster.clusterName} >> /etc/ecs/ecs.config
  `,
  tags: [{ key: 'Name', value: 'backend server' }],
});

I don't think the EC2 instance is responsible, because it shows a different error about a gRPC connection failure, probably because another resource's panic (the one above) brought down the Pulumi RPC server.

abnud11 commented 1 month ago

I'm using awsx 2.16.0, just for the record.

flostadler commented 1 month ago

@abnud11 I was not able to reproduce this with the example you provided.

There were some other problems with the example though: some of the resources failed to create because of invalid inputs. I fixed them by adding non-null assertions, switching ec2InstanceProfile.arn to ec2InstanceProfile.instanceProfileName, and base64-encoding the userData. Please have a look at this version of your example with those problems fixed:

import * as pulumi from "@pulumi/pulumi";
import * as awsNative from "@pulumi/aws-native";
import * as awsx from "@pulumi/awsx";
import * as aws from "@pulumi/aws";

const bucket = new awsNative.s3.Bucket('mybucket');
const vpc = new awsx.ec2.Vpc('MyVPC', {
    natGateways: {
        strategy: awsx.ec2.NatGatewayStrategy.Single,
    },
    numberOfAvailabilityZones: 1,
});
const nestjsSecurityGroup = new awsNative.ec2.SecurityGroup(
    'MyNestjsSecurityGroup',
    {
        vpcId: vpc.vpcId,
        securityGroupEgress: [],
        securityGroupIngress: [
            {
                cidrIp: '0.0.0.0/0',
                ipProtocol: 'tcp',
                fromPort: 80,
                toPort: 80,
            },
        ],
        groupDescription: 'group for protecting nestjs backend',
    },
);
const nestjsLoadBalancerSecurityGroup = new awsNative.ec2.SecurityGroup(
    'MyNestLoadBalancerSecurityGroup',
    {
        vpcId: vpc.vpcId,
        securityGroupEgress: [
            {
                fromPort: 80,
                toPort: 80,
                ipProtocol: 'tcp',
                destinationSecurityGroupId: nestjsSecurityGroup.id,
            },
        ],
        securityGroupIngress: [
            {
                fromPort: 80,
                toPort: 80,
                ipProtocol: 'tcp',
                cidrIp: '0.0.0.0/0',
            },
        ],
        groupDescription: 'security group for nestjs load balancer',
    },
);
new awsNative.ec2.SecurityGroupIngress(
    'MyFromNestjsLoadBalancerToNestjs',
    {
        groupId: nestjsSecurityGroup.groupId,
        sourceSecurityGroupId: nestjsLoadBalancerSecurityGroup.id,
        fromPort: 80,
        toPort: 80,
        ipProtocol: 'tcp',
    },
);
const redisClusterSecurityGroup = new awsNative.ec2.SecurityGroup(
    'MyRedisClusterSecurityGroup',
    {
        vpcId: vpc.vpcId,
        securityGroupEgress: [],
        securityGroupIngress: [
            {
                fromPort: 6379,
                toPort: 6379,
                ipProtocol: 'tcp',
                sourceSecurityGroupId: nestjsSecurityGroup.id,
            },
        ],
        groupDescription: 'group for protecting redis cluster',
    },
);
const subnetGroup = new awsNative.elasticache.SubnetGroup(
    'MyRedisSubnetGroup',
    {
        subnetIds: vpc.privateSubnetIds,
        description: 'subnets to use to connect to redis',
        cacheSubnetGroupName: 'wowvir-redis-subnet-group',
    },
);
const redis = new aws.elasticache.Cluster('MyRedis', {
    engine: 'redis',
    engineVersion: '7.1',
    nodeType: 'cache.t4g.micro',
    numCacheNodes: 1,
    securityGroupIds: [redisClusterSecurityGroup.id],
    subnetGroupName: subnetGroup.cacheSubnetGroupName.apply(subnetGroupName => subnetGroupName!),
});
const nestjsCluster = new awsNative.ecs.Cluster('MyCluster', {
    clusterName: 'wowvir-nestjs-cluster',
});
const currentAccount = awsNative.getAccountIdOutput();
const nestjsServiceRole = new awsNative.iam.Role('MyServiceRole', {
    assumeRolePolicyDocument: {
        Version: '2012-10-17',
        Statement: [
            {
                Action: 'sts:AssumeRole',
                Principal: {
                    Service: 'ecs-tasks.amazonaws.com',
                },
                Effect: 'Allow',
                Condition: {
                    StringEquals: {
                        'aws:SourceAccount': currentAccount.accountId,
                    },
                },
            },
        ],
    },
});
const ecsInstanceRole = new awsNative.iam.Role('MyEC2InstanceRole', {
    assumeRolePolicyDocument: {
        Version: '2012-10-17',
        Statement: [
            {
                Action: 'sts:AssumeRole',
                Principal: {
                    Service: 'ec2.amazonaws.com',
                },
                Effect: 'Allow',
                Condition: {
                    StringEquals: {
                        'aws:SourceAccount': currentAccount.accountId,
                    },
                },
            },
        ],
    },
    managedPolicyArns: [
        aws.iam.ManagedPolicy.AmazonEC2ContainerServiceforEC2Role,
        aws.iam.ManagedPolicy.AmazonSSMManagedInstanceCore,
    ],
});

const ami = aws.ec2.getAmiOutput({
    filters: [
        {
            name: 'name',
            values: ['al2023-ami-ecs-hvm-2023.0.20240820-kernel-6.1-arm64'],
        },
    ],
});
const ec2InstanceProfile = new awsNative.iam.InstanceProfile(
    'MyEC2InstanceProfile',
    {
        roles: [ecsInstanceRole.roleName.apply((roleName) => roleName!)],
    },
);
new awsNative.ec2.Instance('MyInstance', {
    instanceType: 't4g.micro',
    imageId: ami.id,
    subnetId: vpc.privateSubnetIds.apply((subnets) => subnets[0]),
    iamInstanceProfile: ec2InstanceProfile.instanceProfileName.apply((instanceProfileName) => instanceProfileName!),
    userData: pulumi.interpolate`#!/bin/bash
      echo ECS_CLUSTER=${nestjsCluster.clusterName} >> /etc/ecs/ecs.config
    `.apply((ud) => Buffer.from(ud, "utf-8").toString("base64")),
    tags: [{ key: 'Name', value: 'backend server' }],
});

That being said, the provider shouldn't panic under any circumstances. Are there any other details or steps you took that could help me reproduce the panic?

flostadler commented 1 month ago

I spent some time analyzing the code this week and I think the panic happens if a resource has no diff but does have write-only properties: https://github.com/pulumi/pulumi-aws-native/blob/8fc72af8ad6a13b46bbdf4b4eeaf81343e2b1405/provider/pkg/resources/patching.go#L28.

This leads to accessing a map that is nil. I'm cooking up a fix.
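For illustration, here's a minimal Go sketch of that failure mode, hypothetical and not the provider's actual CalcPatch code (the Diff type and patchWriteOnly helper are stand-ins): when a resource has no diff, the diff pointer is nil, and reading write-only properties off it dereferences a nil pointer; a simple nil check avoids the crash.

package main

import "fmt"

// Diff is a hypothetical stand-in for the provider's diff structure.
type Diff struct {
	Updates map[string]string
}

// patchWriteOnly re-applies write-only properties recorded in a diff.
// Without the nil check, a resource with no diff (diff == nil) would
// panic with "invalid memory address or nil pointer dereference" as
// soon as diff.Updates is read.
func patchWriteOnly(diff *Diff, writeOnlyProps []string) {
	if diff == nil {
		return // nothing changed, nothing to patch
	}
	for _, p := range writeOnlyProps {
		if v, ok := diff.Updates[p]; ok {
			fmt.Printf("re-applying write-only property %s=%s\n", p, v)
		}
	}
}

func main() {
	// A no-op update: no diff, but the resource declares write-only properties.
	patchWriteOnly(nil, []string{"MasterUserPassword"})
}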

While I'm not able to trigger it myself, I think this can happen when we receive intermittent failures from Cloud Control.

This PR should address it: https://github.com/pulumi/pulumi-aws-native/pull/1768