aureq closed this issue 1 month ago.
A possible workaround would be to pull the image before starting the agent and set pull_image: false in the agent configuration file (https://www.pulumi.com/docs/pulumi-cloud/deployments/customer-managed-agents/#configuration-reference).
That makes the agent use the images already available in the local Docker instance.
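A minimal sketch of the agent configuration file with this workaround applied, written as a TypeScript helper so it can be reused in the userData of a Pulumi program like the reproduction in this thread. The token key matches what that cloud-init script writes to pulumi-deployment-agent.yaml and pull_image is the option named above; the helper name renderAgentConfig and the token placeholder are hypothetical. Check the configuration reference for the option's exact semantics.

```typescript
// Hypothetical helper: renders the contents of pulumi-deployment-agent.yaml.
// Assumes the agent reads a `token` key (as written by the cloud-init script)
// and the `pull_image` option from the configuration reference.
function renderAgentConfig(token: string, pullImage: boolean): string {
    const lines = [
        // JSON.stringify yields a double-quoted, escaped string, which YAML also accepts.
        `token: ${JSON.stringify(token)}`,
        // pull_image: false tells the agent to use images already in the local Docker daemon.
        `pull_image: ${pullImage}`,
    ];
    return lines.join("\n") + "\n";
}

// The executor image must already be present locally (e.g. pulled at boot).
const agentConfig = renderAgentConfig("<agent-pool-token>", false);
console.log(agentConfig);
```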
@aureq I'm removing the p0 label until we have a clearer repro and confirmation of whether the workaround is sufficient.
@glena Any suggestion on how that would work with pull_image: false and a private registry, especially in a headless environment?
@komalali I'll work on that today.
You could run docker pull 1234.dkr.ecr.ap-southeast-2.amazonaws.com/pulumi/pulumi-python-executer:3.129.0 on bootstrap, no?
For example, if you are running the agent in a way similar to https://github.com/pulumi/pulumi-service/blob/master/cmd/customer-managed-deployment-agent/README.md#making-the-agent-start-on-boot, the ExecStart could point to a .sh script that first pulls the image and then runs customer-managed-deployment-agent run.
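The wrapper script suggested above can be sketched as a TypeScript helper that renders the .sh file, for example from a Pulumi program's userData. The helper name renderStartScript is hypothetical; the image name comes from the comment above and the agent path from the reproduction's agentCommand. It assumes docker pull runs as a user whose ~/.docker/config.json has the ECR credential helper configured.

```typescript
// Hypothetical helper: renders a start script that pre-pulls the executor
// image(s) and then hands control to the agent. A systemd unit's ExecStart
// could point at the resulting .sh file.
function renderStartScript(images: string[], agentBinary: string): string {
    const lines = [
        "#!/bin/bash",
        // Abort on the first failed pull instead of starting an agent that cannot run jobs.
        "set -euo pipefail",
        // docker pull reads the invoking user's ~/.docker/config.json, so a
        // configured credential helper (e.g. ecr-login) is used for private registries.
        ...images.map((image) => `docker pull ${image}`),
        // exec replaces the shell, so systemd supervises the agent process directly.
        `exec ${agentBinary} run`,
    ];
    return lines.join("\n") + "\n";
}

const startScript = renderStartScript(
    ["1234.dkr.ecr.ap-southeast-2.amazonaws.com/pulumi/pulumi-python-executer:3.129.0"],
    "/home/ec2-user/.pulumi/bin/customer-managed-deployment-agent/customer-managed-deployment-agent",
);
console.log(startScript);
```

Using exec keeps the agent as the unit's main process, so systemd restarts and stops it as usual.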
Here is the relevant part of the logs that I missed the first time around. The runner should not be panicking:
2024/08/29 04:56:25 runner 18711: "panic: runtime error: invalid memory address or nil pointer dereference"
2024/08/29 04:56:25 runner 18711: "[signal SIGSEGV: segmentation violation code=0x1 addr=0x71 pc=0x7c5f45]"
2024/08/29 04:56:25 runner 18711: ""
2024/08/29 04:56:25 runner 18711: "goroutine 13 [running]:"
2024/08/29 04:56:25 runner 18711: "github.com/moby/moby/client.(*Client).getAPIPath(0x4586b8?, {0x1f99660?, 0x2ee15e0?}, {0xc0002bbc98, 0x11}, 0xc0002bbe88)"
2024/08/29 04:56:25 runner 18711: "\t/home/runner/go/pkg/mod/github.com/moby/moby@v24.0.9+incompatible/client/client.go:223 +0x45"
2024/08/29 04:56:25 runner 18711: "github.com/moby/moby/client.(*Client).sendRequest(0x0, {0x1f99660, 0x2ee15e0}, {0x1cc0ac8, 0x3}, {0xc0002bbc98?, 0x75aeea96?}, 0x0?, {0x0, 0x0}, ...)"
2024/08/29 04:56:25 runner 18711: "\t/home/runner/go/pkg/mod/github.com/moby/moby@v24.0.9+incompatible/client/request.go:114 +0x72"
2024/08/29 04:56:25 runner 18711: "github.com/moby/moby/client.(*Client).get(...)"
2024/08/29 04:56:25 runner 18711: "\t/home/runner/go/pkg/mod/github.com/moby/moby@v24.0.9+incompatible/client/request.go:36"
2024/08/29 04:56:25 runner 18711: "github.com/moby/moby/client.(*Client).ContainerLogs(0x0, {0x1f99660, 0x2ee15e0}, {0x0, 0x0}, {0x1, 0x1, {0x0, 0x0}, {0x0, ...}, ...})"
2024/08/29 04:56:25 runner 18711: "\t/home/runner/go/pkg/mod/github.com/moby/moby@v24.0.9+incompatible/client/container_logs.go:75 +0x6ae"
2024/08/29 04:56:25 runner 18711: "main.waitForContainer.func1()"
2024/08/29 04:56:25 runner 18711: "\t/home/runner/work/customer-managed-deployment-agent/customer-managed-deployment-agent/pulumi-service/cmd/workflow-runner/bootstrap.go:373 +0x74"
2024/08/29 04:56:25 runner 18711: "created by main.waitForContainer in goroutine 1"
2024/08/29 04:56:25 runner 18711: "\t/home/runner/work/customer-managed-deployment-agent/customer-managed-deployment-agent/pulumi-service/cmd/workflow-runner/bootstrap.go:372 +0xa5"
Here is an easy-to-deploy reproduction. The code assumes that $HOME/.ssh/id_ed25519 and $HOME/.ssh/id_ed25519.pub are present; they are required to SSH into the EC2 instance. The code creates a new stack (in your current project) with the suffix -deploy that uses the agent pool.
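Here are the steps:
1. SSH into the instance using the sshCommand output ✅
2. Run the dockerCommand output (x5 commands) ✅
3. Start the agent using the agentCommand output ✅
4. Open the stack with the suffix -deploy and check the deployment settings (agent pool and executor image), then deploy; it fails with the error image pull failed; retrying ❌

Setup
import * as fs from "fs";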
import * as path from "path";
import * as pulumi from "@pulumi/pulumi";
import * as pulumiservice from "@pulumi/pulumiservice";
import * as aws from "@pulumi/aws";
import * as random from "@pulumi/random";
export = async () => {
    const pulumiVersion = "3.130.0";
    const serviceName = pulumi.getStack();
    const projectName = pulumi.getProject();
    const awsConfig = new pulumi.Config("aws");
    const region = awsConfig.require("region");
    const randomUuid = new random.RandomUuid(`${serviceName}-random-uuid`, {});
    const publicKey = fs.readFileSync(path.join(process.env.HOME || "", ".ssh", "id_ed25519.pub"), "utf8");
    const privateKey = fs.readFileSync(path.join(process.env.HOME || "", ".ssh", "id_ed25519"), "utf8");
    const ecrRepository = new aws.ecr.Repository(`${serviceName}-repository`, {
        name: `pulumi`,
        imageTagMutability: "MUTABLE",
        imageScanningConfiguration: {
            scanOnPush: false,
        }
    });
    const agentPool = new pulumiservice.AgentPool(`${serviceName}-agent-pool`, {
        organizationName: pulumi.getOrganization(),
        name: `${serviceName}-agent-pool`,
    });
    const deploymentStack = new pulumiservice.Stack(`${serviceName}-stack`, {
        organizationName: pulumi.getOrganization(),
        projectName: projectName,
        stackName: `${serviceName}-deploy`,
    });
    const deploymentSettings = new pulumiservice.DeploymentSettings(`${serviceName}-deployment-settings`, {
        organization: pulumi.getOrganization(),
        project: projectName,
        stack: `${serviceName}-deploy`,
        agentPoolId: agentPool.agentPoolId,
        sourceContext: {
            git: {
                repoUrl: "aureq/aws-s3-bucket",
                branch: "main",
                gitAuth: {
                    sshAuth: {
                        sshPrivateKey: privateKey,
                    }
                }
            }
        },
        executorContext: {
            executorImage: pulumi.interpolate`${ecrRepository.repositoryUrl}:${pulumiVersion}`,
        }
    }, { parent: deploymentStack });
    const ec2Role = new aws.iam.Role(`${serviceName}-iam-role`, {
        assumeRolePolicy: aws.iam.assumeRolePolicyForPrincipal({ Service: "ec2.amazonaws.com" }),
    });
    const ec2RolePolicyAttachment = new aws.iam.RolePolicyAttachment(`${serviceName}-policy-attachment`, {
        role: ec2Role,
        policyArn: "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess",
    }, { parent: ec2Role });
    const instanceProfile = new aws.iam.InstanceProfile(`${serviceName}-instance-profile`, {
        role: ec2Role.name,
    }, { parent: ec2Role });
    const vpc = new aws.ec2.Vpc(`${serviceName}-vpc`, {
        cidrBlock: "10.42.0.0/16",
        enableDnsHostnames: true,
        enableDnsSupport: true,
    });
    const igw = new aws.ec2.InternetGateway(`${serviceName}-igw`, {
        vpcId: vpc.id,
    }, { parent: vpc });
    const rt = new aws.ec2.RouteTable(`${serviceName}-rt`, {
        vpcId: vpc.id,
        routes: [
            {
                cidrBlock: '0.0.0.0/0',
                gatewayId: igw.id
            }
        ]
    }, { parent: vpc });
    const subnet = new aws.ec2.Subnet(`${serviceName}-public`, {
        vpcId: vpc.id,
        cidrBlock: "10.42.0.0/20",
        assignIpv6AddressOnCreation: false,
        mapPublicIpOnLaunch: true,
        // Derive the availability zone from the configured aws:region so the program works outside ap-southeast-2.
        availabilityZone: `${region}a`,
    }, { parent: vpc });
    const rta = new aws.ec2.RouteTableAssociation(`${serviceName}-pub-rta`, {
        routeTableId: rt.id,
        subnetId: subnet.id,
    }, { parent: rt });
    const sg = new aws.ec2.SecurityGroup(`${serviceName}-sg`, {
        vpcId: vpc.id,
        // Inbound SSH (port 22) only; all outbound traffic is allowed.
        description: 'Allow SSH inbound traffic',
        ingress: [{
            cidrBlocks: ['0.0.0.0/0'],
            fromPort: 22,
            toPort: 22,
            protocol: 'tcp'
        }],
        egress: [{
            cidrBlocks: ['0.0.0.0/0'],
            fromPort: 0,
            toPort: 0,
            protocol: '-1'
        }],
    }, { parent: vpc, deleteBeforeReplace: true });
    const keyPair = new aws.ec2.KeyPair(`${serviceName}-ssh-key`, {
        publicKey: publicKey,
    });
    const ami = aws.ec2.getAmi({
        mostRecent: true,
        owners: ["amazon"],
        filters: [
            { name: "name", values: ["al2023-ami-2023.*.*-kernel-6.1-x86_64"] },
        ],
    });
    const userData = pulumi.interpolate`#!/bin/bash
DEPLOYMENT_TOKEN="${agentPool.tokenValue}"
echo "Installing Docker..."
sudo yum update -y
sudo yum install docker -y
sudo systemctl start docker
sudo groupadd docker
sudo usermod -aG docker ec2-user
sudo yum install amazon-ecr-credential-helper -y
echo "Configuring Docker to use ECR login..."
sudo mkdir /home/ec2-user/.docker
echo -e '{"credsStore": "ecr-login"}' > /home/ec2-user/.docker/config.json
echo "Installing agent..."
curl -fsSL https://raw.githubusercontent.com/pulumi/customer-managed-deployment-agent/main/install.sh | HOME="/home/ec2-user" bash
# Note: Home is explicitly set to /home/ec2-user as cloud-init runs as root and home is not set
echo "Configuring agent..."
cd /home/ec2-user/
cd .pulumi/bin/customer-managed-deployment-agent
echo -e "token: \\"$DEPLOYMENT_TOKEN\\"" > pulumi-deployment-agent.yaml
chown ec2-user:ec2-user -R /home/ec2-user/.`;
    const instance = new aws.ec2.Instance(`${serviceName}-customer-agent`, {
        instanceType: "m3.medium",
        subnetId: subnet.id,
        ami: ami.then(ami => ami.id),
        keyName: keyPair.keyName,
        userData: userData,
        // Instances in a VPC take security group IDs via vpcSecurityGroupIds;
        // securityGroups expects group names.
        vpcSecurityGroupIds: [ sg.id ],
        iamInstanceProfile: instanceProfile.name,
        tags: {
            Name: `${serviceName}-customer-agent`,
        },
    }, { parent: subnet, replaceOnChanges: ["userData"] });
    return {
        hostname: instance.publicDns,
        executorImage: deploymentSettings.executorContext.apply(v => v?.executorImage),
        // $(...) command substitution works in bash and zsh.
        sshCommand: `ssh ec2-user@$(pulumi stack output hostname)`,
        agentCommand: `ssh ec2-user@$(pulumi stack output hostname) /home/ec2-user/.pulumi/bin/customer-managed-deployment-agent/customer-managed-deployment-agent run`,
        dockerCommand: pulumi.interpolate`docker pull pulumi/pulumi:${pulumiVersion}\ndocker tag pulumi/pulumi:${pulumiVersion} ${ecrRepository.repositoryUrl}:${pulumiVersion}\ndocker push ${ecrRepository.repositoryUrl}:${pulumiVersion}\ndocker rmi pulumi/pulumi:${pulumiVersion}\ndocker rmi ${ecrRepository.repositoryUrl}:${pulumiVersion}`
    };
}
2024/08/30 10:36:22 Running deployment '79a6c0ff-ad22-45ac-85d8-3a1fc73c09b1'
2024/08/30 10:36:22 Using temp directory: /tmp/pulumi-workflow-job-b0674797-7b9a-48db-b141-8df412ded0f4+0-168686564
2024/08/30 10:36:22 starting runner with working directory /tmp/pulumi-workflow-job-b0674797-7b9a-48db-b141-8df412ded0f4+0-168686564
2024/08/30 10:36:23 runner 27252: "Preparing environment"
2024/08/30 10:36:23 runner 27252: "Pulling container image \"052848974346.dkr.ecr.ap-southeast-2.amazonaws.com/pulumi:3.130.0\""
2024/08/30 10:36:23 runner 27252: "image pull failed; retrying"
2024/08/30 10:36:23 runner 27252: "Pulling container image \"052848974346.dkr.ecr.ap-southeast-2.amazonaws.com/pulumi:3.130.0\""
2024/08/30 10:36:23 runner 27252: "image pull failed; retrying"
2024/08/30 10:36:23 runner 27252: "Pulling container image \"052848974346.dkr.ecr.ap-southeast-2.amazonaws.com/pulumi:3.130.0\""
2024/08/30 10:36:23 runner 27252: "image pull failed; retrying"
2024/08/30 10:36:23 runner 27252: "Pulling container image \"052848974346.dkr.ecr.ap-southeast-2.amazonaws.com/pulumi:3.130.0\""
2024/08/30 10:36:23 runner 27252: "image pull failed; retrying"
2024/08/30 10:36:24 runner 27252: "Pulling container image \"052848974346.dkr.ecr.ap-southeast-2.amazonaws.com/pulumi:3.130.0\""
2024/08/30 10:36:24 runner 27252: "giving up"
2024/08/30 10:36:24 runner 27252: "panic: runtime error: invalid memory address or nil pointer dereference"
2024/08/30 10:36:24 runner 27252: "[signal SIGSEGV: segmentation violation code=0x1 addr=0x71 pc=0x7c5f45]"
2024/08/30 10:36:24 runner 27252: ""
2024/08/30 10:36:24 runner 27252: "goroutine 25 [running]:"
2024/08/30 10:36:24 runner 27252: "github.com/moby/moby/client.(*Client).getAPIPath(0x4586b8?, {0x1f99660?, 0x2ee15e0?}, {0xc0008e5c98, 0x11}, 0xc0008e5e88)"
2024/08/30 10:36:24 runner 27252: "\t/home/runner/go/pkg/mod/github.com/moby/moby@v24.0.9+incompatible/client/client.go:223 +0x45"
2024/08/30 10:36:24 runner 27252: "github.com/moby/moby/client.(*Client).sendRequest(0x0, {0x1f99660, 0x2ee15e0}, {0x1cc0ac8, 0x3}, {0xc0008e5c98?, 0x0?}, 0x2?, {0x0, 0x0}, ...)"
2024/08/30 10:36:24 runner 27252: "\t/home/runner/go/pkg/mod/github.com/moby/moby@v24.0.9+incompatible/client/request.go:114 +0x72"
2024/08/30 10:36:24 runner 27252: "github.com/moby/moby/client.(*Client).get(...)"
2024/08/30 10:36:24 runner 27252: "\t/home/runner/go/pkg/mod/github.com/moby/moby@v24.0.9+incompatible/client/request.go:36"
2024/08/30 10:36:24 runner 27252: "github.com/moby/moby/client.(*Client).ContainerLogs(0x0, {0x1f99660, 0x2ee15e0}, {0x0, 0x0}, {0x1, 0x1, {0x0, 0x0}, {0x0, ...}, ...})"
2024/08/30 10:36:24 runner 27252: "\t/home/runner/go/pkg/mod/github.com/moby/moby@v24.0.9+incompatible/client/container_logs.go:75 +0x6ae"
2024/08/30 10:36:24 runner 27252: "main.waitForContainer.func1()"
2024/08/30 10:36:24 runner 27252: "\t/home/runner/work/customer-managed-deployment-agent/customer-managed-deployment-agent/pulumi-service/cmd/workflow-runner/bootstrap.go:373 +0x74"
2024/08/30 10:36:24 runner 27252: "created by main.waitForContainer in goroutine 1"
2024/08/30 10:36:24 runner 27252: "\t/home/runner/work/customer-managed-deployment-agent/customer-managed-deployment-agent/pulumi-service/cmd/workflow-runner/bootstrap.go:372 +0xa5"
2024/08/30 10:36:24 job completed
2024/08/30 10:36:24 runner 27252: waitid: no child processes
2024/08/30 10:36:52 Deployment cancelled, stopping job
2024/08/30 10:36:52 could not cancel job: no such process
2024/08/30 10:36:52 Cleaning up
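In the log above, the runner attempts the pull five times, logs giving up, and then panics inside waitForContainer. The (*Client).ContainerLogs(0x0, ...) frame suggests the Docker client receiver is nil at that point. The expected behavior is for exhausted retries to surface as an error rather than a panic. The sketch below illustrates only that control flow, in TypeScript; the runner itself is Go code, and pullWithRetry is a hypothetical name, not the agent's API. The limit of five attempts mirrors the log.

```typescript
// Sketch only: illustrates "retry, then fail cleanly". pullWithRetry is hypothetical.
function pullWithRetry(pull: () => void, maxAttempts: number): number {
    let lastError: unknown;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            pull();
            return attempt; // success: report how many attempts were needed
        } catch (err) {
            lastError = err; // remember the cause and try again
        }
    }
    // "giving up": return an error to the caller instead of continuing
    // as if a container (and a usable client) existed.
    throw new Error(`image pull failed after ${maxAttempts} attempts: ${String(lastError)}`);
}

// Usage: a simulated pull that fails twice, then succeeds on the third try.
let calls = 0;
const attempts = pullWithRetry(() => {
    calls++;
    if (calls < 3) throw new Error("simulated pull failure");
}, 5);
console.log(attempts); // 3
```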
What happened?
A customer is running the customer-managed deployment agent on EC2, with dockerd as the target. The EC2 instance has an instance role that grants access to their AWS container registry (ECR), so private images can be pulled.
However, when a new deployment is about to run, the agent crashes. This appears to be because the agent is unable to pull the image. Since this is a private registry, a custom ~/.docker/config.json (see below) is required (ecr-login is present on the EC2 instance). We have confirmed that the image can be pulled correctly using the docker CLI.
Questions:
- ecr-login? (see doc)

crash logs
deployment agent config: /home/ec2-user/.pulumi/bin/customer-managed-deployment-agent/pulumi-deployment-agent.yaml
/home/ec2-user/.docker/config.json
docker pull CLI output

Example
n/a

Output of pulumi about
n/a

Additional context
cloud-init script
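The custom ~/.docker/config.json referred to in this issue is the one the cloud-init script writes ({"credsStore": "ecr-login"}). As a hedged sketch, the TypeScript helper below (renderDockerConfig is a hypothetical name) renders either that file or a variant that scopes the ecr-login helper to one registry host via credHelpers; both keys are standard Docker CLI config options. The account ID in the usage example is a placeholder. This file is read by the docker CLI, which is consistent with docker pull succeeding; whether the agent's own pull path honors it is the open question above.

```typescript
// Hypothetical helper: renders ~/.docker/config.json for the ECR credential helper.
// credsStore applies the helper to every registry; credHelpers scopes it to one host.
function renderDockerConfig(registryHost?: string): string {
    const config = registryHost
        ? { credHelpers: { [registryHost]: "ecr-login" } }
        : { credsStore: "ecr-login" };
    return JSON.stringify(config, null, 2) + "\n";
}

// Equivalent to what the cloud-init script writes today:
console.log(renderDockerConfig());
// Scoped to a single ECR registry (placeholder account ID):
console.log(renderDockerConfig("123456789012.dkr.ecr.ap-southeast-2.amazonaws.com"));
```

Scoping with credHelpers keeps public registries such as Docker Hub from being routed through the ECR helper.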