pulumi / pulumi-gcp

A Google Cloud Platform (GCP) Pulumi resource package, providing multi-language access to GCP
Apache License 2.0

Deploying new revisions of Cloud Run resources yields 409: Conflict #350

Open floyd-may opened 4 years ago

floyd-may commented 4 years ago

Deploying new revisions of Cloud Run resources fails because the GCP API returns an error response of 409: Conflict. This happens whether or not the AutogenerateRevisionName field is set (as mentioned in this Terraform provider issue: https://github.com/terraform-providers/terraform-provider-google/issues/5898). I've tried both AutogenerateRevisionName and a manually generated random revision name.
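
A manually generated random revision name of the kind mentioned above could look something like the sketch below. `randomRevisionName` is a hypothetical TypeScript helper, not part of any Pulumi or GCP SDK, and it assumes Cloud Run's rules that a revision name starts with the service name plus a hyphen and fits in a 63-character DNS label. As reported here, a random name did not avoid the 409.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: builds "<serviceName>-<random suffix>".
// Assumes Cloud Run requires the service-name prefix and a 63-character limit.
function randomRevisionName(serviceName: string, suffixLength = 6): string {
  const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
  const bytes = randomBytes(suffixLength);
  let suffix = "";
  for (let i = 0; i < bytes.length; i++) {
    // Maps each random byte onto [a-z0-9]; the slight modulo bias is
    // acceptable for a name suffix.
    suffix += alphabet[bytes[i] % alphabet.length];
  }
  const name = `${serviceName}-${suffix}`;
  if (name.length > 63) {
    throw new Error(`revision name too long: ${name.length} characters`);
  }
  return name;
}
```

The result would be passed as the template's metadata name (`template.metadata.name` in the TypeScript SDK, `Template.Metadata.Name` in C#) instead of setting `autogenerateRevisionName`. Because the name changes on every run, each `pulumi up` would create a new revision even when nothing else changed.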

floyd-may commented 4 years ago

@stack72 Get with me on the Slack org to set up a video call if you'd like me to walk you through what I'm experiencing.

jaxxstorm commented 4 years ago

Can you post the code you're using to deploy so we can try to reproduce this?

floyd-may commented 4 years ago

You bet -

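using Pulumi;
using cloudrun = Pulumi.Gcp.CloudRun;

// DockerImage, GoogleOauthClientId, AzureOauthClientId and AzureOauthTenantId
// are values defined elsewhere in the stack.
var svc = new cloudrun.Service("my-service-name", new cloudrun.ServiceArgs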
{
    Name = "my-service-name",
    Traffics = new InputList<cloudrun.Inputs.ServiceTrafficArgs>
    {
        new cloudrun.Inputs.ServiceTrafficArgs
        {
            LatestRevision = true,
            Percent = 100
        }
    },
    Template = new cloudrun.Inputs.ServiceTemplateArgs
    {
        Spec = new cloudrun.Inputs.ServiceTemplateSpecArgs
        {
            Containers = new InputList<cloudrun.Inputs.ServiceTemplateSpecContainerArgs> {
                new cloudrun.Inputs.ServiceTemplateSpecContainerArgs {
                    Image = DockerImage, // a gcr.io docker image URL
                    Envs = new InputList<cloudrun.Inputs.ServiceTemplateSpecContainerEnvArgs> {
                        new cloudrun.Inputs.ServiceTemplateSpecContainerEnvArgs
                        {
                            Name = "Authentication__Google__ClientId",
                            Value = GoogleOauthClientId
                        },
                        new cloudrun.Inputs.ServiceTemplateSpecContainerEnvArgs
                        {
                            Name = "Authentication__AzureAD__ClientId",
                            Value = AzureOauthClientId
                        },
                        new cloudrun.Inputs.ServiceTemplateSpecContainerEnvArgs
                        {
                            Name = "Authentication__AzureAD__TenantId",
                            Value = AzureOauthTenantId
                        },
                    }
                }
            },
            ContainerConcurrency = 1,
        },
    },
    Location = "us-central1",
    AutogenerateRevisionName = true
});

Sytten commented 4 years ago

I am also having this issue. It happens when you modify something outside Pulumi and then try to modify it in Pulumi. The strange thing is that my stack is now corrupted, because Pulumi thinks it successfully applied the change. It seems the update doesn't fetch the latest revision before trying to update it. I think it's probably an issue in the provider itself.

Sytten commented 4 years ago

Also I tried deleting the service in GCP and pulumi simply did not detect that the service was gone... I would have expected it to try to recreate it.

Sytten commented 4 years ago

Edit: I tried using plain Terraform and I don't see this problem, so this is a problem with Pulumi not using the provider correctly and not fetching the latest resource.

floyd-may commented 4 years ago

I wonder if it has to do with Terraform's concept of "virtual fields". I searched for "virtual" in the codebase (both the GCP provider and core Pulumi) and didn't find much.

lukehoban commented 4 years ago

> Edit: I tried doing pure terraform and I don't see this problem, so this is a problem of Pulumi not using the provider correctly and not fetching the latest resource.

One key thing that is different by default between Pulumi and Terraform is that Terraform does a "refresh" by default (but can opt-out), and Pulumi does not (but can opt-in). It may be that the Cloud Run resource was designed to require having a refresh done prior to being updated?

Can you try running pulumi refresh and then retrying pulumi up to see if that works?

In general it is a "bug" in a provider if it cannot be used correctly without a refresh - as users can and often do opt-out of refresh by default in Terraform as well - but there may be cases where it is unavoidable.

Note that we are considering changing this default in https://github.com/pulumi/pulumi/issues/2247.

floyd-may commented 4 years ago

Doing a refresh worked once, but after I added pulumi refresh --yes to my CD script before pulumi up --yes, the deployment failed again with a 409, so I think something else is at play here as well. Are there any considerations I should weigh (other than the extra time it takes to refresh before deploying) when adding pulumi refresh --yes to my CD scripts?
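
The CD sequence described above (refresh, then deploy) can be sketched as a small TypeScript helper that shells out to the pulumi CLI. `deploySteps` and `runDeploy` are hypothetical names; the sketch assumes the `pulumi` binary is on the PATH and that the stack is selected with `--stack`. As noted above, this ordering still hit a 409 at least once, so it is not a guaranteed fix.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical CD helper: `pulumi refresh --yes` followed by `pulumi up --yes`.
function deploySteps(stack: string): string[][] {
  return [
    ["refresh", "--yes", "--stack", stack],
    ["up", "--yes", "--stack", stack],
  ];
}

// Runs each step with the pulumi CLI and stops at the first non-zero exit.
// A missing binary yields a null status, which is reported as 1.
function runDeploy(stack: string): number {
  for (const args of deploySteps(stack)) {
    const result = spawnSync("pulumi", args, { stdio: "inherit" });
    if (result.status !== 0) {
      return result.status ?? 1;
    }
  }
  return 0;
}
```

`pulumi up --refresh --yes` folds both steps into a single command; that is the `--refresh` option mentioned later in this thread.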

kaisellgren commented 4 years ago

I have this exact same issue (409).

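import * as gcp from "@pulumi/gcp";
import * as pulumi from "@pulumi/pulumi";

// Wrapped in a function so the snippet stands alone; the parameters are the
// values the original snippet took from its surrounding scope.
function createApp(
  prefix: string,
  location: string,
  imageUrl: pulumi.Input<string>,
  publicBucketName: pulumi.Input<string>,
  enableCloudRun: pulumi.Resource,
) {
  return new gcp.cloudrun.Service(
    `${prefix}-app`,
    {
      name: `${prefix}-app`,
      location,
      template: {
        spec: {
          containers: [
            {
              image: imageUrl,
              envs: [
                {
                  name: 'PUBLIC_BUCKET_URL',
                  value: publicBucketName,
                },
              ],
              resources: {
                requests: {
                  memory: '64Mi',
                  cpu: '200m',
                },
                limits: {
                  memory: '256Mi',
                  cpu: '1000m',
                },
              },
            },
          ],
          containerConcurrency: 80,
        },
      },
    },
    { dependsOn: enableCloudRun },
  );
}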

When I add a new env variable and run pulumi up, the update fails with:

Diagnostics:
  gcp:cloudrun:Service (x-dev-app):
    error: 1 error occurred:
        * updating urn:pulumi:dev::x::gcp:cloudrun/service:Service::x-dev-app: Error updating Service "locations/europe-north1/namespaces/x/services/x-dev-app": googleapi: Error 409: Revision named 'x-dev-app-00022-jot' with different configuration already exists.

If I add this:

      autogenerateRevisionName: true,

and run pulumi up:

Diagnostics:
  gcp:cloudrun:Service (x-dev-app):
    error: 1 error occurred:
        * updating urn:pulumi:dev::x::gcp:cloudrun/service:Service::x-dev-app: Error updating Service "locations/europe-north1/namespaces/x/services/x-dev-app": googleapi: Error 409: Conflict for resource 'x-dev-app' for version 'xxx'.

Running pulumi refresh takes a moment to update the state, but ultimately it has no effect on this issue, which persists.

floyd-may commented 4 years ago

Any update on this @jaxxstorm?

leezen commented 4 years ago

We're taking a look. It'd be helpful if anyone running into this could post detailed logs (https://www.pulumi.com/docs/troubleshooting/#verbose-logging) from running pulumi refresh, along with the diff that pulumi refresh shows when it updates the state.

floyd-may commented 4 years ago

Hi @leezen. I'm glad to provide logs. With the verbosity turned all the way up, will the logs contain secrets or other sensitive information that shouldn't be public?

Sytten commented 4 years ago

I will do that this weekend. Yes, it will contain sensitive information, @floyd-may. @leezen, can you provide an email address so we can send it to you securely? Thanks!

leezen commented 4 years ago

Yes. If you don't want to scrub it, please DM it to me on slack.pulumi.com. Alternatively, emailing lee@ works, too.

jaxxstorm commented 4 years ago

We believe this is now resolved by some upstream changes to the Terraform provider. @stack72 and I could not reproduce it. If anyone else hits this problem, please make sure you're using the latest version of the provider, and if it persists, feel free to reopen this issue.

Sytten commented 4 years ago

I will test and confirm. @jaxxstorm, can you link the upstream provider issue/PR for posterity? Thanks!

jonsherrard commented 4 years ago

I've had this issue in the past, and everything's been fine for a while.

It's just started happening again.

The only thing I can think of is that a Cloud Function failed to deploy during the deployment (an issue with a resource reference), which failed the whole process. Could that partially deployed state be causing the errors on the Cloud Run side?

googleapi: Error 409: Conflict for resource 'redacted-41a830d': version '1595922945261662' was specified but current version is '1595922945369000'.

kaisellgren commented 4 years ago

Does a pulumi refresh help or is it stuck in this conflict state?

idoshamun commented 3 years ago

refresh worked for me!

Sytten commented 3 years ago

Refresh usually works, yes.

OliverHGray commented 3 years ago

How can this issue be fixed when a refresh doesn't help? The problem I'm having is the service has been manually deployed and it doesn't seem to be able to come back under Pulumi control.

I've tried refreshing, setting the revision name explicitly, and exporting the stack configuration and removing occurrences of the offending revision name from the inputs section. None of these helped.

I'd also like to know why Pulumi (though I guess it's probably Terraform) even tries to update using the current revision name. Won't that always fail?

floyd-may commented 3 years ago

I've been writing Cloud Run resources quite a bit lately and haven't seen any issues. My guess is that your mix of manual and managed deployments is possibly the culprit here.

jeduden commented 3 years ago

I still have the problem. It doesn't always happen. Deploying with '--refresh' helps, but it makes the deployment a lot slower. Is there a better solution? Can we force a refresh of just the Cloud Run resource?

xskif commented 3 years ago

Same here. The issue should be reopened.

leezen commented 3 years ago

> Can we force refresh may be just the cloudrun resource ?

Yes, it's possible to do a targeted refresh with the -t (--target) option, which takes the URN of the resource to refresh.
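
Building the URN for such a targeted refresh can be sketched as follows. `resourceUrn` and `targetedRefreshArgs` are hypothetical helpers, and the URN layout `urn:pulumi:<stack>::<project>::<type>::<name>` is assumed from the URNs quoted in this thread; it only holds for a top-level resource without a parent component.

```typescript
// Hypothetical helper: URN of a top-level resource (no parent component).
function resourceUrn(stack: string, project: string, type: string, name: string): string {
  return `urn:pulumi:${stack}::${project}::${type}::${name}`;
}

// Arguments for `pulumi refresh`, limited to the given URNs via --target.
function targetedRefreshArgs(urns: string[]): string[] {
  const args = ["refresh", "--yes"];
  for (const urn of urns) {
    args.push("--target", urn);
  }
  return args;
}
```

`pulumi stack --show-urns` prints the exact URNs of a stack's resources, which avoids reconstructing them by hand.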

yonathan06 commented 2 years ago

Using import with the resource id once did the trick for me: https://www.pulumi.com/docs/guides/adopting/import/
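
For the import route, the resource ID of a Cloud Run service can be assembled as in the sketch below. It assumes the `locations/<location>/namespaces/<project>/services/<name>` ID format, which matches the resource path in the 409 errors quoted earlier; `cloudRunImportId` and `importArgs` are hypothetical helpers, and the thread does not say whether the resource was first removed from the stack's state.

```typescript
// Hypothetical helper: import ID for a Cloud Run (v1) service, assuming the
// same path format that appears in the googleapi 409 error messages.
function cloudRunImportId(location: string, project: string, service: string): string {
  return `locations/${location}/namespaces/${project}/services/${service}`;
}

// Arguments for `pulumi import <type> <name> <id>`.
function importArgs(resourceName: string, id: string): string[] {
  return ["import", "gcp:cloudrun/service:Service", resourceName, id];
}
```

The same ID could instead be passed through the `import` resource option on the resource in the program, which adopts it on the next `pulumi up`.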

pierskarsenbarg commented 4 months ago

Someone hit this again after deploying a container that wouldn't start. Subsequent updates failed until a refresh was run.

For Pulumi engineers: see https://pulumi.slack.com/archives/CBVJAP46L/p1715511335035519 for logs

alexhwoods commented 4 months ago

Same thing as @pierskarsenbarg: if the container won't start, you can't update the service anymore.

Here's my error:

Error 409: Conflict for resource 'makeswift': version '1715743123779128' was specified but current version is '1715744121398301'.

alexhwoods commented 4 months ago

A pulumi refresh does fix it. The Pulumi provider seems to assume that the deployment succeeded.
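
The "version X was specified but current version is Y" errors in this thread suggest optimistic concurrency: each write quotes the version the client last saw. The toy model below is a hypothetical illustration, not the provider's or Cloud Run's code. It assumes that the API accepts the write (bumping the version) even when the new container then fails to start, and that the tool only records the new version after a fully successful update; the thread only supports these assumptions through the observations above.

```typescript
// Toy API: every update must quote the current version or it gets a 409.
class ToyCloudRunApi {
  private version = 1;
  private config = "image:v1";

  read(): { version: number; config: string } {
    return { version: this.version, config: this.config };
  }

  update(expectedVersion: number, config: string, startsOk: boolean): number {
    if (expectedVersion !== this.version) {
      throw new Error(
        `409: version '${expectedVersion}' was specified but current version is '${this.version}'`,
      );
    }
    // The write is accepted and bumps the version before readiness is known.
    this.version += 1;
    this.config = config;
    if (!startsOk) {
      throw new Error("revision failed to become ready");
    }
    return this.version;
  }
}

// Toy client state: the last version the tool believes is current.
class ToyStack {
  knownVersion: number;
  private api: ToyCloudRunApi;

  constructor(api: ToyCloudRunApi) {
    this.api = api;
    this.knownVersion = api.read().version;
  }

  up(config: string, startsOk = true): void {
    // If update() throws, the assignment never runs and the state stays stale.
    this.knownVersion = this.api.update(this.knownVersion, config, startsOk);
  }

  refresh(): void {
    this.knownVersion = this.api.read().version;
  }
}
```

In this model, a failed start leaves the stored version one behind, so the next `up` gets a 409 until `refresh` re-reads the current version.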

daaain commented 2 months ago

A simple pulumi refresh didn't work for me, but pulumi refresh -t urn:pulumi:xxx::yyy::gcp:cloudrun/service:Service::zzz did