runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io
Other
7.76k stars 1.05k forks source link

"Invalid Key" when attempting to stream logs over websocket #2129

Open alex-mozejko opened 2 years ago

alex-mozejko commented 2 years ago

Community Note


Overview of the Issue

Cannot view streaming logs. The github actions check appears but the "details" button redirects to a page with the atlantis logo an a blank black text box.

The browser console shows:

WebSocket connection to 'wss://example.org/jobs/4bf24863-4b49-490f-afe1-d25d44a6530b/ws' failed: 

but other, non-websocket requests (icons, css, js, etc) are successful.

Reproduction Steps

  1. deploy atlantis using helm chart version 3.16.0 (atlantis version 0.19.1)
  2. run an atlantis plan on PR in github
  3. click the "details" button in the "checks" section
  4. connection to websocket fail

Logs

  2022-03-09 09:36:07   {"level":"debug","ts":"2022-03-09T01:36:07.393Z","caller":"server/middleware.go:44","msg":"GET /jobs/4bf24863-4b49-490f-afe1-d25d44a6530b – from 127.0.0.1:57890","json":{}}
  2022-03-09 09:36:07   {"level":"debug","ts":"2022-03-09T01:36:07.393Z","caller":"server/middleware.go:70","msg":"GET /jobs/4bf24863-4b49-490f-afe1-d25d44a6530b – respond HTTP 200","json":{}}
  2022-03-09 09:36:09   {"level":"debug","ts":"2022-03-09T01:36:09.703Z","caller":"server/middleware.go:44","msg":"GET /static/images/atlantis-icon_512.png – from 127.0.0.1:57890","json":{}}
  2022-03-09 09:36:09   {"level":"debug","ts":"2022-03-09T01:36:09.705Z","caller":"server/middleware.go:70","msg":"GET /static/images/atlantis-icon_512.png – respond HTTP 200","json":{}}
  2022-03-09 09:36:11   {"level":"debug","ts":"2022-03-09T01:36:11.241Z","caller":"server/middleware.go:44","msg":"GET /jobs/4bf24863-4b49-490f-afe1-d25d44a6530b/ws – from 127.0.0.1:57890","json":{}}
  2022-03-09 09:36:11   {"level":"error","ts":"2022-03-09T01:36:11.241Z","caller":"logging/simple_logger.go:161","msg":"invalid key: 4bf24863-4b49-490f-afe1-d25d44a6530b","json":{},"stacktrace":"github.com/runatlantis/atlantis/server/logging.(*StructuredLogger).Log
  github.com/runatlantis/atlantis/server/logging/simple_logger.go:161
  github.com/runatlantis/atlantis/server/controllers.(*JobsController).respond
    github.com/runatlantis/atlantis/server/controllers/jobs_controller.go:67
  github.com/runatlantis/atlantis/server/controllers.(*JobsController).GetProjectJobsWS
    github.com/runatlantis/atlantis/server/controllers/jobs_controller.go:60
  net/http.HandlerFunc.ServeHTTP
  net/http/server.go:2047
  github.com/gorilla/mux.(*Router).ServeHTTP
    github.com/gorilla/mux@v1.8.0/mux.go:210
    github.com/urfave/negroni.Wrap.func1
    github.com/urfave/negroni@v1.0.0/negroni.go:46
  github.com/urfave/negroni.HandlerFunc.ServeHTTP
    github.com/urfave/negroni@v1.0.0/negroni.go:29
  github.com/urfave/negroni.middleware.ServeHTTP
    github.com/urfave/negroni@v1.0.0/negroni.go:38
  github.com/runatlantis/atlantis/server.(*RequestLogger).ServeHTTP
    github.com/runatlantis/atlantis/server/middleware.go:68
  github.com/urfave/negroni.middleware.ServeHTTP
  github.com/urfave/negroni@v1.0.0/negroni.go:38
  github.com/urfave/negroni.(*Recovery).ServeHTTP
  github.com/urfave/negroni@v1.0.0/recovery.go:193
  github.com/urfave/negroni.middleware.ServeHTTP
  github.com/urfave/negroni@v1.0.0/negroni.go:38
  github.com/urfave/negroni.(*Negroni).ServeHTTP
  github.com/urfave/negroni@v1.0.0/negroni.go:96
  net/http.serverHandler.ServeHTTP
  net/http/server.go:2879
  net/http.(*conn).serve
  net/http/server.go:1930"}
  2022-03-09 09:36:11   {"level":"debug","ts":"2022-03-09T01:36:11.241Z","caller":"server/middleware.go:70","msg":"GET /jobs/4bf24863-4b49-490f-afe1-d25d44a6530b/ws – respond HTTP 400","json":{}}
  2022-03-09 09:36:43   {"level":"debug","ts":"2022-03-09T01:36:43.178Z","caller":"server/middleware.go:44","msg":"GET /healthz – from 127.0.0.1:56884","json":{}}
  2022-03-09 09:36:43   {"level":"debug","ts":"2022-03-09T01:36:43.178Z","caller":"server/middleware.go:70","msg":"GET /healthz – respond HTTP 200","json":{}}
  2022-03-09 09:35:43   {"level":"debug","ts":"2022-03-09T01:35:43.181Z","caller":"server/middleware.go:70","msg":"GET /healthz – respond HTTP 200","json":{}}

Environment details

test: enabled: false logLevel: "debug" atlantisUrl: https://example.org defaultTFVersion: 0.15.4 config: | hide-prev-plan-comments: true ingress: enabled: true path: / host: gh.example.org annotations: kubernetes.io/ingress.class: nginx kubernetes.io/tls-acme: "true"

https://api.github.com/meta

nginx.ingress.kubernetes.io/whitelist-source-range: 192.30.252.0/22,185.199.108.0/22,140.82.112.0/20

labels: {} tls:

Additional Context

I'm not sure if this is an Atlantis issue or maybe an nginx / ingress issue, i would be very thankful for any advice :)

claas-fridtjof-lisowski commented 2 years ago

@alex-mozejko we face the same issue. You should be able to rule out ingress/nginx issue by doing a port-forward to the service or pod and then try to call the job endpoint again.

It doesn't change anything on our side though.

I think it's coming from here(that's what the logs indicate), where it checks if the job exists with the given ID: https://github.com/runatlantis/atlantis/blob/master/server/controllers/websocket/mux.go#L58 https://github.com/runatlantis/atlantis/blob/master/server/controllers/jobs_controller.go#L67

For our setup I found the reason. The documentation says that the log stream doesn't work with custom apply / plan commands. We use a Makefile as a wrapper. I got the log stream working now by using the default init/plan/apply from Atlantis.

jonwtech commented 2 years ago

Ah same here, we're using Terragrunt and encountering the same issue. Just tested on a plain Terraform project using the default workflow and it works perfectly. Custom workflow support would be amazing!

david-heward-unmind commented 2 years ago

πŸ‘ for custom workflow support

macropin commented 2 years ago

I think there are two separate and unrelated issues being discussed in this issue:

  1. Ingress websocket issues (possibly related to ingress controller config or LB used, eg Classic ELB does not support websockets you need to use NLB or ALB)
  2. and issues with realtime streaming of logs from custom workflows.

FWIW, Atlantis really needs a configuration option to disable websockets.

jamengual commented 2 years ago

is this still happening with v0.19.8?

hungran commented 2 years ago

Still happend v0.19.8 We used nginx ingress with Azure LB

hungran commented 2 years ago

it has been failed to get wss output the wss worked for me until I made a change when put into environment variable as below,

    {
      name = "ATLANTIS_TFE_TOKEN"
      value = "xxxx"
    }

Environment details:

source = "terraform-aws-modules/atlantis/aws" version = "~> 3.0"

module "atlantis" {
  source  = "terraform-aws-modules/atlantis/aws"
  version = "~> 3.0"

  name = "atlantis"

  # VPC
  cidr            = "10.20.0.0/16"
  azs             = ["ap-southeast-1a", "ap-southeast-1b", "ap-southeast-1c"]
  private_subnets = ["10.20.1.0/24", "10.20.2.0/24", "10.20.3.0/24"]
  public_subnets  = ["10.20.101.0/24", "10.20.102.0/24", "10.20.103.0/24"]

  # DNS (without trailing dot)
  route53_zone_name = "xxxx"

  # ACM (SSL certificate) - Specify ARN of an existing certificate or new one will be created and validated using Route53 DNS
  certificate_arn = "arn:aws:acm:xxxxxx"
  custom_environment_variables = [
    {
      name : "ATLANTIS_REPO_CONFIG_JSON",
      value : jsonencode(yamldecode(file("${path.module}/server.yaml"))),
    },
    {
      name = "ATLANTIS_TFE_TOKEN"
      value = "xxxx"
    }
  ]
  # Atlantis
  atlantis_github_user = "xxx"
  atlantis_github_user_token = "xxxx"
  atlantis_repo_allowlist = ["xxx"]
  policies_arn =  ["xxxx"]

  user = "100:1000"
}
anton-livewyer commented 2 years ago

@hungran i observe the same behavior - logs reflected fine without ATLANTIS_TFE_TOKEN and i have blank screen with it

prastamaha commented 2 years ago

as a workaround to solve this, instead of using the ATLANTIS_TFE_TOKEN variable, you can inject it as a volume.

Create a secret from terraformrc file

$ cat ~/.terraformrc

credentials "app.terraform.io" {
  token = "xxx"
}
$ kubectl create secret generic atlantis-terraformrc --from-file=~/.terraformrc

Then inject the secret as a volume in the helm chart

extraVolumes:
  - name: terraformrc
    secret:
      secretName: atlantis-terraformrc
      defaultMode: 0400

extraVolumeMounts:
   - name: terraformrc
     mountPath: /home/atlantis/
     readOnly: true
amontalban commented 1 year ago

We are having the same issue with 0.20.1 running in AWS ECS. We do not have ATLANTIS_TFE_TOKEN set.

This is what we see in the logs:

{"level":"error","ts":"2022-11-10T11:40:31.070Z","caller":"logging/simple_logger.go:163","msg":"invalid key: 67b1873e-03cd-4574-a903-66a61805bdec","json":{},"stacktrace":"github.com/runatlantis/atlantis/server/logging.(*StructuredLogger).Log\n\tgithub.com/runatlantis/atlantis/server/logging/simple_logger.go:163\ngithub.com/runatlantis/atlantis/server/controllers.(*JobsController).respond\n\tgithub.com/runatlantis/atlantis/server/controllers/jobs_controller.go:92\ngithub.com/runatlantis/atlantis/server/controllers.(*JobsController).getProjectJobsWS\n\tgithub.com/runatlantis/atlantis/server/controllers/jobs_controller.go:70\ngithub.com/runatlantis/atlantis/server/controllers.(*JobsController).GetProjectJobsWS\n\tgithub.com/runatlantis/atlantis/server/controllers/jobs_controller.go:83\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2109\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\tgithub.com/gorilla/mux@v1.8.0/mux.go:210\ngithub.com/urfave/negroni.Wrap.func1\n\tgithub.com/urfave/negroni@v1.0.0/negroni.go:46\ngithub.com/urfave/negroni.HandlerFunc.ServeHTTP\n\tgithub.com/urfave/negroni@v1.0.0/negroni.go:29\ngithub.com/urfave/negroni.middleware.ServeHTTP\n\tgithub.com/urfave/negroni@v1.0.0/negroni.go:38\ngithub.com/runatlantis/atlantis/server.(*RequestLogger).ServeHTTP\n\tgithub.com/runatlantis/atlantis/server/middleware.go:70\ngithub.com/urfave/negroni.middleware.ServeHTTP\n\tgithub.com/urfave/negroni@v1.0.0/negroni.go:38\ngithub.com/urfave/negroni.(*Recovery).ServeHTTP\n\tgithub.com/urfave/negroni@v1.0.0/recovery.go:193\ngithub.com/urfave/negroni.middleware.ServeHTTP\n\tgithub.com/urfave/negroni@v1.0.0/negroni.go:38\ngithub.com/urfave/negroni.(*Negroni).ServeHTTP\n\tgithub.com/urfave/negroni@v1.0.0/negroni.go:96\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2947\nnet/http.(*conn).serve\n\tnet/http/server.go:1991"}
{
  "level": "error",
  "ts": "2022-11-10T11:40:31.070Z",
  "caller": "logging/simple_logger.go:163",
  "msg": "invalid key: 67b1873e-03cd-4574-a903-66a61805bdec",
  "json": {},
  "stacktrace": "...see below..."
}
$ echo -E $stacktrace | jq -r .stacktrace
github.com/runatlantis/atlantis/server/logging.(*StructuredLogger).Log
    github.com/runatlantis/atlantis/server/logging/simple_logger.go:163
github.com/runatlantis/atlantis/server/controllers.(*JobsController).respond
    github.com/runatlantis/atlantis/server/controllers/jobs_controller.go:92
github.com/runatlantis/atlantis/server/controllers.(*JobsController).getProjectJobsWS
    github.com/runatlantis/atlantis/server/controllers/jobs_controller.go:70
github.com/runatlantis/atlantis/server/controllers.(*JobsController).GetProjectJobsWS
    github.com/runatlantis/atlantis/server/controllers/jobs_controller.go:83
net/http.HandlerFunc.ServeHTTP
    net/http/server.go:2109
github.com/gorilla/mux.(*Router).ServeHTTP
    github.com/gorilla/mux@v1.8.0/mux.go:210
github.com/urfave/negroni.Wrap.func1
    github.com/urfave/negroni@v1.0.0/negroni.go:46
github.com/urfave/negroni.HandlerFunc.ServeHTTP
    github.com/urfave/negroni@v1.0.0/negroni.go:29
github.com/urfave/negroni.middleware.ServeHTTP
    github.com/urfave/negroni@v1.0.0/negroni.go:38
github.com/runatlantis/atlantis/server.(*RequestLogger).ServeHTTP
    github.com/runatlantis/atlantis/server/middleware.go:70
github.com/urfave/negroni.middleware.ServeHTTP
    github.com/urfave/negroni@v1.0.0/negroni.go:38
github.com/urfave/negroni.(*Recovery).ServeHTTP
    github.com/urfave/negroni@v1.0.0/recovery.go:193
github.com/urfave/negroni.middleware.ServeHTTP
    github.com/urfave/negroni@v1.0.0/negroni.go:38
github.com/urfave/negroni.(*Negroni).ServeHTTP
    github.com/urfave/negroni@v1.0.0/negroni.go:96
net/http.serverHandler.ServeHTTP
    net/http/server.go:2947
net/http.(*conn).serve
    net/http/server.go:1991
nitrocode commented 1 year ago

Hmmm @amontalban seems like the issue thrown for you is in this block

https://github.com/runatlantis/atlantis/blob/4cacaeb0c7633c98a9a5f873faaf391b259bf355/server/controllers/jobs_controller.go#L66-L88

houdinisparks commented 1 year ago

we are using terragrunt for our workflows, and faced the exact same issue as well with the below msg and stacktrace

msg:
invalid key: 7d8ffd20-f9c4-4977-87c7-d11185c6d1b9

stacktrace: 
github.com/runatlantis/atlantis/server/logging.(*StructuredLogger).Log
github.com/runatlantis/atlantis/server/logging/simple_logger.go:163
github.com/runatlantis/atlantis/server/controllers.(*JobsController).respond
github.com/runatlantis/atlantis/server/controllers/jobs_controller.go:92
github.com/runatlantis/atlantis/server/controllers.(*JobsController).getProjectJobsWS
github.com/runatlantis/atlantis/server/controllers/jobs_controller.go:70
github.com/runatlantis/atlantis/server/controllers.(*JobsController).GetProjectJobsWS
github.com/runatlantis/atlantis/server/controllers/jobs_controller.go:83
net/http.HandlerFunc.ServeHTTP
net/http/server.go:2109
github.com/gorilla/mux.(*Router).ServeHTTP
github.com/gorilla/mux@v1.8.0/mux.go:210
github.com/urfave/negroni.Wrap.func1
github.com/urfave/negroni@v1.0.0/negroni.go:46
github.com/urfave/negroni.HandlerFunc.ServeHTTP
github.com/urfave/negroni@v1.0.0/negroni.go:29
github.com/urfave/negroni.middleware.ServeHTTP
github.com/urfave/negroni@v1.0.0/negroni.go:38
github.com/runatlantis/atlantis/server.(*RequestLogger).ServeHTTP
github.com/runatlantis/atlantis/server/middleware.go:70
github.com/urfave/negroni.middleware.ServeHTTP
github.com/urfave/negroni@v1.0.0/negroni.go:38
github.com/urfave/negroni.(*Recovery).ServeHTTP
github.com/urfave/negroni@v1.0.0/recovery.go:193
github.com/urfave/negroni.middleware.ServeHTTP
github.com/urfave/negroni@v1.0.0/negroni.go:38
github.com/urfave/negroni.(*Negroni).ServeHTTP
github.com/urfave/negroni@v1.0.0/negroni.go:96
net/http.serverHandler.ServeHTTP
net/http/server.go:2947
net/http.(*conn).serve
net/http/server.go:1991

turns out its because we had incorrect s3 permissions for terragrunt to read our remote state and the atlantis plan action was actually hanging at this prompt:

Error running plan operation: running "terragrunt plan -input=false -out=$PLANFILE" in "/home/atlantis/.atlantis/repos//infrastructure/169/default/states/acct.gcc-r-mgmt/ecr": signal: terminated: running "terragrunt plan -input=false -out=$PLANFILE" in "/home/atlantis/.atlantis/repos/infrastructure/169/default/states/acct.gcc-r-mgmt/ecr":
Remote state S3 bucket sst-s3-<redacted> does not exist or you don't have permissions to access it. Would you like Terragrunt to create it? (y/n)

this prompt wasn't shown on the UI's log stream likely because it hadn't been flushed yet as it waiting for user input and there weren't enough lines for the buffer to flush. we realised it only when we restarted our ECS services that forced the buffer to flush and then we saw the prompt on our ECS logs.

once our s3 permissions was fixed, the websocket responded fine and we could see our log streams:

image

@amontalban maybe this might help!

repos.yml for reference

repos:
  - id: "/.*/"
    workflow: terragrunt
    apply_requirements: [ approved, mergeable ]

workflows:
  terragrunt:
    plan:
      steps:
        - env:
            name: TERRAGRUNT_TFPATH
            command: 'echo "terraform${ATLANTIS_TERRAFORM_VERSION}"'
        - run: terragrunt plan -input=false -out=$PLANFILE
        - run: terragrunt show -json $PLANFILE > $SHOWFILE
    apply:
      steps:
        - env:
            name: TERRAGRUNT_TFPATH
            command: 'echo "terraform${ATLANTIS_TERRAFORM_VERSION}"'
        - run: terragrunt apply -input=false $PLANFILE
nitrocode commented 1 year ago

This may be related to https://github.com/runatlantis/atlantis/issues/2026

Please try @houdinisparks method and use the latest version if possible.

@amontalban @prastamaha @anton-livewyer @hungran @macropin @davidh-unmind @jonwtech @claas-fridtjof-lisowski

JFryy commented 1 year ago

This issue is still present onatlantis v0.24.0, removing the tfe token did indeed fix it.

hungran commented 11 months ago

we solved the issue by this ATLANTIS_TFE_LOCAL_EXECUTION_MODE=true

Almenon commented 9 months ago

Same issue with ws connection failure and "invalid key", but we have never specified ATLANTIS_TFE_TOKEN. Currently we just run terraform, nothing fancy. It's weird because other commits I've made worked, it's just my latest commit that ran into the issue.

We're using v0.24.4