runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io

Atlantis terraform live logs are not showing #2542

Open omotoso78 opened 2 years ago

omotoso78 commented 2 years ago

Overview of the Issue

I am able to run atlantis plan and atlantis apply, and both work fine. However, I am unable to see the terraform live logs while it is planning/applying.

The link provided in the "details" opens a blank screen.

Reproduction Steps

Atlantis v0.19.7, installed locally and using a git enterprise user. No repo.yaml or atlantis.yaml is used. The pull request is submitted from a branch and works fine, but the log is not visible.


prastamaha commented 2 years ago

I'm also experiencing the same thing while using Atlantis image v0.19.8 with Terragrunt customization.

My repos.yaml configuration is as follows:

repos:
- id: "/.*/"
  workflow: terragrunt
  apply_requirements: [approved,mergeable]
workflows:
  terragrunt:
    plan:
      steps:
      - env:
          name: TERRAGRUNT_TFPATH
          command: 'echo "terraform${ATLANTIS_TERRAFORM_VERSION}"'
      - run: terragrunt plan -out=$PLANFILE
      - run: terragrunt show -json $PLANFILE > $SHOWFILE
    apply:
      steps:
      - env:
          name: TERRAGRUNT_TFPATH
          command: 'echo "terraform${ATLANTIS_TERRAFORM_VERSION}"'
      - run: terragrunt apply $PLANFILE

I tried enabling logLevel: "debug" and found the following error

{"level":"debug","ts":"2022-09-29T17:37:30.849Z","caller":"server/middleware.go:70","msg":"GET /jobs/7836d3a3-f208-4c4d-ac9e-37d1a06116a7/ws – respond HTTP 500","json":{}}

nitrocode commented 2 years ago

Try atlantis plan --verbose

It would help to know how you folks have deployed the app. I use the latest version and can see the logs within my eks cluster.

andy-paine-numan commented 1 year ago

Have you got websockets enabled on any networking infrastructure that your Atlantis installation sits behind? For example, I had to add an annotation to my K8s Contour Ingress to allow websocket streaming to work (the /ws on the end of the URL is for websockets)
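
One way to check whether the Upgrade request survives the proxies in front of Atlantis is to send the handshake by hand and look at the status line that comes back. The sketch below is only an illustration using Python's standard library; the function names are made up for this example, the host name in the usage note is a placeholder, and it assumes the endpoint is reachable without extra authentication. A `101 Switching Protocols` status means the upgrade passed through; anything else (for example the HTTP 500 in the debug log above) points at the proxy chain or at Atlantis itself.

```python
import base64
import os
import socket
import ssl


def build_upgrade_request(host: str, path: str, key: str) -> bytes:
    """Return a minimal RFC 6455 client handshake as raw HTTP/1.1 bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def probe(host: str, path: str, port: int = 443,
          use_tls: bool = True, timeout: float = 10.0) -> str:
    """Send the handshake and return the HTTP status line of the reply."""
    # The key is 16 random bytes, base64-encoded, as the protocol requires.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    conn = socket.create_connection((host, port), timeout=timeout)
    try:
        if use_tls:
            context = ssl.create_default_context()
            conn = context.wrap_socket(conn, server_hostname=host)
        conn.sendall(build_upgrade_request(host, path, key))
        reply = conn.recv(4096).decode("latin-1")
    finally:
        conn.close()
    return reply.split("\r\n", 1)[0]
```

Usage would be something like `print(probe("atlantis.example.com", "/jobs/<job-id>/ws"))`, with the job ID taken from the "details" link of a running plan.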

pantelis-karamolegkos commented 1 year ago

I am facing a similar issue, the difference being that the logs appear all at once when the apply / plan process is complete, i.e. they are not actually "streamed". Atlantis is running on a VM, so I don't know what kind of websocket-related configuration can be done.

nitrocode commented 1 year ago

People who deploy from these modules haven't run into these issues as far as I know. Please consider these deployments.

Related issues

miguelaferreira commented 1 year ago

While I can see the live logs, it often happens that I need to refresh the page a few times before the web-socket connection succeeds and the logs start streaming.

Screenshot of browser console when web-socket connection fails: ![image](https://github.com/runatlantis/atlantis/assets/4670993/355b2254-521c-488c-b4ae-53771b489408)

nitrocode commented 1 year ago

@marceloboeira it seems like this may be affected by how atlantis is deployed. I'm curious whether there is a misconfiguration in the deployment, a limitation in the cloud deployment used, or something that can be mitigated by additional logic in atlantis. Or maybe a combination.

If we can do anything in the atlantis server, then please feel free to propose a pr if you find a way to reproduce and resolve the issue.

Maybe it's as simple as doing a retry in the frontend to connect to the websocket?
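
A retry of this kind could look roughly like the following. This is a language-neutral sketch written in Python, not the Atlantis frontend code (which runs in the browser); `connect_with_retry`, `backoff_delays`, and the delay values are all hypothetical. The `connect` argument stands in for whatever opens the websocket.

```python
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 0.5,
                   factor: float = 2.0, cap: float = 8.0) -> Iterator[float]:
    """Yield one exponentially growing delay per attempt, capped at `cap`."""
    for i in range(attempts):
        yield min(cap, base * factor ** i)


def connect_with_retry(connect: Callable[[], T], attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `connect` until it succeeds or `attempts` is exhausted."""
    last_error: Optional[OSError] = None
    for attempt, delay in enumerate(backoff_delays(attempts), start=1):
        try:
            return connect()
        except OSError as err:
            last_error = err
            # No point in waiting after the final failed attempt.
            if attempt < attempts:
                sleep(delay)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; a browser implementation would use timers instead.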

cloudn8ve commented 1 year ago

I've deployed atlantis into an EKS Cluster. I am also running into this issue with the websocket and getting 500 error codes when running in debugging mode.

marcosdiez commented 10 months ago

I had this problem before. It was a permission issue. The good thing is that on Atlantis >= v0.27.0, this is explicitly logged, so you can double check that on atlantis stdout.

The trivial solution (just to test) is to make atlantis a repo owner.

Also, this new version of atlantis shows every terraform log on its HTTP website. It's not as comfortable as clicking on github, but it does the trick.

dimisjim commented 8 months ago

I am experiencing this as well in v0.27.1

Sometimes it works when you click on the link / job with the output, other times it works only after a refresh, and sometimes it shows only part of the output each time you refresh.

marceloboeira commented 8 months ago

@dimisjim what do you see if you open that page with developer tools? In theory, that's most of the time because of the load balancer and websocket connection...

dimisjim commented 8 months ago

@marceloboeira

These are there always:

image

This shows when {some} / {sometimes all} of the content loads up:

xterm-4.9.0.js:24 Canvas2D: Multiple readback operations using getImageData are faster with the willReadFrequently attribute set to true. See: https://html.spec.whatwg.org/multipage/canvas.html#concept-canvas-will-read-frequently

This shows up when no content loads up:

93e23b46-4b37-429f-8959-a2d03b39d3db:66 WebSocket connection to 'wss://<ATLANTIS_URL>/jobs/93e23b46-4b37-429f-8959-a2d03b39d3db/ws' failed: 

We are using a GCP load balancer.

marceloboeira commented 8 months ago

I think it might be that you need to tweak your LB to properly forward the WebSocket connection to the Atlantis instance.

That was the case when I used Atlantis with NGINX / ALB: I had to enable sticky sessions and add some NGINX-specific config for keep-alive and the Upgrade/Connection headers (see "WebSockets on NGINX").

You might have to figure out the equivalents for GCP: https://cloud.google.com/load-balancing/docs/https#websocket_support

It seems to be enabled by default, but you might want to review the timeouts and such.

Overall, atlantis could use a much simpler polling-based log stream; it would be easier to make it compatible everywhere. WS for this purpose is overkill.
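
For illustration, a polling client could be as small as the sketch below. Nothing in this thread indicates that Atlantis exposes such an endpoint; `fetch` is a hypothetical callable that returns the log lines from a given offset plus a flag saying whether the job has finished, and `poll_log` is a made-up name.

```python
import time
from typing import Callable, Iterator, List, Tuple

# Hypothetical contract: fetch(offset) -> (lines from `offset` onward, job_finished)
FetchFn = Callable[[int], Tuple[List[str], bool]]


def poll_log(fetch: FetchFn, interval: float = 2.0,
             sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield log lines as they appear by polling with a growing offset."""
    offset = 0
    while True:
        lines, finished = fetch(offset)
        for line in lines:
            yield line
        # Remember how much has been seen so the next poll only gets new lines.
        offset += len(lines)
        if finished:
            return
        sleep(interval)
```

Because each request is a plain HTTP call carrying its own offset, this style needs no Upgrade handling or session affinity on the load balancer, at the cost of up to `interval` seconds of latency.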

dimisjim commented 8 months ago

@marceloboeira

Hmm based on the doc you linked:

The load balancer does not need any configuration to proxy WebSocket connections.

and Upgrade/Connection headers are also supported:

When the load balancer recognizes a WebSocket Upgrade request from an HTTP(S) client followed by a successful Upgrade response from the backend instance, the load balancer proxies bidirectional traffic for the duration of the current connection. If the backend instance does not return a successful Upgrade response, the load balancer closes the connection.

so it should be working out of the box, at least from the GCP Load balancing side. Maybe also the setup I am using based on: https://github.com/bschaatsbergen/terraform-gce-atlantis makes a difference in this regard? Can't tell.

The session affinity is set to none in GCP load balancing by default (I thought to modify this as per doc, this is the one we can manipulate anyway). Setting it to ClientIP and "Maglev" routing policy didn't make a difference 🤔
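
The handshake described in the quoted documentation can be verified from a captured response. A minimal sketch, assuming you have the raw response bytes and the `Sec-WebSocket-Key` that was sent: a successful upgrade is a `101` status with `Upgrade: websocket`, a `Connection` header containing `upgrade`, and a `Sec-WebSocket-Accept` value derived from the key as defined in RFC 6455. The function names are illustrative.

```python
import base64
import hashlib

# Fixed GUID from RFC 6455, appended to the client key before hashing.
_WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key: str) -> str:
    """Compute the Sec-WebSocket-Accept value a server must return for `key`."""
    digest = hashlib.sha1((key + _WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def is_successful_upgrade(raw_response: bytes, key: str) -> bool:
    """Return True if the raw HTTP response completes the WebSocket handshake."""
    head = raw_response.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    status_line, *header_lines = head.split("\r\n")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or parts[1] != "101":
        return False
    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return (
        headers.get("upgrade", "").lower() == "websocket"
        and "upgrade" in headers.get("connection", "").lower()
        and headers.get("sec-websocket-accept") == expected_accept(key)
    )
```

If this returns False for a response captured on the client side of the load balancer but True for one captured directly against the backend, something in between is rewriting or dropping the upgrade.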

Thanks for the hints anyhow!

starkers commented 7 months ago

Same problem with my atlantis

We're using oauth2 proxy and haproxy ingress for the k8s ingress. What I found was that disabling oauth2 proxy security for /jobs magically solved this.

There are no logs generated by atlantis that I can see, but without checking the code, this leads me to think it's behaving differently based on headers.

The additional headers I can see in use (when authentication is applied) to /jobs (prefix) are:

Next up, I'll try to disable sending these (or some of these) headers to atlantis and see if it works.

Also, possibly the oauth2 filtering layer in the ingress doesn't see the expected headers from the client either. Not sure, honestly.

image

Would be really great if atlantis just didn't insist on wss:// which are notoriously painful on k8s. Re: https://github.com/runatlantis/atlantis/issues/2026