Open omotoso78 opened 2 years ago
I'm also experiencing the same thing while using the Atlantis image v0.19.8 with Terragrunt customization.
My repos.yaml configuration is below:
repos:
- id: "/.*/"
  workflow: terragrunt
  apply_requirements: [approved, mergeable]
workflows:
  terragrunt:
    plan:
      steps:
      - env:
          name: TERRAGRUNT_TFPATH
          command: 'echo "terraform${ATLANTIS_TERRAFORM_VERSION}"'
      - run: terragrunt plan -out=$PLANFILE
      - run: terragrunt show -json $PLANFILE > $SHOWFILE
    apply:
      steps:
      - env:
          name: TERRAGRUNT_TFPATH
          command: 'echo "terraform${ATLANTIS_TERRAFORM_VERSION}"'
      - run: terragrunt apply $PLANFILE
I tried enabling logLevel: "debug" and found the following error:
{"level":"debug","ts":"2022-09-29T17:37:30.849Z","caller":"server/middleware.go:70","msg":"GET /jobs/7836d3a3-f208-4c4d-ac9e-37d1a06116a7/ws – respond HTTP 500","json":{}}
Try atlantis plan --verbose
It would help to know how you folks have deployed the app. I use the latest version and can see the logs within my eks cluster.
Have you got websockets enabled on any networking infrastructure that your Atlantis installation sits behind? For example, I had to add an annotation to my K8s Contour Ingress to allow websocket streaming to work (the /ws on the end of the URL is for websockets).
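One way to check whether the infrastructure in front of Atlantis passes the upgrade through is to send the WebSocket handshake by hand and look at the status line that comes back. The following is a minimal sketch using only the Python standard library; the function names are made up for illustration, and it sends no cookies or auth headers, so a setup behind an authenticating proxy may reject it for unrelated reasons.

```python
import base64
import hashlib
import os
import socket
import ssl
from urllib.parse import urlparse

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID defined by RFC 6455


def expected_accept(key: str) -> str:
    """Value a compliant server returns in Sec-WebSocket-Accept for `key`."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(host: str, path: str, key: str) -> bytes:
    """Build the HTTP/1.1 request that asks the server to switch to WebSocket."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def probe(url: str, timeout: float = 10.0) -> str:
    """Send the handshake to a ws:// or wss:// URL and return the status line."""
    parts = urlparse(url)
    secure = parts.scheme == "wss"
    port = parts.port or (443 if secure else 80)
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    sock = socket.create_connection((parts.hostname, port), timeout=timeout)
    try:
        if secure:
            ctx = ssl.create_default_context()
            sock = ctx.wrap_socket(sock, server_hostname=parts.hostname)
        sock.sendall(build_upgrade_request(parts.netloc, parts.path or "/", key))
        response = sock.recv(4096).decode("latin-1")
    finally:
        sock.close()
    return response.split("\r\n", 1)[0]
```

Calling probe("wss://<ATLANTIS_URL>/jobs/<job-id>/ws") with your own host and job ID should return a line starting with "HTTP/1.1 101" if the upgrade reaches Atlantis and is accepted. Any other status (for example a 400, 403 or 500) suggests that something along the path, or Atlantis itself, is refusing the upgrade.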
I am facing a similar issue, the difference being that the logs appear all at once when the apply / plan process is complete, i.e. they are not actually "streamed". atlantis is running on a VM, so I don't know what type of websocket-related configuration can be done.
People who deploy from these modules haven't run into these issues as far as I know. Please consider these deployments.
While I can see the live logs, it often happens that I need to refresh the page a few times before the web-socket connection succeeds and the logs start streaming.
@marceloboeira it seems like this may be affected by how atlantis is deployed. I'm curious if there is a misconfiguration in the deployment, a limitation in the cloud deployment used, or something that can be mitigated by additional logic in atlantis. Or maybe a combination.
If we can do anything in the atlantis server, then please feel free to propose a pr if you find a way to reproduce and resolve the issue.
Maybe it's as simple as doing a retry in the frontend to connect to the websocket?
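The retry idea can be sketched as a small loop with exponential backoff. This is only an illustration of the logic in Python, not the Atlantis frontend code (which runs in the browser); `connect` stands for whatever call opens the websocket, and the names and default values are assumptions.

```python
import random
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> Iterator[float]:
    """Yield exponentially growing delays in seconds, never above `cap`."""
    for attempt in range(attempts):
        yield min(cap, base * (2 ** attempt))


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or `attempts` tries have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for i, delay in enumerate(backoff_delays(attempts)):
        try:
            return connect()
        except OSError as err:
            if i == attempts - 1:
                raise ConnectionError(f"gave up after {attempts} attempts") from err
            # A little random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 4))
    raise AssertionError("unreachable")
```

A retry like this would hide the "refresh a few times" symptom, but it would not explain why the first connection attempt fails.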
I've deployed atlantis into an EKS Cluster. I am also running into this issue with the websocket and getting 500 error codes when running in debugging mode.
I had this problem before. It was a permission issue. The good thing is that on Atlantis >= v0.27.0, this is explicitly logged, so you can double check that on atlantis stdout.
The trivial solution (just to test) is to make atlantis a repo owner.
Also, this new version of atlantis shows every terraform log on its HTTP website. It's not as comfortable as clicking on github, but it does the trick.
I am experiencing this as well in v0.27.1
Sometimes it works if you click on the link / job with the output, but other times it works only after a refresh, or it shows a different part of the output every time you refresh.
@dimisjim what do you see if you open that page with developer tools? In theory, that's most of the time because of the load balancer and the websocket connection...
@marceloboeira
These are there always:
This shows when {some} / {sometimes all} of the content loads up:
xterm-4.9.0.js:24 Canvas2D: Multiple readback operations using getImageData are faster with the willReadFrequently attribute set to true. See: https://html.spec.whatwg.org/multipage/canvas.html#concept-canvas-will-read-frequently
This shows up when no content loads up:
93e23b46-4b37-429f-8959-a2d03b39d3db:66 WebSocket connection to 'wss://<ATLANTIS_URL>/jobs/93e23b46-4b37-429f-8959-a2d03b39d3db/ws' failed:
We are using a GCP load balancer.
I think it might be that you need to tweak your LB to properly forward the WebSocket connection to the Atlantis instance.
That was the case when I used Atlantis with NGINX / ALB, I had to make a few changes to allow sticky sessions, some specific config for NGINX to keep alive and Upgrade/Connection headers — WebSockets on NGINX.
You might have to figure out the equivalents for GCP — https://cloud.google.com/load-balancing/docs/https#websocket_support
It seems to be enabled by default, but you might want to review the timeouts and such.
Overall, atlantis could use a much simpler polling-based log stream; it would be easier to make compatible everywhere. WS for this purpose is overkill.
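To make the polling idea concrete, here is a minimal in-memory sketch of offset-based polling. Everything in it is hypothetical: Atlantis does not expose such an endpoint as far as this thread shows, and `JobLog` and `poll_once` are names invented for the example. The client remembers how many lines it has seen and asks only for what came after.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class JobLog:
    """Hypothetical server-side buffer holding the output of one job."""
    lines: List[str] = field(default_factory=list)
    done: bool = False

    def append(self, line: str) -> None:
        self.lines.append(line)

    def read_from(self, offset: int) -> Tuple[List[str], int, bool]:
        """Return the lines at or after `offset`, the next offset, and the done flag."""
        chunk = self.lines[offset:]
        return chunk, offset + len(chunk), self.done


def poll_once(log: JobLog, offset: int) -> Tuple[List[str], int, bool]:
    """Stand-in for one plain HTTP GET that carries the client's offset (hypothetical)."""
    return log.read_from(offset)
```

Because each poll is an ordinary request and response, it needs no Upgrade headers, sticky sessions or long-lived connections. The price is latency of up to one polling interval and more requests overall.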
@marceloboeira
Hmm based on the doc you linked:
The load balancer does not need any configuration to proxy WebSocket connections.
and Upgrade/Connection headers are also supported:
When the load balancer recognizes a WebSocket Upgrade request from an HTTP(S) client followed by a successful Upgrade response from the backend instance, the load balancer proxies bidirectional traffic for the duration of the current connection. If the backend instance does not return a successful Upgrade response, the load balancer closes the connection.
so it should be working out of the box, at least from the GCP Load balancing side. Maybe also the setup I am using based on: https://github.com/bschaatsbergen/terraform-gce-atlantis makes a difference in this regard? Can't tell.
The session affinity is set to none in GCP load balancing by default (I thought to modify this as per doc, this is the one we can manipulate anyway). Setting it to ClientIP and "Maglev" routing policy didn't make a difference 🤔
Thanks for the hints anyhow!
Same problem with my atlantis
We're using oauth2 proxy and haproxy ingress for the k8s ingress. What I found was that disabling oauth2 proxy security for /jobs magically solved this.
There are no logs generated by atlantis that I can see, but without checking the code this leads me to think it's behaving differently based on headers.
The additional headers I can see in use (when authentication is applied) to /jobs (prefix) are:
next up I'll try to disable sending these/or some of these headers to atlantis and see if it works
also possibly/maybe the oauth2 filtering layer by the ingress doesn't see expected headers from the client also.. not sure honestly
Would be really great if atlantis just didn't insist on wss:// which are notoriously painful on k8s.
Re: https://github.com/runatlantis/atlantis/issues/2026
Overview of the Issue
I am able to run atlantis plan and atlantis apply, and both work fine. However, I am unable to see the live terraform logs while it is planning/applying.
The link provided in the "details" opens a blank screen.
Reproduction Steps
Atlantis v0.19.7, installed locally using a git enterprise user. No repo.yaml or atlantis.yaml is used. The pull request is submitted from a branch and works fine, but the log is not visible.