salopensource / sal

Modular reporting for Endpoints
Apache License 2.0

Intermittent 504 after login #337

Closed ewancolyer closed 4 years ago

ewancolyer commented 5 years ago

Describe the bug For the past 2 days, just after 12PM, we have been getting a 504 error after hitting sign in with valid credentials (if you are already signed in, you get the 504 when you reload the page).

Yesterday would have been the first time the pod had been up for 24 hours with any Macs checking in to Sal. We are going to restart the pod later today to see whether the issue recurs later in the day, to establish if it happens 24 hours after the pod comes up.

All egress traffic is blocked apart from ports 53 and 3306. When this happened yesterday, the pod contacted the IP 159.65.250.130 straight afterwards; once we disabled the egress restriction the issue went away.

When this happened today, these IPs were contacted: 34.228.211.243, 52.2.186.244, 34.228.211.243
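For reference, a quick way to check from inside the pod what those addresses resolve to and whether the egress policy lets the pod reach them on 443 (a rough sketch using only the standard library; the IPs are taken from the notes above and the timeout value is arbitrary):

```
import socket

# IPs observed leaving the pod (from the notes above)
observed = ["159.65.250.130", "34.228.211.243", "52.2.186.244"]

for ip in observed:
    # Reverse lookup; may fail or return a generic cloud hostname
    try:
        host = socket.gethostbyaddr(ip)[0]
    except OSError:
        host = "no PTR record"
    # Try an outbound TCP connection on 443 with a short timeout
    try:
        socket.create_connection((ip, 443), timeout=2).close()
        status = "reachable"
    except OSError:
        status = "blocked or timed out"
    print(f"{ip}: {host}, port 443 {status}")
```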

Server (please complete the following information):

Client (please complete the following information):

Additional context These logs have been appearing since we set the cluster up, and we have been seeing them every 5 minutes:


```
2019-09-12 12:40:22,377 INFO exited: servermaint (exit status 0; expected)
2019-09-12 12:40:23,381 INFO spawned: 'servermaint' with pid 2557
2019-09-12 12:40:24,383 INFO success: servermaint entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2019-09-12 12:40:46,314 INFO exited: searchmaint (exit status 0; expected)
2019-09-12 12:40:47,318 INFO spawned: 'searchmaint' with pid 2560
2019-09-12 12:40:48,319 INFO success: searchmaint entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2019-09-12 12:45:25,022 INFO exited: servermaint (exit status 0; expected)
2019-09-12 12:45:26,026 INFO spawned: 'servermaint' with pid 2566
2019-09-12 12:45:27,027 INFO success: servermaint entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2019-09-12 12:45:49,685 INFO exited: searchmaint (exit status 0; expected)
2019-09-12 12:45:50,689 INFO spawned: 'searchmaint' with pid 2569
2019-09-12 12:45:51,691 INFO success: searchmaint entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2019-09-12 12:50:27,669 INFO exited: servermaint (exit status 0; expected)
2019-09-12 12:50:28,673 INFO spawned: 'servermaint' with pid 2575
2019-09-12 12:50:29,372 INFO success: servermaint entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2019-09-12 12:50:52,313 INFO exited: searchmaint (exit status 0; expected)
2019-09-12 12:50:53,316 INFO spawned: 'searchmaint' with pid 2578
2019-09-12 12:50:54,317 INFO success: searchmaint entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2019-09-12 12:55:30,234 INFO exited: servermaint (exit status 0; expected)
2019-09-12 12:55:31,237 INFO spawned: 'servermaint' with pid 2584
2019-09-12 12:55:32,239 INFO success: servermaint entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2019-09-12 12:55:55,000 INFO exited: searchmaint (exit status 0; expected)
2019-09-12 12:55:56,004 INFO spawned: 'searchmaint' with pid 2587
2019-09-12 12:55:57,007 INFO success: searchmaint entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
```
grahamgilbert commented 5 years ago

First one is version.salopensource.com. Not sure what the others are without looking.
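To pin down which of the logged addresses is the version check endpoint, the hostname can be resolved and compared against the IPs seen above (a sketch; it only needs DNS on port 53, which the egress policy already allows):

```
import socket

# Resolve the Sal version-check host and compare against the observed IPs
observed = {"159.65.250.130", "34.228.211.243", "52.2.186.244"}
infos = socket.getaddrinfo("version.salopensource.com", 443, proto=socket.IPPROTO_TCP)
resolved = {info[4][0] for info in infos}

print("resolved:", resolved)
print("overlap with observed:", resolved & observed)
```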

grahamgilbert commented 5 years ago

Can you run in debug mode please? That will give us the stack trace.
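Sal is a Django application, so debug mode ultimately means Django's DEBUG setting; how it is exposed depends on the deployment. The sketch below shows the general pattern with a hypothetical SAL_DEBUG environment variable, not a documented flag, so check the container's documentation for the real toggle:

```
# settings.py (sketch): turning on Django's DEBUG makes 5xx responses carry a
# full stack trace instead of a bare error page.
import os

# "SAL_DEBUG" is a hypothetical toggle used here for illustration only.
DEBUG = os.environ.get("SAL_DEBUG", "false").lower() == "true"
```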

ewancolyer commented 5 years ago

To note, we don't have SAML set up at the moment and are using local credentials.

ewancolyer commented 4 years ago

Here are the logs:

This is whilst the ingress has been restricted for 24hrs+, and it gets spat out a couple of times a minute:


```
{
 insertId: "gh0t51wam8g18cb7w"
 labels: {
  k8s-pod/app: "sal"   
  k8s-pod/controller-revision-hash: "sal-7f977b4cd8"   
  k8s-pod/statefulset_kubernetes_io/pod-name: "sal-0"   
 }
 logName: "projects/insert_project_here_logs/stderr"  
 receiveTimestamp: "2019-09-26T08:26:46.575043789Z"  
 resource: {
  labels: {
   cluster_name: "production"    
   container_name: "sal"    
   location: "europe-west1"    
   namespace_name: "sal"    
   pod_name: "sal-0"    
   project_id: "insert_project_here_logs"    
  }
  type: "k8s_container"   
 }
 severity: "ERROR"  
 textPayload: "[26/Sep/2019 09:26:43] DEBUG [server.non_ui_views:287] HTTPSConnectionPool(host='version.salopensource.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7ff7a9c35b00>, 'Connection to version.salopensource.com timed out. (connect timeout=0.5)'))
"  
 timestamp: "2019-09-26T08:26:43.760187381Z"  
}
```
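That traceback is the periodic check against version.salopensource.com timing out, which is consistent with outbound 443 being blocked by the egress policy. It can be reproduced from inside the pod with a short script (a sketch; requests is assumed to be importable, since the traceback comes from its urllib3 connection pool):

```
import requests

# Mirror the call in the traceback: HTTPS to version.salopensource.com with a
# 0.5 second connect timeout. With egress on 443 blocked this raises a
# requests ConnectTimeout instead of returning a response.
try:
    resp = requests.get("https://version.salopensource.com", timeout=0.5)
    print("reachable:", resp.status_code)
except requests.exceptions.RequestException as exc:
    print("version check failed:", exc)
```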
ewancolyer commented 4 years ago

This seems to have been resolved for me at some point down the line.