rarebreed / khadga

Web application to gather real-time video and text data over WebRTC and WebSockets

Create health check on khadga for the GKE Ingress load balancer #44

Open rarebreed opened 4 years ago

rarebreed commented 4 years ago

I think what might be causing the websocket to close so fast is that khadga is getting hit very frequently on the / endpoint. After some searching, I found out that if you use an Ingress Load Balancer, your container service needs to implement a health check endpoint that returns a 200 OK.

Currently, khadga serves up content directly when / is called. I would prefer not to make the user do something like go to khadga.app/start. One option is to have the load balancer do that and redirect / to /start, but then I am not sure how that would affect the health check. I don't want the health check to hit the / endpoint every second or so and pull down all the files (that would cost money in bandwidth).

It appears that there is a way to configure a custom health check endpoint. So, let's create a new endpoint just for health checking.
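On the khadga side, the endpoint only has to answer with an empty 200 OK, so it never pulls down the bundle that / serves. A rough sketch of what that could look like with warp (the route name, port, and wiring here are illustrative, not khadga's actual code):

use warp::Filter;

#[tokio::main]
async fn main() {
    // GET /health -> empty 200 OK; cheap for a load balancer to poll.
    let health = warp::path("health")
        .and(warp::path::end())
        .and(warp::get())
        .map(|| warp::reply());

    // In the real app this would be combined with the existing routes,
    // e.g. health.or(static_files).or(chat), before handing off to warp::serve.
    warp::serve(health).run(([0, 0, 0, 0], 7001)).await; // port is a placeholder
}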

rarebreed commented 4 years ago

This is partially done. In the deployment yaml config file, I added a new path for health checks to be sent to, and I added a new health check endpoint to khadga. While I see some health checks hitting the /health endpoint, I still see another health check hitting the / endpoint:

[
 {
   "textPayload": "[2020-02-27T04:03:56Z INFO  warp::filters::log] 10.8.0.1:35090 \"GET / HTTP/1.1\" 200 \"-\" \"GoogleHC/1.0\" 222.08µs\n",
   "insertId": "sl2iqnfxul3xo",
   "resource": {
     "type": "container",
     "labels": {
       "container_name": "khadga",
       "namespace_id": "default",
       "instance_id": "2726422611480887126",
       "zone": "us-central1-a",
       "pod_id": "backend-deployment-6d484cd47f-shmd6",
       "project_id": "khadga-dev",
       "cluster_name": "standard-cluster-1"
     }
   },
   "timestamp": "2020-02-27T04:03:56.097953311Z",
   "severity": "ERROR",
   "labels": {
     "compute.googleapis.com/resource_name": "gke-standard-cluster-1-default-pool-6cf961be-ns5r",
     "container.googleapis.com/stream": "stderr",
     "container.googleapis.com/pod_name": "backend-deployment-6d484cd47f-shmd6",
     "container.googleapis.com/namespace_name": "default"
   },
   "logName": "projects/khadga-dev/logs/khadga",
   "receiveTimestamp": "2020-02-27T04:04:00.435955411Z"
 },
 {
   "textPayload": "[2020-02-27T04:03:58Z INFO  warp::filters::log] 10.8.0.1:55540 \"GET /health HTTP/1.1\" 200 \"-\" \"kube-probe/1.13+\" 13.769µs\n",
   "insertId": "sl2iqnfxul3xp",
   "resource": {
     "type": "container",
     "labels": {
       "instance_id": "2726422611480887126",
       "pod_id": "backend-deployment-6d484cd47f-shmd6",
       "zone": "us-central1-a",
       "project_id": "khadga-dev",
       "cluster_name": "standard-cluster-1",
       "container_name": "khadga",
       "namespace_id": "default"
     }
   }
 }
]

What's really weird is that the logs seem to indicate an error (the entries come through with severity ERROR), and yet the health of my pod is OK, and khadga returns a 200 OK for both the / and /health endpoints.

Need to read up more on how health checks work.
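For reference, this is roughly the shape of the config involved. The names, ports, and image below are placeholders rather than the actual khadga manifests, and the BackendConfig piece is my reading of the GKE docs on custom load balancer health checks (whether it is supported, and under which apiVersion, seems to depend on the GKE version):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-deployment          # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: khadga
  template:
    metadata:
      labels:
        app: khadga
    spec:
      containers:
      - name: khadga
        image: gcr.io/khadga-dev/khadga:latest   # placeholder image
        ports:
        - containerPort: 7001                    # placeholder port
        # These kubelet probes are what show up as "kube-probe/1.13+" in the logs
        readinessProbe:
          httpGet:
            path: /health
            port: 7001
        livenessProbe:
          httpGet:
            path: /health
            port: 7001
---
# A BackendConfig attached to the Service is supposed to let the Ingress load
# balancer's own health check (the "GoogleHC/1.0" requests) hit /health instead of /.
apiVersion: cloud.google.com/v1                  # v1beta1 on older GKE clusters
kind: BackendConfig
metadata:
  name: khadga-backendconfig
spec:
  healthCheck:
    requestPath: /health
    port: 7001
---
apiVersion: v1
kind: Service
metadata:
  name: khadga-svc                               # placeholder name
  annotations:
    cloud.google.com/backend-config: '{"default": "khadga-backendconfig"}'
spec:
  type: NodePort
  selector:
    app: khadga
  ports:
  - port: 80
    targetPort: 7001

If the BackendConfig route isn't available on this cluster version, the docs also mention that the Ingress can infer its health check path from the readinessProbe on the serving container, which may be why the GoogleHC checks are still hitting /.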