Closed: mrjones-plip closed this issue 1 year ago
After step 5, both browser requests and curl requests return HTML and a 502 or 503. I would expect HAProxy's port 5984 to return JSON, even if it's serving a 5xx error:
```
➜ deleteme git:(7852-cht-4.x-docker-helper) ✗ curl -s http://medic:password@172.27.0.3:5984/medic | jq
{
  "db_name": "medic",
  "purge_seq": "0-g1AAAAH5eJyFk-SNIP-X4s_qEk",
  "update_seq": "115-g1AAAAH5eJ-SNIP-yF0LENwjQI_K0wLfgO3P6i8",
  "sizes": {
    "file": 1264034,
    "external": 1486394,
    "active": 798908
  },
  "other": {
    "data_size": 1486394
  },
  "doc_del_count": 6,
  "doc_count": 38,
  "disk_size": 1264034,
  "disk_format_version": 7,
  "data_size": 798908,
  "compact_running": false,
  "cluster": {
    "q": 12,
    "n": 1,
    "w": 1,
    "r": 1
  },
  "instance_start_time": "0"
}
```
```
$ docker stop deleteme_couchdb.2_1
deleteme_couchdb.2_1
$ curl -s http://medic:password@172.27.0.3:5984/medic | jq
parse error: Invalid numeric literal at line 1, column 20
$ curl -s http://medic:password@172.27.0.3:5984/medic
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
$ curl -k https://localhost/
<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.19.6</center>
</body>
</html>
```
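When scripting against this endpoint, piping blindly into `jq` fails with exactly the parse error shown above. A minimal sketch of a defensive check (`classify_response` is a hypothetical helper, not part of CHT or HAProxy) that sniffs whether a body is JSON or an HTML error page before parsing:

```shell
#!/bin/sh
# classify_response: hypothetical helper that guesses whether a response
# body is JSON or an HTML error page by its first character, so callers
# can skip jq's "Invalid numeric literal" parse error on HAProxy/nginx
# HTML error pages.
classify_response() {
  case "$1" in
    \{*|\[*) echo json ;;     # JSON object or array
    \<*)     echo html ;;     # HTML error page from haproxy/nginx
    *)       echo unknown ;;
  esac
}

classify_response '{"db_name":"medic"}'                           # json
classify_response '<html><body><h1>503 Service Unavailable</h1>'  # html
```

In practice you would feed it the captured curl output and only pipe to `jq` when it reports `json`.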
However, after step 6, only HAProxy is working again; CHT is not:
```
$ curl -s http://medic:password@172.27.0.3:5984/medic | jq
{
  "db_name": "medic",
  "purge_seq": "0-g1AAAAH5eJzLY-SNIP-qEk",
  "update_seq": "115-g1AAAAH5eJy-SNIP-P6i8",
  "sizes": {
    "file": 1264034,
    "external": 1486394,
    "active": 798908
  },
  "other": {
    "data_size": 1486394
  },
  "doc_del_count": 6,
  "doc_count": 38,
  "disk_size": 1264034,
  "disk_format_version": 7,
  "data_size": 798908,
  "compact_running": false,
  "cluster": {
    "q": 12,
    "n": 1,
    "w": 1,
    "r": 1
  },
  "instance_start_time": "0"
}
$ curl -k https://localhost/
<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.19.6</center>
</body>
</html>
```
When we look at the API and Sentinel containers, we see the errors below. The workaround was to manually restart API and Sentinel, after which CHT was responsive again:
```
$ docker restart deleteme_api_1 deleteme_sentinel_1
deleteme_api_1
deleteme_sentinel_1
```
docker logs deleteme_api_1
2022-11-02 22:03:28 INFO: xforms updated successfully
REQ 44c12316-befb-483f-9fa9-fd43d80ba47f 172.27.0.1 - GET /medic/_changes?feed=longpoll&heartbeat=10000&since=115-g1AAAAH5eJyF0LENwjAUBFALIkACCgpmQKJAAgaAFSDxAN9JEX1FUFEzBStA4iWYIktkBpLPhSCqxFRX-Ml3dqKUmsT9SM3C8yWMI7P_5mqd4KhHysy1Dpj6rWj7QTXRZmltxjRsdZvGgVizE6m6XFNaEzG40__TCxIYtjZl8ly9IJm5ipRMA1cvSHXy1FHrG9PY0QzkAyEeeBLTyNEOlAIhXiJP9wKgEghx0Drvuve3AUNzxN3agmnq3oChBQI_K0wLfgO3P6i8&limit=25 HTTP/1.0
REQ fb164bf8-04b6-4852-9b88-06216e2f2993 172.27.0.1 - GET /medic-user-medic-meta/_changes?include_docs=true&feed=longpoll&heartbeat=10000&since=13-g1AAAAH5eJyF0LERgkAQBdATDU0MbMLAGbUAbUFhC7iFgLlhNDK2CltQ2CasgiaoQVg_Yih30Q_uzf1_Vxhj5vk0M4v0ck3zjA-_XG8KHEXW8JIocXbyF-2-qCfEK5FqzG0HByK8V-3G3FDaE2XcGQd6QRJ2ImWgF6Tim2ob6AXpzjNzIrr7m4FiIMQTT_K3A5VAiLfqy9nIswCoBUIciergBgytEQ-RJrgBQxsEflZh3QeAu6hW&limit=25 HTTP/1.0
REQ 545be762-0993-4957-8e2e-1f7b4d233548 ::ffff:172.27.0.8 - GET /setup/poll HTTP/1.1
RES 545be762-0993-4957-8e2e-1f7b4d233548 ::ffff:172.27.0.8 - GET /setup/poll HTTP/1.1 200 108 13.054 ms
2022-11-02 22:04:28 DEBUG: Checking for a configured outgoing message service
2022-11-02 22:05:28 DEBUG: Checking for a configured outgoing message service
2022-11-02 22:06:28 DEBUG: Checking for a configured outgoing message service
2022-11-02 22:07:28 DEBUG: Checking for a configured outgoing message service
REQ bc09ad8b-4600-4b63-9981-f52328e4f403 172.27.0.1 - GET /medic/_design/medic-client/_view/data_records_by_type?group=true HTTP/1.0
REQ cf3edf66-d012-44cf-a530-a1b9c893f2ff 172.27.0.1 - GET /medic-user-medic-meta/_design/medic-user/_view/read?group=true HTTP/1.0
RES bc09ad8b-4600-4b63-9981-f52328e4f403 172.27.0.1 - GET /medic/_design/medic-client/_view/data_records_by_type?group=true HTTP/1.0 200 - 26.449 ms
RES cf3edf66-d012-44cf-a530-a1b9c893f2ff 172.27.0.1 - GET /medic-user-medic-meta/_design/medic-user/_view/read?group=true HTTP/1.0 200 - 28.299 ms
2022-11-02 22:08:28 DEBUG: Checking for a configured outgoing message service
RES fb164bf8-04b6-4852-9b88-06216e2f2993 172.27.0.1 - GET /medic-user-medic-meta/_changes?include_docs=true&feed=longpoll&heartbeat=10000&since=13-g1AAAAH5eJyF0LERgkAQBdATDU0MbMLAGbUAbUFhC7iFgLlhNDK2CltQ2CasgiaoQVg_Yih30Q_uzf1_Vxhj5vk0M4v0ck3zjA-_XG8KHEXW8JIocXbyF-2-qCfEK5FqzG0HByK8V-3G3FDaE2XcGQd6QRJ2ImWgF6Tim2ob6AXpzjNzIrr7m4FiIMQTT_K3A5VAiLfqy9nIswCoBUIciergBgytEQ-RJrgBQxsEflZh3QeAu6hW&limit=25 HTTP/1.0 200 - 10262.491 ms
2022-11-02 22:08:45 ERROR: Error watching changes, restarting: FetchError: invalid json response body at http://haproxy:5984/medic/_changes?feed=longpoll&heartbeat=10000&since=115-g1AAAAOReJyV0sFNwzAUgGGLVBQJOHBgBiQOyHFsJz3BCtB6AD_bUhQVOHFmClaA1kswRZfoDLTmOUEKByuST09K5E_-X7ImhFy0hSVX5vXNtBYe_uZducZXJ5rAtVKrri108YwP5tRoEI1MHUgyVc9ERMGt99sozXvJUVqVzKbOJCU2SIh4uA_hOEqCgbZNnTozkRaRAHiv5b86UVsrRG4dIivovN9EaTZIUmoHKWmyDpEtvIdwiNJpL0lWl6B5bh0ix5cZeVLqI1rnvQW2EcAXmX3ILJHB8YXLj9pZrzVlU0lwmY3IbJDB8RPC99i5MFI6ntrYVCcyB2RwPCq1G-_GqXG01vmluLAdjk_v91G7HEo1A-A0vxQXtseBf1qI2s3wRcEwXqU01v0CQ9kbEg&limit=25 reason: Unexpected token < in JSON at position 0
at /api/node_modules/node-fetch/lib/index.js:272:32
at processTicksAndRejections (node:internal/process/task_queues:96:5) {
message: 'invalid json response body at http://haproxy:5984/medic/_changes?feed=longpoll&heartbeat=10000&since=115-g1AAAAOReJyV0sFNwzAUgGGLVBQJOHBgBiQOyHFsJz3BCtB6AD_bUhQVOHFmClaA1kswRZfoDLTmOUEKByuST09K5E_-X7ImhFy0hSVX5vXNtBYe_uZducZXJ5rAtVKrri108YwP5tRoEI1MHUgyVc9ERMGt99sozXvJUVqVzKbOJCU2SIh4uA_hOEqCgbZNnTozkRaRAHiv5b86UVsrRG4dIivovN9EaTZIUmoHKWmyDpEtvIdwiNJpL0lWl6B5bh0ix5cZeVLqI1rnvQW2EcAXmX3ILJHB8YXLj9pZrzVlU0lwmY3IbJDB8RPC99i5MFI6ntrYVCcyB2RwPCq1G-_GqXG01vmluLAdjk_v91G7HEo1A-A0vxQXtseBf1qI2s3wRcEwXqU01v0CQ9kbEg&limit=25 reason: Unexpected token < in JSON at position 0',
type: 'invalid-json',
[stack]: 'FetchError: invalid json response body at http://haproxy:5984/medic/_changes?feed=longpoll&heartbeat=10000&since=115-g1AAAAOReJyV0sFNwzAUgGGLVBQJOHBgBiQOyHFsJz3BCtB6AD_bUhQVOHFmClaA1kswRZfoDLTmOUEKByuST09K5E_-X7ImhFy0hSVX5vXNtBYe_uZducZXJ5rAtVKrri108YwP5tRoEI1MHUgyVc9ERMGt99sozXvJUVqVzKbOJCU2SIh4uA_hOEqCgbZNnTozkRaRAHiv5b86UVsrRG4dIivovN9EaTZIUmoHKWmyDpEtvIdwiNJpL0lWl6B5bh0ix5cZeVLqI1rnvQW2EcAXmX3ILJHB8YXLj9pZrzVlU0lwmY3IbJDB8RPC99i5MFI6ntrYVCcyB2RwPCq1G-_GqXG01vmluLAdjk_v91G7HEo1A-A0vxQXtseBf1qI2s3wRcEwXqU01v0CQ9kbEg&limit=25 reason: Unexpected token < in JSON at position 0\n' +
' at /api/node_modules/node-fetch/lib/index.js:272:32\n' +
' at processTicksAndRejections (node:internal/process/task_queues:96:5)',
name: 'FetchError'
}
docker logs deleteme_sentinel_1
2022-11-02 22:08:35 INFO: Task dueTasks completed
2022-11-02 22:08:35 INFO: Background cleanup batch: 114 -> 114 (0)
2022-11-02 22:08:35 INFO: Task backgroundCleanup completed
2022-11-02 22:09:54 ERROR: Error watching changes, restarting: FetchError: invalid json response body at http://haproxy:5984/medic/_changes?feed=longpoll&heartbeat=10000&since=115-g1AAAAOReJyV0sFNwzAUgGGLVBQJOHBgBiQOyHFsJz3BCtB6AD_bUhQVOHFmClaA1kswRZfoDLTmOUEKByuST09K5E_-X7ImhFy0hSVX5vXNtBYe_uZducZXJ5rAtVKrri108YwP5tRoEI1MHUgyVc9ERMGt99sozXvJUVqVzKbOJCU2SIh4uA_hOEqCgbZNnTozkRaRAHiv5b86UVsrRG4dIivovN9EaTZIUmoHKWmyDpEtvIdwiNJpL0lWl6B5bh0ix5cZeVLqI1rnvQW2EcAXmX3ILJHB8YXLj9pZrzVlU0lwmY3IbJDB8RPC99i5MFI6ntrYVCcyB2RwPCq1G-_GqXG01vmluLAdjk_v91G7HEo1A-A0vxQXtseBf1qI2s3wRcEwXqU01v0CQ9kbEg&limit=25 reason: Unexpected token < in JSON at position 0
at /sentinel/node_modules/node-fetch/lib/index.js:272:32
at processTicksAndRejections (node:internal/process/task_queues:96:5) {
message: 'invalid json response body at http://haproxy:5984/medic/_changes?feed=longpoll&heartbeat=10000&since=115-g1AAAAOReJyV0sFNwzAUgGGLVBQJOHBgBiQOyHFsJz3BCtB6AD_bUhQVOHFmClaA1kswRZfoDLTmOUEKByuST09K5E_-X7ImhFy0hSVX5vXNtBYe_uZducZXJ5rAtVKrri108YwP5tRoEI1MHUgyVc9ERMGt99sozXvJUVqVzKbOJCU2SIh4uA_hOEqCgbZNnTozkRaRAHiv5b86UVsrRG4dIivovN9EaTZIUmoHKWmyDpEtvIdwiNJpL0lWl6B5bh0ix5cZeVLqI1rnvQW2EcAXmX3ILJHB8YXLj9pZrzVlU0lwmY3IbJDB8RPC99i5MFI6ntrYVCcyB2RwPCq1G-_GqXG01vmluLAdjk_v91G7HEo1A-A0vxQXtseBf1qI2s3wRcEwXqU01v0CQ9kbEg&limit=25 reason: Unexpected token < in JSON at position 0',
type: 'invalid-json',
[stack]: 'FetchError: invalid json response body at http://haproxy:5984/medic/_changes?feed=longpoll&heartbeat=10000&since=115-g1AAAAOReJyV0sFNwzAUgGGLVBQJOHBgBiQOyHFsJz3BCtB6AD_bUhQVOHFmClaA1kswRZfoDLTmOUEKByuST09K5E_-X7ImhFy0hSVX5vXNtBYe_uZducZXJ5rAtVKrri108YwP5tRoEI1MHUgyVc9ERMGt99sozXvJUVqVzKbOJCU2SIh4uA_hOEqCgbZNnTozkRaRAHiv5b86UVsrRG4dIivovN9EaTZIUmoHKWmyDpEtvIdwiNJpL0lWl6B5bh0ix5cZeVLqI1rnvQW2EcAXmX3ILJHB8YXLj9pZrzVlU0lwmY3IbJDB8RPC99i5MFI6ntrYVCcyB2RwPCq1G-_GqXG01vmluLAdjk_v91G7HEo1A-A0vxQXtseBf1qI2s3wRcEwXqU01v0CQ9kbEg&limit=25 reason: Unexpected token < in JSON at position 0\n' +
' at /sentinel/node_modules/node-fetch/lib/index.js:272:32\n' +
' at processTicksAndRejections (node:internal/process/task_queues:96:5)',
name: 'FetchError'
}
I made this work by manually editing the HAProxy config in my live haproxy container. First I edited default_frontend.cfg and added this stanza:
```
http-errors json_503_errors
    errorfile 503 /usr/local/etc/haproxy/errors/503.json
```
And then at the end of the `frontend http-in` section in that same file, I added this line:

```
errorfiles json_503_errors
```
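Putting the two pieces together, the edited default_frontend.cfg could look roughly like this. This is only a sketch: the `bind` and `default_backend` lines below are illustrative placeholders, and only the `http-errors` and `errorfiles` directives come from this workaround.

```
# Sketch of default_frontend.cfg after the edits above.
# Named error-file group serving JSON instead of HAProxy's HTML pages:
http-errors json_503_errors
    errorfile 503 /usr/local/etc/haproxy/errors/503.json

frontend http-in
    # bind/backend lines are hypothetical; keep whatever your image ships.
    bind *:5984
    default_backend couchdb-servers
    # Serve the JSON errorfiles for this frontend:
    errorfiles json_503_errors
```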
I created the JSON error file at /usr/local/etc/haproxy/errors/503.json with these contents (an HAProxy errorfile is a raw HTTP response: status line, headers, a blank line, then the body):
```
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Connection: close
Content-Type: text/html

{
  "code": 503,
  "error": "temporarily_unavailable",
  "error_description": "Sorry, the service is temporarily unavailable"
}
```
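Because the errorfile has to be a complete raw HTTP response, it's easy to get the format subtly wrong. A quick sanity check before restarting the container can be sketched as follows (`check_errorfile` is a hypothetical helper; it only verifies the status line and the header/body separator, not full HTTP validity):

```shell
#!/bin/sh
# check_errorfile: hypothetical validator for an HAProxy errorfile, which
# must be a raw HTTP response: a status line, headers, a blank line, then
# the body.
check_errorfile() {
  head -n 1 "$1" | grep -q '^HTTP/1\.[01] [0-9][0-9][0-9] ' \
    || { echo "bad or missing status line"; return 1; }
  grep -q '^$' "$1" \
    || { echo "missing blank line between headers and body"; return 1; }
  echo ok
}

# Exercise it against a copy of the 503 payload from above.
printf '%s\n' \
  'HTTP/1.0 503 Service Unavailable' \
  'Cache-Control: no-cache' \
  'Connection: close' \
  'Content-Type: text/html' \
  '' \
  '{"code":503,"error":"temporarily_unavailable"}' > 503.json
check_errorfile 503.json   # → ok
```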
I then restarted the haproxy container. Now, when I pull a node down, I get a much nicer error on the CLI from HAProxy:
```
$ curl -s http://medic:password@172.27.0.3:5984/medic | jq
{
  "error": "nodedown",
  "reason": "progress not possible"
}
```
And API doesn't die either, though it does log an ERROR:
2022-11-02 23:24:09 ERROR: Server error: StatusCodeError: 503 - {"code":503,"error":"temporarily_unavailable","error_description":"Sorry, the service is temporarily unavailable"}
at new StatusCodeError (/api/node_modules/request-promise-core/lib/errors.js:32:15)
at Request.plumbing.callback (/api/node_modules/request-promise-core/lib/plumbing.js:104:33)
at Request.RP$callback [as _callback] (/api/node_modules/request-promise-core/lib/plumbing.js:46:31)
at Request.self.callback (/api/node_modules/request/request.js:185:22)
at Request.emit (node:events:513:28)
at Request.<anonymous> (/api/node_modules/request/request.js:1154:10)
at Request.emit (node:events:513:28)
at IncomingMessage.<anonymous> (/api/node_modules/request/request.js:1076:12)
at Object.onceWrapper (node:events:627:28)
at IncomingMessage.emit (node:events:525:35) {
name: 'StatusCodeError',
statusCode: 503,
message: '503 - {"code":503,"error":"temporarily_unavailable","error_description":"Sorry, the service is temporarily unavailable"}',
error: {
code: 503,
error: 'temporarily_unavailable',
error_description: 'Sorry, the service is temporarily unavailable'
},
[stack]: 'StatusCodeError: 503 - {"code":503,"error":"temporarily_unavailable","error_description":"Sorry, the service is temporarily unavailable"}\n' +
' at new StatusCodeError (/api/node_modules/request-promise-core/lib/errors.js:32:15)\n' +
' at Request.plumbing.callback (/api/node_modules/request-promise-core/lib/plumbing.js:104:33)\n' +
' at Request.RP$callback [as _callback] (/api/node_modules/request-promise-core/lib/plumbing.js:46:31)\n' +
' at Request.self.callback (/api/node_modules/request/request.js:185:22)\n' +
' at Request.emit (node:events:513:28)\n' +
' at Request.<anonymous> (/api/node_modules/request/request.js:1154:10)\n' +
' at Request.emit (node:events:513:28)\n' +
' at IncomingMessage.<anonymous> (/api/node_modules/request/request.js:1076:12)\n' +
' at Object.onceWrapper (node:events:627:28)\n' +
' at IncomingMessage.emit (node:events:525:35)'
}
The same goes for Sentinel: it doesn't die, but it does log an error:
```
2022-11-02 23:20:42 ERROR: Task backgroundCleanup completed with error: {
  code: 503,
  error: 'temporarily_unavailable',
  error_description: 'Sorry, the service is temporarily unavailable',
  status: 503,
  name: 'temporarily_unavailable',
  message: undefined,
  docId: '_local/background-seq'
}
```
Noteworthy: the front end just shows a full-screen spinner for online admin users, with no indication that there's an error. Maybe the old full-page `<center><h1>502 Bad Gateway</h1></center>` output was better?
Starting the couchdb node back up fixes everything; there's no need to restart API or Sentinel because they never died in the first place.
With the recent PR/ticket merged over here, this is good to go - AT passes.
User: curl/browser
Site: local docker
Platform: NA

Test Steps:

1. Get your two docker compose files:
   ```
   wget https://staging.dev.medicmobile.org/_couch/builds/medic:medic:master/docker-compose/cht-couchdb-clustered.yml
   wget https://staging.dev.medicmobile.org/_couch/builds/medic:medic:master/docker-compose/cht-couchdb-clustered.yml
   ```
2. Start a cluster:
   ```
   COUCHDB_SERVERS="couchdb.1,couchdb.2,couchdb.3" CHT_COMPOSE_PROJECT_NAME=test5 COUCHDB_SECRET=test5 DOCKER_CONFIG_PATH=./ COUCHDB_DATA=./couchdb CHT_COMPOSE_PATH=./ COUCHDB_USER=medic COUCHDB_PASSWORD=password docker-compose -f cht-core.yml -f cht-couchdb-clustered.yml up
   ```
3. Ensure CHT is up in the browser and with `curl`:
   1. Browse to https://localhost/
   2. Find the IP of haproxy with `docker inspect -f '{{.Name}} - {{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker ps -q)`, then run `curl -s http://medic:password@172.27.0.3:5984/medic | jq` and see JSON for a healthy medic DB.
4. Check the container names with `docker ps --format '{{.Names}}'|grep couchdb` and then stop a couch node: `docker stop deleteme_couchdb.2_1`
5. Ensure CHT is DOWN in the browser and with `curl`:
   1. Browse to https://localhost/
   2. Find the IP of haproxy (`172.27.0.3` in this case), run `curl -s http://medic:password@172.27.0.3:5984/medic | jq` and see JSON for a SAD medic DB.
6. Start the couch node again: `docker start deleteme_couchdb.2_1`

Expected Result:

CHT should resume correctly. Check in the browser and with `curl`:
1. Browse to https://localhost/
2. Find the IP of haproxy (`172.27.0.3` in this case), run `curl -s http://medic:password@172.27.0.3:5984/medic | jq` and see JSON for a healthy medic DB.