Closed by gshif 5 years ago
The litmus_run_master script already checks the status of the following services by calling the http://<gateway>:9999/healthz REST API:
Checks for the rest of the components need to be added to the litmus_run_master script, e.g.:
curl -X GET http://localhost:8098/healthz
{
  "status": "healthy",
  "checks": [
    {
      "name": "internal",
      "status": "healthy"
    }
  ]
}
curl -X GET http://localhost:8093/healthz
{
  "status": "healthy",
  "checks": [
    {
      "name": "internal",
      "status": "healthy"
    },
    {
      "name": "storage",
      "status": "healthy",
      "checks": [
        {
          "name": "partition 0",
          "status": "healthy",
          "checks": [
            {
              "status": "healthy",
              "message": "connection state to storage-0.storage:8082 is READY"
            }
          ]
        }
      ]
    }
  ]
}
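Since checks can be nested (storage, then partition, then connection), a script that only reads the top-level status cannot say which sub-check failed. Below is a minimal Python sketch of walking the tree, assuming the response shape shown above (a "status" field plus an optional nested "checks" list). The function name find_unhealthy is hypothetical and is not part of litmus_run_master.

```python
import json


def find_unhealthy(check, path=()):
    """Return (path, message) pairs for the deepest checks that are not "healthy"."""
    name = check.get("name")
    here = path + ((name,) if name else ())
    failures = []
    for child in check.get("checks", []):
        failures.extend(find_unhealthy(child, here))
    # Report this node only if it is unhealthy and no child already explains why.
    if check.get("status") != "healthy" and not failures:
        failures.append(("/".join(here) or "<root>", check.get("message", "")))
    return failures


# Trimmed copy of the healthy response shown above.
sample = json.loads("""
{
  "status": "healthy",
  "checks": [
    {"name": "internal", "status": "healthy"},
    {"name": "storage", "status": "healthy", "checks": [
      {"name": "partition 0", "status": "healthy", "checks": [
        {"status": "healthy",
         "message": "connection state to storage-0.storage:8082 is READY"}
      ]}
    ]}
  ]
}
""")

print(find_unhealthy(sample))  # prints []
```

An empty list means every check in the tree reported healthy; otherwise each entry names the failing check by its path, e.g. storage/partition 0.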
Health checks for the tasks and etcd-task services are not implemented yet. Need to ask the tasks team for the correct URL.
tasks service. Has two parts:
curl -vv -X GET http://localhost:8276/health
Example output (when the tasks service is not healthy):
> GET /health HTTP/1.1
> Host: localhost:8276
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< Date: Fri, 09 Nov 2018 21:36:36 GMT
< Content-Length: 225
< Content-Type: text/plain; charset=utf-8
<
{
  "status": "unhealthy",
  "checks": [
    {
      "name": "internal",
      "status": "healthy"
    },
    {
      "name": "etcdStore",
      "status": "unhealthy",
      "message": "underlying session closed"
    }
  ]
}
or, when the tasks service is healthy:
{
  "status": "healthy",
  "checks": [
    {
      "name": "internal",
      "status": "healthy"
    },
    {
      "name": "etcdStore",
      "status": "healthy"
    }
  ]
}
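The unhealthy example above comes back as HTTP 503 with the JSON body still attached, so a check has to read the body on error responses as well. Below is a minimal Python sketch using only the standard library. The helper names fetch_health and is_healthy are hypothetical, and it is an assumption (not confirmed in this thread) that a healthy service answers with HTTP 200.

```python
import json
import urllib.error
import urllib.request


def fetch_health(url, timeout=5):
    """Return (http_status, parsed_json_body) for a health endpoint, even on 5xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises on a 503, but the JSON body can still be read from the error.
        return err.code, json.loads(err.read().decode("utf-8"))


def is_healthy(http_status, body):
    """Assumption: healthy means HTTP 200 and a top-level "status" of "healthy"."""
    return http_status == 200 and body.get("status") == "healthy"
```

For example, status, body = fetch_health("http://localhost:8276/health") followed by is_healthy(status, body). A connection failure (service not listening at all) raises urllib.error.URLError, which this sketch deliberately leaves to the caller.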
The tasks service uses two different endpoints to check for readiness and health:
http://
{
  "status": "healthy",
  "checks": [
    {
      "name": "internal",
      "status": "healthy"
    },
    {
      "name": "etcdStore",
      "status": "healthy"
    }
  ]
}
http://
{
  "status": "healthy",
  "checks": [
    {
      "name": "internal",
      "status": "healthy"
    },
    {
      "name": "Kafka producer",
      "status": "healthy"
    },
    {
      "name": "Query Service",
      "status": "healthy"
    }
  ]
}
Readiness: the tasks service is not ready to accept requests if the ready endpoint returns an unhealthy status.
Liveness: the tasks pod(s) will be restarted if the health endpoint returns an unhealthy status.
I am using the health endpoint to check the health of tasks: if tasks is not connected to etcd-tasks, then tasks cannot really be created, but, to the best of my knowledge, if queryd or Kafka is unhealthy, tasks can still queue the task(s) to be executed.
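The reasoning above can be written as a small decision function. This is only a sketch of that interpretation: the function name classify_tasks and the labels "ok", "degraded", and "down" are hypothetical, and it assumes both endpoints return bodies with a top-level "status" field as in the examples in this thread.

```python
def classify_tasks(health_body, ready_body):
    """Map the bodies of the health and ready endpoints to a coarse state."""
    if health_body.get("status") != "healthy":
        # e.g. etcdStore is unreachable, so tasks cannot be created at all.
        return "down"
    if ready_body.get("status") != "healthy":
        # e.g. Kafka producer or Query Service is unhealthy; per the comment
        # above, tasks can reportedly still be queued in this state.
        return "degraded"
    return "ok"


healthy = {"status": "healthy"}
unhealthy = {"status": "unhealthy"}

print(classify_tasks(healthy, healthy))    # prints ok
print(classify_tasks(healthy, unhealthy))  # prints degraded
print(classify_tasks(unhealthy, healthy))  # prints down
```

Under this reading, a smoke test would fail only on "down" and could log a warning on "degraded".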
Once the platform (influxdb) code is merged into idpe, the smoke tests will have to be revised due to changes in the REST API.
Once the merge happens, a separate issue will be created to track the work required to make the smoke tests work again.
There are no E2E tests for 2.0. Currently, only integration tests for the gateway and etcd components (to create/edit/delete/show users/organizations/buckets) exist. Need to create E2E test(s) that touch all of the components (or almost all of them, with the exception of the ones that are not yet working, such as tasks).