tilezen / tapalcatl

Tapalcatl is a "metatile server", which attempts to serve individual tiles extracted from an archive in storage.
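A metatile bundles many adjacent tiles into a single archive object, and the server's job is to fetch that archive from storage and return just the entry the client asked for. Here is a minimal sketch of that extraction step, assuming a zip-formatted metatile; the entry naming (`0/0/0.mvt`) is an assumption for illustration, not necessarily tapalcatl's actual layout:

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"log"
)

// extractTile pulls a single tile entry out of a zip-formatted metatile.
func extractTile(metatile []byte, entryName string) ([]byte, error) {
	zr, err := zip.NewReader(bytes.NewReader(metatile), int64(len(metatile)))
	if err != nil {
		return nil, fmt.Errorf("reading metatile: %w", err)
	}
	for _, f := range zr.File {
		if f.Name != entryName {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		defer rc.Close()
		return io.ReadAll(rc)
	}
	return nil, fmt.Errorf("tile %q not found in metatile", entryName)
}

func main() {
	// Build a toy metatile in memory: one entry per tile offset.
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, _ := zw.Create("0/0/0.mvt") // entry naming is an assumption
	w.Write([]byte("tile bytes"))
	zw.Close()

	tile, err := extractTile(buf.Bytes(), "0/0/0.mvt")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", tile)
}
```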

Improve the healthcheck #26

Closed · iandees closed this 7 years ago

iandees commented 7 years ago

This improves Tapalcatl's healthcheck to verify that the storage backend is reachable before responding that all is well.
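A rough sketch of the shape of this change (not the PR's actual code): the handler probes storage and only returns a 200 when the probe succeeds. The `Storage` interface and its `HealthCheck` method are assumptions made for this sketch:

```go
package main

import (
	"errors"
	"log"
	"net/http"
)

// Storage stands in for tapalcatl's storage abstraction; the
// HealthCheck method is an assumption made for this sketch.
type Storage interface {
	// HealthCheck fetches a known key (e.g. the configured
	// "healthcheck" object) and returns an error if storage is
	// unreachable or the key is missing.
	HealthCheck() error
}

// s3Probe is a toy stand-in for an S3-backed storage.
type s3Probe struct{ err error }

func (p s3Probe) HealthCheck() error { return p.err }

// healthCheckHandler responds 200 only when every configured
// storage is reachable; any failure turns the check into a 500.
func healthCheckHandler(storages []Storage) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		for _, s := range storages {
			if err := s.HealthCheck(); err != nil {
				log.Printf("healthcheck failed on storage: %s", err)
				w.WriteHeader(http.StatusInternalServerError)
				return
			}
		}
		w.WriteHeader(http.StatusOK)
	}
}

func main() {
	// Swap the error for nil to exercise the 200 path instead.
	storages := []Storage{s3Probe{err: errors.New("NoSuchKey: The specified key does not exist")}}
	http.HandleFunc("/_healthcheck", healthCheckHandler(storages))
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```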

iandees commented 7 years ago

@rmarianski @zerebubuth I made some changes here to address your comments. Can you take a peek again?

It seems to work as expected now.

With a non-existent S3 key:

$ tapalcatl_server -listen :8000 \
    -healthcheck /_healthcheck \
    -poolentrysize 10 \
    -poolentrysize 4096 \
    -handler '{"aws":{"region":"us-east-1"},"mime":{"json":"application/json","mvt":"application/x-protobuf","mvtb":"application/x-protobuf","topojson":"application/json"},"storage":{"s3":{"bucket":"mapzen-tiles-prod-us-east","keypattern":"/{prefix}/{hash}/{layer}/{z}/{x}/{y}.{fmt}","layer":"all","metatilesize":1,"healthcheck":"/checking_upness/broken.txt"}},"pattern":{"/mapzen/vector/v1/all/{z:[0-9]+}/{x:[0-9]+}/{y:[0-9]+}.{fmt}":{"type":"s3","prefix":"20170227"}}}'

2017/03/09 19:50:03.003693 {"category":"config","hostname":"Admins-MacBook-Pro-6.local","message":"Healthcheck failed on storage: NoSuchKey: The specified key does not exist.
    status code: 404, request id: 0860E54EADDE5967","type":"warning"}
2017/03/09 19:50:03.003909 {"hostname":"Admins-MacBook-Pro-6.local","message":"Server started and listening on :8000\n","type":"info"}

The server still starts, but the healthcheck responds with a 500:

$ curl -vs http://localhost:8000/_healthcheck > /dev/null
> GET /_healthcheck HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Date: Thu, 09 Mar 2017 19:50:50 GMT
< Content-Length: 0
< Content-Type: text/plain; charset=utf-8
<

With an S3 key that exists:

$ tapalcatl_server -listen :8000 \
    -healthcheck /_healthcheck \
    -poolentrysize 10 \
    -poolentrysize 4096 \
    -handler '{"aws":{"region":"us-east-1"},"mime":{"json":"application/json","mvt":"application/x-protobuf","mvtb":"application/x-protobuf","topojson":"application/json"},"storage":{"s3":{"bucket":"mapzen-tiles-prod-us-east","keypattern":"/{prefix}/{hash}/{layer}/{z}/{x}/{y}.{fmt}","layer":"all","metatilesize":1,"healthcheck":"/checking_upness/sanity_check.txt"}},"pattern":{"/mapzen/vector/v1/all/{z:[0-9]+}/{x:[0-9]+}/{y:[0-9]+}.{fmt}":{"type":"s3","prefix":"20170227"}}}'

2017/03/09 19:51:30.851106 {"hostname":"Admins-MacBook-Pro-6.local","message":"Server started and listening on :8000\n","type":"info"}

And the healthcheck responds with a 200:

$ curl -vs http://localhost:8000/_healthcheck > /dev/null
> GET /_healthcheck HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 09 Mar 2017 19:52:38 GMT
< Content-Length: 0
< Content-Type: text/plain; charset=utf-8
<
iandees commented 7 years ago

Yeah, I built this under the assumption that we only allowed one of each kind of storage, so with the changes from #17 this implementation probably isn't ideal.

rmarianski commented 7 years ago

We also don't have this case yet, so we can push it to another issue/PR if that makes more sense.

Another aspect to consider: if we do have multiple storages, do we want a single storage failure to fail the whole healthcheck? In general, it would be nice to keep serving patterns whose storages are healthy and fail only the ones that aren't, rather than not serving any traffic at all. Practically speaking, though, we're most likely going to be using this with all the storages pointing at the same bucket under different prefixes, so we won't end up triggering this case and probably shouldn't worry about it here. And if we consolidate the healthchecks, we'll only have a single one anyway.
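To make the trade-off concrete, here is a sketch of the two aggregation policies, reusing the hypothetical `Storage` interface from the earlier sketch:

```go
// Aggregation policies for multiple storages. Storage is the same
// hypothetical interface as in the earlier sketch.
type Storage interface {
	HealthCheck() error
}

// allHealthy fails the whole healthcheck on any storage error, so the
// load balancer stops routing to this server entirely.
func allHealthy(storages []Storage) bool {
	for _, s := range storages {
		if err := s.HealthCheck(); err != nil {
			return false
		}
	}
	return true
}

// perStorageStatus reports each storage separately, which would let the
// server keep serving patterns whose storage is still reachable and
// fail only requests that depend on a broken one.
func perStorageStatus(storages map[string]Storage) map[string]error {
	status := make(map[string]error, len(storages))
	for name, s := range storages {
		status[name] = s.HealthCheck()
	}
	return status
}
```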

zerebubuth commented 7 years ago

Yup, sorry if it wasn't clear: I approve this request, and was mentioning the consolidation thing as a potential future issue that we don't have yet, and that we can punt to a separate ticket to track.

rmarianski commented 7 years ago

I resolved the conflict, and made some minor changes. Merging.