Closed nathanielc closed 8 years ago
@joelegasse Here is an updated version...
@joelegasse How does this look now?
Looks good to me. It still has [WIP] in the title, though. Have you tested this out locally?
@joelegasse I have tested locally. I was able to buffer up several thousand writes while a backend was down and then see them all written once the backend came back online. The backoff worked as expected. I was also able to fill up the buffer and confirm that errors are correctly logged and returned to the client.
This adds retry logic to the HTTP backends. Obviously it doesn't make sense to add retry logic to the UDP backend. The intent of this logic is to reduce the number of failures during short outages or periodic network issues. _This retry logic is not sufficient for long periods of downtime, as all data is buffered in RAM._
Config options
With these two config options it should be easy to reason about your fault tolerance properties. For example, if MaxRetryTime is 1m, then a backend server cannot be down for more than a minute or it is guaranteed to be out of sync. BufferSize should be large enough to buffer all write operations for MaxRetryTime; you should be able to measure RAM usage empirically and adjust as needed.
Each backend has its own buffer, and retries are serialized per backend. This should prevent a stampede of requests once a backend server recovers from an outage.
TODO:
NOTE: I also implemented the HTTP timeout, since it existed as a configuration option but had no effect. (I ran into that bug during testing.)
~~NOTE: This PR adds one new dependency on https://github.com/cenkalti/backoff. I thought about copy/pasting the needed bits, but that would have meant copying nearly all of the repo, so I decided importing was better than copy/paste here. It's a simple, well-written package. I could be convinced otherwise if someone else feels strongly.~~ Dep has been removed