Problem
We've noticed errors such as `connection reset by peer` or `EOF` when bramble polls certain services, most notably those built on Node.js. After some investigation, we found that this happens when a server's keep-alive timeout is shorter than that of bramble's HTTP client. The server can close an idle connection at the same moment the client reuses it for the next poll, which is a race condition.
Solution
Our first attempt was to add jitter (a random offset) to the polling period. This helped, but we still ran into the errors above.
The most bulletproof way to avoid the race is to make the client's keep-alive timeout shorter than the server's. Since service polling is not a high-throughput task, we decided it was simpler to disable keep-alive entirely for polling. The HTTP client bramble uses to run queries against downstream services is unchanged.