Closed senalw closed 1 year ago
Thank you @senalw for working on this 🙏
I'm neither a go expert nor an expert with the wave codebase so I'm not the right person to decide whether this PR should go in or not, just a couple of comments from my side:
- This continues to be the only release blocker for MLOps 0.62.0. We won't be able to release our software because of it and we're already late by a couple of weeks. So any help from the wave team @mturoci @lo5 will be extremely appreciated 🙏
- @senalw I don't see any linked issue in the PR. Did we not create an issue? I know there have been slack conversations but the issues are how the work is being tracked so please create one if it's not there already.
@senalw Since there's a go package for everything nowadays why try to implement this yourself if you can use an already existing and tested and much more configurable package for it?
Thanks for the suggestions. I linked the issue for this PR.
This PR is a needlessly complicated solution to what's obviously an availability / resource-scheduling problem on your side.
If you really want the wave server to not drop the app after a TCP reset, the simple fix is to introduce a flag that will do exactly that.
e.g.
func (app *App) forward(clientID string, session *Session, data []byte) {
if err := app.send(clientID, session, data); err != nil {
echo(Log{"t": "app", "route": app.route, "host": app.addr, "error": err.Error()})
if (dropIfUnresponsive) { // get global setting from env var or arg, default true to preserve current behavior.
app.broker.dropApp(app.route)
}
}
}
This way, the underlying http.Client
will automatically re-establish the connection if possible.
@lo5 comment implementation - https://github.com/h2oai/wave/pull/2050
Issue: Currently Waved disconnects app when a request failure occurs between Waved and app. Therefore, this PR will introduce a request retry mechanism to retry failed requests for a certain amount of time period which can be configured by a user. If this retry count (
H2O_WAVE_MAX_REQUEST_RETRY_COUNT
) is not configured, it wont enable request retry mechanism.Testing Conditions,
H2O_WAVE_MAX_REQUEST_RETRY_COUNT=20
andH2O_WAVE_REQUEST_RETRY_INTERVAL=1s
Testing Records,
read: connection reset by peer
disconnection for 2 seconds and reset back to normal with deleting iptables rule after 2 seconds.sleep 5 && iptables -A INPUT -p tcp --dport 8000 -j REJECT --reject-with tcp-reset && sleep 2 && iptables -D INPUT -p tcp --dport 8000 -j REJECT --reject-with tcp-reset
Result: App was able to successfully recover after 2 seconds,
https://github.com/h2oai/wave/assets/12801761/d0e50281-fe8b-4136-99b8-15a68cd7afc8
read: connection reset by peer
disconnection until requests reachH2O_WAVE_MAX_REQUEST_RETRY_COUNT
and drops the appsleep 5 && iptables -A INPUT -p tcp --dport 8000 -j REJECT --reject-with tcp-reset
Result: App drops after requests reach maximum retry count and unable to recover the connection.
https://github.com/h2oai/wave/assets/12801761/0d1a484a-9323-4bcc-a25f-7d650b69b29e