Closed sirstevepal closed 7 years ago
Can you provide a failing test?
From: sirstevepal notifications@github.com Reply-To: palantir/http-remoting reply@reply.github.com Date: Thursday, August 31, 2017 at 11:31 PM To: palantir/http-remoting http-remoting@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [palantir/http-remoting] Remoting 3.3.0-rc1 Not Failing Over Correctly (#530)
When testing http-remoting-3.3.0-rc1 we noticed that when a node that is receiving requests goes down, the client does not fail over to the next node in the list and instead just throws an exception.
Previously http-remoting would fail over to the next node in the list in order to utilize a service configured to be highly available.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub[github.com]https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_palantir_http-2Dremoting_issues_530&d=DwMCaQ&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=UfcWCaR4ui50AFap-gezrx5XYtPtH-9JpazU7tbRW-4&m=4dv1krMsH7K_3X80-s7-_bRZJvD98H3o2no9_p-4aKg&s=OnAeDqMc5HFWgFNq2v2fBTMDblB_uW4QFIGFdxT-Xxg&e=, or mute the thread[github.com]https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AGOdwUxvArY6TBgZzg9j8Po27nzSFtJFks5sdyYygaJpZM4PJb2B&d=DwMCaQ&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=UfcWCaR4ui50AFap-gezrx5XYtPtH-9JpazU7tbRW-4&m=4dv1krMsH7K_3X80-s7-_bRZJvD98H3o2no9_p-4aKg&s=e7Qk9FF4_PWXqgMf8s2YGA4Vai7_Cy7tdvTKItQZw54&e=.
The failovers seem to work fine. Think the confusion stems from change in behavior from 2.x where by default a request tries all hosts before failing. What was the reason for changing that ?
The only change that I'm aware of (.. and there may be others..) is that as of 3.3.0-rc1 the client now respects config.maxNumRetries()
instead of retrying urls.size()
times. I think it'd be reasonable to change the default from 0 to, uhm, 3ish. (I had hoped to not release a 3.3.0 version yet because the retry work is still very much in progress. Made the rc1 because @meditativeape needed some other unrelated patch.)
FYI @meditativeape I think you had picked up the rc1 already, right? watch out for the retry config.
Gotcha, thanks for the heads-up. I think it would be reasonable to bump the default as you suggested, and change
https://github.com/palantir/http-remoting/blob/aafb00d031019c893c08b12b61c6511adbb44097/http-clients/src/main/java/com/palantir/remoting3/clients/ClientConfigurations.java#L63
to
.maxNumRetries(max(DEFAULT_MAX_NUM_RETRIES, config.uris().size))
?
When testing http-remoting-3.3.0-rc1 we noticed that when a node that is receiving requests goes down, the client does not fail over to the next node in the list and instead just throws an exception.
Previously http-remoting would fail over to the next node in the list in order to utilize a service configured to be highly available.