palantir / conjure-java-runtime

Opinionated libraries for HTTP&JSON-based RPC using Dialogue, Feign, OkHttp as clients and Jetty/Jersey as servers
Apache License 2.0
79 stars 95 forks source link

Remoting 3.3.0-rc1 Not Failing Over Correctly #530

Closed sirstevepal closed 7 years ago

sirstevepal commented 7 years ago

When testing http-remoting-3.3.0-rc1 we noticed that when a node that is receiving requests goes down, the client does not fail over to the next node in the list and instead just throws an exception.

Previously http-remoting would fail over to the next node in the list in order to utilize a service configured to be highly available.

uschi2000 commented 7 years ago

Can you provide a failing test?

From: sirstevepal notifications@github.com Reply-To: palantir/http-remoting reply@reply.github.com Date: Thursday, August 31, 2017 at 11:31 PM To: palantir/http-remoting http-remoting@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [palantir/http-remoting] Remoting 3.3.0-rc1 Not Failing Over Correctly (#530)

When testing http-remoting-3.3.0-rc1 we noticed that when a node that is receiving requests goes down, the client does not fail over to the next node in the list and instead just throws an exception.

Previously http-remoting would fail over to the next node in the list in order to utilize a service configured to be highly available.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub[github.com]https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_palantir_http-2Dremoting_issues_530&d=DwMCaQ&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=UfcWCaR4ui50AFap-gezrx5XYtPtH-9JpazU7tbRW-4&m=4dv1krMsH7K_3X80-s7-_bRZJvD98H3o2no9_p-4aKg&s=OnAeDqMc5HFWgFNq2v2fBTMDblB_uW4QFIGFdxT-Xxg&e=, or mute the thread[github.com]https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AGOdwUxvArY6TBgZzg9j8Po27nzSFtJFks5sdyYygaJpZM4PJb2B&d=DwMCaQ&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=UfcWCaR4ui50AFap-gezrx5XYtPtH-9JpazU7tbRW-4&m=4dv1krMsH7K_3X80-s7-_bRZJvD98H3o2no9_p-4aKg&s=e7Qk9FF4_PWXqgMf8s2YGA4Vai7_Cy7tdvTKItQZw54&e=.

gauravtiwari89 commented 7 years ago

The failovers seem to work fine. Think the confusion stems from change in behavior from 2.x where by default a request tries all hosts before failing. What was the reason for changing that ?

uschi2000 commented 7 years ago

The only change that I'm aware of (.. and there may be others..) is that as of 3.3.0-rc1 the client now respects config.maxNumRetries() instead of retrying urls.size() times. I think it'd be reasonable to change the default from 0 to, uhm, 3ish. (I had hoped to not release a 3.3.0 version yet because the retry work is still very much in progress. Made the rc1 because @meditativeape needed some other unrelated patch.)

FYI @meditativeape I think you had picked up the rc1 already, right? watch out for the retry config.

meditativeape commented 7 years ago

Gotcha, thanks for the heads-up. I think it would be reasonable to bump the default as you suggested, and change https://github.com/palantir/http-remoting/blob/aafb00d031019c893c08b12b61c6511adbb44097/http-clients/src/main/java/com/palantir/remoting3/clients/ClientConfigurations.java#L63 to .maxNumRetries(max(DEFAULT_MAX_NUM_RETRIES, config.uris().size))?

uschi2000 commented 7 years ago

Fixed by https://github.com/palantir/http-remoting/pull/531