Closed chronos-tachyon closed 4 months ago
Will need to remember to put a warning in the SDK release notes about changing the defaults
I appreciate the explanation of this PR in the commit message! After this PR are there any differences between Java tuning and core SDKs, or does this move it to be basically identical?
I appreciate the explanation of this PR in the commit message! After this PR are there any differences between Java tuning and core SDKs, or does this move it to be basically identical?
There are some differences if you supply some values but leave others as defaults, and AFAICT Core SDK does not seem to have a concept of throttle/congestion for non-poll RPCs, but the behavior is much, much more similar.
What was changed
Why?
These changes bring the Java SDK's RPC retry options in line with those of the Core SDK, as the latter have been more carefully reasoned through... with one exception.
The Java poller's existing backoff factor of 1.2 was wildly inappropriate, causing the first 10 retries to hammer the server within less than 1.5 seconds. Even more confusingly, Java's maximum interval was far, far too long, as a 60s delay between the server becoming available and the worker noticing that fact is unreasonable and unnecessary. The poller now uses a 2.0 backoff and 10s cap, exactly in line with Core SDK.
Regarding Java's non-poll RPCs, OTOH, 2.0 is not as aggressive as Core SDK's choice of 1.5, and yet Core is actually in the wrong here: all 10 permitted retries can complete within the 10-second window expected for namespace migrations in Temporal Cloud. 1.7 seems to be the ideal number here, as it leads to a cumulative delay window of 11.8 to 17.8 seconds. I've used 1.7 here, and Core SDK should be updated to use it as well.