Open AlexanderShenshin opened 2 weeks ago
Hello @AlexanderShenshin , thank you for opening this issue.
I have a couple of questions / suggestions:
Client
's log level to trace : client.setLogger(new Logger(LogLevel.TRACE));
? Please attach the logs when executing the transaction.java.util.concurrent.TimeoutException
occurres, can you send me the SDK relevant stack trace as well?Hello @0xivanov, thank you for the response.
Sure, I will try to address these points.
First of all, it's clear for me that connection issues that I encounter are not directly related to Java SDK - my team member do not encounter them using the same code and I noticed that same integration tests work well in GH pipelines (as you mentioned). What makes me curious is that SDKs perform differently.
So the point for raising this issue is that JS SDK seems to be performing significantly better in my case and this may uncover some potential improvements for Java SDK side.
Regarding did-sdk-js
, I initially discovered problematic behavior by comparing integration tests performance of DID SDKs that correspondingly use Hedera Java SDK and Hedera JS SDK under the hood.
To isolate our context, I reproduced the behavior by comparing results of Java and JS SDKs integration tests:
TopicMessageIntegrationTest
that present in both Java and JS SDKClient.setLogger
I've run Java SDK tests using following command:
./gradlew :sdk:testIntegration -POPERATOR_ID="{OPERATOR_ID}" -POPERATOR_KEY="{OPERATOR_KEY}" -PHEDERA_NETWORK="testnet" -PSDK_LOG_LEVEL="TRACE" --tests "TopicMessageIntegrationTest"
Here's the log output for failed Java SDK test case:
TopicMessageIntegrationTest > Can receive a large topic message STANDARD_OUT
01:12:51.543 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 8000 ms after channel connection failure with node 0.0.6 during attempt #1
TopicMessageIntegrationTest > Can receive a large topic message STANDARD_ERROR
com.hedera.hashgraph.sdk.MaxAttemptsExceededException: exceeded maximum attempts for request with last exception being
TopicMessageIntegrationTest > Can receive a large topic message STANDARD_OUT
01:13:03.976 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 7999 ms after channel connection failure with node 0.0.5 during attempt #1
TopicMessageIntegrationTest > Can receive a large topic message STANDARD_ERROR
com.hedera.hashgraph.sdk.MaxAttemptsExceededException: exceeded maximum attempts for request with last exception being
TopicMessageIntegrationTest > Can receive a large topic message STANDARD_OUT
01:13:16.465 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 8000 ms after channel connection failure with node 0.0.5 during attempt #1
TopicMessageIntegrationTest > Can receive a large topic message STANDARD_ERROR
com.hedera.hashgraph.sdk.MaxAttemptsExceededException: exceeded maximum attempts for request with last exception being
TopicMessageIntegrationTest > Can receive a large topic message STANDARD_OUT
01:13:17.415 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 250 ms after failure with node 0.0.9 during attempt #1: OK
01:13:17.427 [Test worker] DEBUG com.hedera.hashgraph.sdk.logger.Logger -- Sleeping for: 250 | Thread name: Test worker
01:13:30.189 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 8000 ms after channel connection failure with node 0.0.9 during attempt #2
01:13:43.265 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 8000 ms after channel connection failure with node 0.0.9 during attempt #1
01:13:43.266 [Test worker] DEBUG com.hedera.hashgraph.sdk.logger.Logger -- Sleeping for: 7999 | Thread name: Test worker
01:14:03.702 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 16000 ms after channel connection failure with node 0.0.9 during attempt #2
01:14:03.702 [Test worker] DEBUG com.hedera.hashgraph.sdk.logger.Logger -- Sleeping for: 16000 | Thread name: Test worker
01:14:32.142 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 32000 ms after channel connection failure with node 0.0.9 during attempt #3
01:14:32.143 [Test worker] DEBUG com.hedera.hashgraph.sdk.logger.Logger -- Sleeping for: 31999 | Thread name: Test worker
01:15:16.590 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 64000 ms after channel connection failure with node 0.0.9 during attempt #4
01:15:16.593 [Test worker] DEBUG com.hedera.hashgraph.sdk.logger.Logger -- Sleeping for: 63997 | Thread name: Test worker
TopicMessageIntegrationTest > Can receive a large topic message FAILED
java.util.concurrent.TimeoutException
at com.hedera.hashgraph.sdk@2.43.0/com.hedera.hashgraph.sdk.Executable.execute(Executable.java:431)
at com.hedera.hashgraph.sdk@2.43.0/com.hedera.hashgraph.sdk.Transaction.execute(Transaction.java:53)
at com.hedera.hashgraph.sdk@2.43.0/com.hedera.hashgraph.sdk.Executable.execute(Executable.java:367)
at com.hedera.hashgraph.sdk@2.43.0/com.hedera.hashgraph.sdk.Transaction.execute(Transaction.java:53)
at com.hedera.hashgraph.sdk.test.integration/com.hedera.hashgraph.sdk.test.integration.TopicMessageIntegrationTest.canReceiveALargeTopicMessage(TopicMessageIntegrationTest.java:103)
And here's the output for successful test case:
TopicMessageIntegrationTest > Can receive a topic message STANDARD_ERROR
[Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 8000 ms after channel connection failure with node 0.0.3 during attempt #1
com.hedera.hashgraph.sdk.MaxAttemptsExceededException: exceeded maximum attempts for request with last exception being
[Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 7999 ms after channel connection failure with node 0.0.9 during attempt #1
com.hedera.hashgraph.sdk.MaxAttemptsExceededException: exceeded maximum attempts for request with last exception being
[Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 8000 ms after channel connection failure with node 0.0.9 during attempt #1
com.hedera.hashgraph.sdk.MaxAttemptsExceededException: exceeded maximum attempts for request with last exception being
[Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 8000 ms after channel connection failure with node 0.0.5 during attempt #1
[Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 8000 ms after channel connection failure with node 0.0.5 during attempt #2
[Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 250 ms after failure with node 0.0.5 during attempt #1: OK
[Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 500 ms after failure with node 0.0.5 during attempt #2: OK
[Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 250 ms after failure with node 0.0.5 during attempt #1: OK
[Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 500 ms after failure with node 0.0.5 during attempt #2: OK
[Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 1000 ms after failure with node 0.0.5 during attempt #3: OK
[Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 250 ms after failure with node 0.0.5 during attempt #1: OK
[Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 500 ms after failure with node 0.0.5 during attempt #2: OK
[hedera-sdk-5] WARN com.hedera.hashgraph.sdk.TopicMessageQuery - Call is cancelled for topic 0.0.5148534.
TopicMessageIntegrationTest > Can receive a topic message PASSED
For reference, I'm also posting full logs of a single test run for both SDKs:
I've spent some time looking into this today and I have a couple of things to say:
client.setMaxNodesPerTransaction(1);
thus making the sdk to retry only for 1 node.MaxAttemptsExceededException
occurres when we've retried 10 times (10 is the default, can be changed with setMaxAttempts
). In your logs we are seeing 3-4 attempts, so it's weird this exception gets printed.TimeoutException
occurres when sum of the time of grpc requests exceeds 2 mins (can be changed with setRequestTimeout()
)I am still unable to reproduce this on testnet, I am conncting to node 9, which was failing for you.
Recently I noticed performance inconsistency between Hedera Java SDK and JS SDK while having issues with node connections, with Java SDK working significantly worse.
I would like to share my findings, it may help to find potential issues in Java SDK and/or figure out what exactly work better in JS SDK for described case.
Environment
Problem description
The issues were initially discovered while running some tests against Hedera Testnet using Java SDK under the hood. These tests are only using HCS-related API (topic creation, submitting & reading topic messages).
I tried to run the same tests in JS library (uses JS SDK) and it apparently worked fine. To confirm my assumptions, I ran Java SDK integration tests against Testnet and observed the same performance/timeouts issues.
Summary of issues comparing to JS SDK:
java.util.concurrent.TimeoutException
) that randomly happen ontransaction.execute()
calls and HCS topic messages subscriptionWorkaround
It was possible to work around the problem by doing the following:
can connect to testnet with TLS STANDARD_OUT
:Client.fromConfig()
with custom config instead ofClient.forTestnet()
After moving to custom network map/config, timeouts and performance issues are gone. Same integration tests that I run with Java SDK are now even faster than with JS SDK.