hashgraph / hedera-sdk-java

Hedera™ Hashgraph SDK for Java
https://docs.hedera.com/guides/docs/sdks
Apache License 2.0
229 stars 117 forks source link

Node connection issues have significantly more impact on performance comparing to JS SDK #2070

Open AlexanderShenshin opened 2 weeks ago

AlexanderShenshin commented 2 weeks ago

Recently I noticed performance inconsistency between Hedera Java SDK and JS SDK while having issues with node connections, with Java SDK working significantly worse.

I would like to share my findings, it may help to find potential issues in Java SDK and/or figure out what exactly work better in JS SDK for described case.

Environment

Problem description

The issues were initially discovered while running some tests against Hedera Testnet using Java SDK under the hood. These tests are only using HCS-related API (topic creation, submitting & reading topic messages).

I tried to run the same tests in JS library (uses JS SDK) and it apparently worked fine. To confirm my assumptions, I ran Java SDK integration tests against Testnet and observed the same performance/timeouts issues.

Summary of issues comparing to JS SDK:

Workaround

It was possible to work around the problem by doing the following:

After moving to custom network map/config, timeouts and performance issues are gone. Same integration tests that I run with Java SDK are now even faster than with JS SDK.

0xivanov commented 2 weeks ago

Hello @AlexanderShenshin , thank you for opening this issue.

I have a couple of questions / suggestions:

  1. You mentioned that you are using did-js-sdk , what does this have to do with the Java SDK? Are you using the Java SDK in a similar way as did-js-sdk? If so - please give more information about that and maybe some code snippet - configuring the client -> configuring the transaction -> executing the transaction -> getting the receipt, etc.
  2. Would you be able to set the SDK Client's log level to trace : client.setLogger(new Logger(LogLevel.TRACE));? Please attach the logs when executing the transaction.
  3. If java.util.concurrent.TimeoutException occurres, can you send me the SDK relevant stack trace as well?
  4. I am running the integration tests for testnet and previewnet right now and the connection is successful. I also checked sone github jobs from 5 days ago (when this issue was opened) and the connection was fine.
AlexanderShenshin commented 6 days ago

Hello @0xivanov, thank you for the response.

Sure, I will try to address these points.

First of all, it's clear for me that connection issues that I encounter are not directly related to Java SDK - my team member do not encounter them using the same code and I noticed that same integration tests work well in GH pipelines (as you mentioned). What makes me curious is that SDKs perform differently.

So the point for raising this issue is that JS SDK seems to be performing significantly better in my case and this may uncover some potential improvements for Java SDK side.

Regarding did-sdk-js, I initially discovered problematic behavior by comparing integration tests performance of DID SDKs that correspondingly use Hedera Java SDK and Hedera JS SDK under the hood.

To isolate our context, I reproduced the behavior by comparing results of Java and JS SDKs integration tests:

I've run Java SDK tests using following command: ./gradlew :sdk:testIntegration -POPERATOR_ID="{OPERATOR_ID}" -POPERATOR_KEY="{OPERATOR_KEY}" -PHEDERA_NETWORK="testnet" -PSDK_LOG_LEVEL="TRACE" --tests "TopicMessageIntegrationTest"

Here's the log output for failed Java SDK test case:

TopicMessageIntegrationTest > Can receive a large topic message STANDARD_OUT
    01:12:51.543 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 8000 ms after channel connection failure with node 0.0.6 during attempt #1

TopicMessageIntegrationTest > Can receive a large topic message STANDARD_ERROR
    com.hedera.hashgraph.sdk.MaxAttemptsExceededException: exceeded maximum attempts for request with last exception being

TopicMessageIntegrationTest > Can receive a large topic message STANDARD_OUT
    01:13:03.976 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 7999 ms after channel connection failure with node 0.0.5 during attempt #1

TopicMessageIntegrationTest > Can receive a large topic message STANDARD_ERROR
    com.hedera.hashgraph.sdk.MaxAttemptsExceededException: exceeded maximum attempts for request with last exception being

TopicMessageIntegrationTest > Can receive a large topic message STANDARD_OUT
    01:13:16.465 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 8000 ms after channel connection failure with node 0.0.5 during attempt #1

TopicMessageIntegrationTest > Can receive a large topic message STANDARD_ERROR
    com.hedera.hashgraph.sdk.MaxAttemptsExceededException: exceeded maximum attempts for request with last exception being

TopicMessageIntegrationTest > Can receive a large topic message STANDARD_OUT
    01:13:17.415 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 250 ms after failure with node 0.0.9 during attempt #1: OK
    01:13:17.427 [Test worker] DEBUG com.hedera.hashgraph.sdk.logger.Logger -- Sleeping for: 250 | Thread name: Test worker
    01:13:30.189 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 8000 ms after channel connection failure with node 0.0.9 during attempt #2                                       
    01:13:43.265 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 8000 ms after channel connection failure with node 0.0.9 during attempt #1
    01:13:43.266 [Test worker] DEBUG com.hedera.hashgraph.sdk.logger.Logger -- Sleeping for: 7999 | Thread name: Test worker
    01:14:03.702 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 16000 ms after channel connection failure with node 0.0.9 during attempt #2                                      
    01:14:03.702 [Test worker] DEBUG com.hedera.hashgraph.sdk.logger.Logger -- Sleeping for: 16000 | Thread name: Test worker
    01:14:32.142 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 32000 ms after channel connection failure with node 0.0.9 during attempt #3                                      
    01:14:32.143 [Test worker] DEBUG com.hedera.hashgraph.sdk.logger.Logger -- Sleeping for: 31999 | Thread name: Test worker
    01:15:16.590 [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger -- Retrying in 64000 ms after channel connection failure with node 0.0.9 during attempt #4
    01:15:16.593 [Test worker] DEBUG com.hedera.hashgraph.sdk.logger.Logger -- Sleeping for: 63997 | Thread name: Test worker

TopicMessageIntegrationTest > Can receive a large topic message FAILED
    java.util.concurrent.TimeoutException
        at com.hedera.hashgraph.sdk@2.43.0/com.hedera.hashgraph.sdk.Executable.execute(Executable.java:431)
        at com.hedera.hashgraph.sdk@2.43.0/com.hedera.hashgraph.sdk.Transaction.execute(Transaction.java:53)
        at com.hedera.hashgraph.sdk@2.43.0/com.hedera.hashgraph.sdk.Executable.execute(Executable.java:367)
        at com.hedera.hashgraph.sdk@2.43.0/com.hedera.hashgraph.sdk.Transaction.execute(Transaction.java:53)
        at com.hedera.hashgraph.sdk.test.integration/com.hedera.hashgraph.sdk.test.integration.TopicMessageIntegrationTest.canReceiveALargeTopicMessage(TopicMessageIntegrationTest.java:103)

And here's the output for successful test case:

TopicMessageIntegrationTest > Can receive a topic message STANDARD_ERROR
    [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 8000 ms after channel connection failure with node 0.0.3 during attempt #1
    com.hedera.hashgraph.sdk.MaxAttemptsExceededException: exceeded maximum attempts for request with last exception being
    [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 7999 ms after channel connection failure with node 0.0.9 during attempt #1
    com.hedera.hashgraph.sdk.MaxAttemptsExceededException: exceeded maximum attempts for request with last exception being
    [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 8000 ms after channel connection failure with node 0.0.9 during attempt #1                                                     
    com.hedera.hashgraph.sdk.MaxAttemptsExceededException: exceeded maximum attempts for request with last exception being
    [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 8000 ms after channel connection failure with node 0.0.5 during attempt #1                                                     
    [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 8000 ms after channel connection failure with node 0.0.5 during attempt #2
    [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 250 ms after failure with node 0.0.5 during attempt #1: OK                                                                     
    [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 500 ms after failure with node 0.0.5 during attempt #2: OK                                                                     
    [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 250 ms after failure with node 0.0.5 during attempt #1: OK                                                                     
    [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 500 ms after failure with node 0.0.5 during attempt #2: OK                                                                     
    [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 1000 ms after failure with node 0.0.5 during attempt #3: OK                                                                    
    [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 250 ms after failure with node 0.0.5 during attempt #1: OK                                                                     
    [Test worker] WARN com.hedera.hashgraph.sdk.logger.Logger - Retrying in 500 ms after failure with node 0.0.5 during attempt #2: OK                                                                     
    [hedera-sdk-5] WARN com.hedera.hashgraph.sdk.TopicMessageQuery - Call is cancelled for topic 0.0.5148534.                                                                                              

TopicMessageIntegrationTest > Can receive a topic message PASSED

For reference, I'm also posting full logs of a single test run for both SDKs:

0xivanov commented 3 days ago

I've spent some time looking into this today and I have a couple of things to say:

  1. The test environment does client.setMaxNodesPerTransaction(1); thus making the sdk to retry only for 1 node.
  2. MaxAttemptsExceededException occurres when we've retried 10 times (10 is the default, can be changed with setMaxAttempts). In your logs we are seeing 3-4 attempts, so it's weird this exception gets printed.
  3. TimeoutException occurres when sum of the time of grpc requests exceeds 2 mins (can be changed with setRequestTimeout())
0xivanov commented 3 days ago

I am still unable to reproduce this on testnet, I am conncting to node 9, which was failing for you.