vechain / vechain-sdk-js

The official JavaScript SDK for VeChain.
24 stars 9 forks source link

Nodes :: Online #168

Closed nwbrettski closed 1 year ago

nwbrettski commented 1 year ago

As a user I want to be able to get node information from the VechainThor blockchain so that I can check on the health of a given node.

Acceptance Criteria:

nwbrettski commented 1 year ago

Link in with Kostas how does he check the health of a node?

bgaughran commented 1 year ago

@nwbrettski Is it just one method required for this? And is there a reference interface in Ethereum or a legacy vechain library we may want to mimic?

I'll link in with Kostas as well

nwbrettski commented 1 year ago

Yes I think a single function which provides a true or false as to whether a given node is alive or dead is the desired functionality.

There is no alignment with the hardhat-provider so free to name the function whatever you think is best.

Cannot see anything in connex or thor-devkit.

bgaughran commented 1 year ago

Grand. Let me have a think. Maybe there could be a need to 'proxy' the results of https://testnet.vechain.org/doc/swagger-ui/#/Node/get_node_network_peers

It might be best of a do a mini refinement session on Slack with the developers. I can throw out some ideas when I reach out to Kostas first. He might have useful inputs

bgaughran commented 1 year ago

@nwbrettski Maybe the way to look at this is - why are we providing this? What is the use case for the developers? How would they use this if they had it today? Could you make an argument to say that would not check it is alive before they made an online interaction. Just spitballing here

nwbrettski commented 1 year ago

The use case would be either as a conditional statement in a multiple function request and for debugging.

For example, if I am submitting a transaction for error checking before submitting and potentially failing I would first check to ensure that the node that I am pointing to is online and health and then send the transaction otherwise do not send the transaction.

It could be useful for client side redundancy, if A node is failing use B node. Removes the need for the developer to have to create this functionality themselves through the presence of a function.

Feel free to discuss it with the other devs to see what their thoughts are on this and other potential use cases.

bgaughran commented 1 year ago

Great points @nwbrettski. I did some digging and I actually found this being used in the Java SDK in a class called BlockchainClient here.

Here is the public method provided in the SDK

    /**
     * Get status of your accessing nodes on the blockchain.
     * @return array of {@link PeerStat}
     * @throws ClientIOException network error.
     */
   public static PeerStatList getPeerStatusList()throws ClientIOException

So, I guess what I am saying is - it effectively proxies the call to https://testnet.vechain.org/doc/swagger-ui/#/Node/get_node_network_peers and returns the results of that Thorify call.

We could do something identical.

And in addition, provide another simpler method along the lines of isNodeHealthy(URL).

I'll tease it out next week.

bgaughran commented 1 year ago

@nwbrettski OK, based on feedback today, I plan to implement the following method signature isNodeHealthy (URL: string): boolean

Kostas has provided the algorithm used in the node hosting project which I will use here as follows

  1. We get the last block from the /blocks/best endpoint
  2. We check how old the block is by comparing the response's timestamp property with the current time
  3. If it's older than a specified tolerance in seconds, it reports that the node is unhealthy

This is a better implementation than your current suggestion to use /node/network/peers since there are some scenarios that won't be caught with this approach. For example, consider a scenario where the node is connected to 20+ peers, which is healthy, and it receives the new blocks as expected. But what if the node's disk is full and it's not writing the new blocks to it's database? In this case the node is off-sync even though it's technically alive and connected. I guess it depends how you define node health, but generally speaking, I think my algorithm is what most people are interested in. Note that this solution is susceptible to clock skew issues and our algorithm doesn't account for that. For our use, a few milliseconds up to 1 second of clock skew does not make a difference, when our tolerance is 30 seconds. This tolerance means that we consider a node healthy even when it's off-sync by roughly 3 blocks. That's why even though sometimes our nodes go off-sync by a block or two, we have no alerts. There have been 2 cases in about 8 months where one of the two mainnet nodes went off-sync by several blocks and was taken off the load balancer for a few minutes. In both cases, the nodes self healed and rejoined the load balancer target group. I presume this is due to a network issue, which is why we operate two nodes in separate availability zones, but we didn't do a full postmortem on this, since there was no impact

A healthy node should mean:

  1. node is in sync
  2. node is available (responsive to requests)
claytonneal commented 1 year ago

hiya, we have to think "how do we instantiate the SDK" as the URL will probably be already known perhaps @rodolfopietro97 can advise there!

bgaughran commented 1 year ago

@nwbrettski PR merged and ticket closed