lavanet / lava

Apache License 2.0

[Feature]: Near Archive Node Performance Degradation and Suggested Rerouting Strategy #1646

Closed msobh13 closed 2 weeks ago

msobh13 commented 3 weeks ago

Summary

Description:

We are experiencing significant performance issues with the Near archive node, which frequently becomes unresponsive. This results in the application being flooded with the following warning messages:

WARN jsonrpc: Timeout: tx_status_fetch method. tx_info TransactionId

This also causes lavap to log the following errors:

Aug 23 18:48:59 ERR TryRelay Failed error="Sending chainMsg failed ErrMsg: Post \"http://1.1.1.1:3030\": context deadline exceeded {GUID:1411907925123310770,specID:NEAR}: Post \"http://1.1.1.1:3030\": context deadline exceeded" GUID=1411907925123310770 request.SessionId=4156125492601189253 request.userAddr=lava@1qsenq9j7t7jzqj7e50mkn6kzamecen0t39apdz timed_out=true
Aug 23 18:48:59 ERR Sending chainMsg failed error="Post \"http://1.1.1.1:3030\": context deadline exceeded" GUID=18127107815705213594 specID=NEAR
Aug 23 18:48:59 ERR TryRelay Failed error="Sending chainMsg failed ErrMsg: Post \"http://1.1.1.1:3030\": context deadline exceeded {GUID:18127107815705213594,specID:NEAR}: Post \"http://1.1.1.1:3030\": context deadline exceeded" GUID=18127107815705213594 request.SessionId=286621748090294835 request.userAddr=lava@1qsenq9j7t7jzqj7e50mkn6kzamecen0t39apdz timed_out=true
Aug 23 18:49:03 ERR Sending chainMsg failed error="Post \"http://1.1.1.1:3030\": context deadline exceeded" GUID=12729192286826640592 specID=NEAR

Suggestion:

To mitigate this issue, I propose implementing a feature that allows for rerouting requests, particularly those related to tx_status_fetch, to an alternative Near node that is not an archive node. This could help distribute the load more evenly and reduce the likelihood of timeouts.
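
To make the suggestion concrete, here is a minimal sketch of what method-based rerouting could look like. It is only an illustration, not how lavap is implemented: the Router type, its fields, and the Pick function are all hypothetical names, and the set of methods to reroute is assumed to come from operator configuration. Batch (array) requests are not handled and are returned as an error.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Router is a hypothetical type holding two upstream endpoints and the
// set of JSON-RPC methods that should bypass the archive node.
type Router struct {
	ArchiveURL    string          // the overloaded archive node
	NonArchiveURL string          // an alternative, non-archive node
	Reroute       map[string]bool // methods to send to NonArchiveURL (operator-defined)
}

// rpcRequest captures only the field needed for the routing decision.
type rpcRequest struct {
	Method string `json:"method"`
}

// Pick returns the upstream URL for a single raw JSON-RPC request body.
// Methods listed in Reroute go to the non-archive node; everything else
// stays on the archive node.
func (r Router) Pick(body []byte) (string, error) {
	var req rpcRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return "", fmt.Errorf("parse JSON-RPC request: %w", err)
	}
	if req.Method == "" {
		return "", fmt.Errorf("JSON-RPC request has no method")
	}
	if r.Reroute[req.Method] {
		return r.NonArchiveURL, nil
	}
	return r.ArchiveURL, nil
}
```

For example, with Reroute set to map[string]bool{"tx": true}, a request whose method is "tx" would be sent to NonArchiveURL and a "block" request would stay on ArchiveURL. Using "tx" here is my assumption about which public RPC call sits behind the tx_status_fetch warning. One caveat: a non-archive node may not hold older transactions, so rerouted lookups for old transaction hashes could still fail and might need a fallback to the archive node.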
