We introduce a solution to reduce the log error "Peer xxx not sync for a long time" when fullnode syncs with peer.
Abstract
When one fullnode synchronizes with other peer, there is a blockLock in TronNetDelegate. There are three threads to apply for this lock. sometimes, when one peer canot get this blockLock in ChainInventoryMsgHandler,
this peer may be disconnected by thread PeerStatusCheck.check because it finds that this peer's blockBothHaveUpdateTime is blockUpdateTimeout seconds ago than system time. We Introduce a solution to decrease the time cost of acquire blockLock in fullnode that synchronizes with other peer.
Motivation
Sometimes we see the error in tron.log:
00:03:22.898 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /206.221.176.113 not sync for a long time.
00:12:14.924 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /104.194.8.221 not sync for a long time.
00:12:42.926 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /5.9.101.190 not sync for a long time.
00:16:40.980 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /109.238.14.182 not sync for a long time.
00:16:40.981 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /5.9.101.190 not sync for a long time.
00:55:55.277 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /47.104.153.187 not sync for a long time.
00:56:01.277 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /3.142.169.23 not sync for a long time.
01:07:29.314 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /3.142.169.23 not sync for a long time.
01:09:49.333 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /13.229.124.42 not sync for a long time.
01:17:39.358 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /103.85.253.22 not sync for a long time.
01:17:39.358 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /183.131.184.130 not sync for a long time.
02:05:57.891 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /47.252.13.200 not sync for a long time.
02:06:09.892 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /18.224.252.203 not sync for a long time.
02:06:09.892 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /51.81.109.193 not sync for a long time.
02:06:53.895 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /47.252.6.86 not sync for a long time.
02:06:53.895 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /10.60.100.237 not sync for a long time.
02:06:53.895 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /10.60.100.33 not sync for a long time.
02:06:53.895 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /209.126.87.106 not sync for a long time.
02:07:27.897 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /18.232.41.41 not sync for a long time.
02:11:29.941 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /206.221.176.113 not sync for a long time.
02:11:29.941 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /13.214.179.230 not sync for a long time.
03:22:54.472 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /51.81.184.18 not sync for a long time.
03:22:58.472 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /13.38.128.252 not sync for a long time.
03:23:00.473 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /51.81.67.97 not sync for a long time.
13:26:09.850 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /47.90.250.85 not sync for a long time.
13:26:13.851 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /10.60.100.83 not sync for a long time.
13:26:15.851 WARN [pool-53-thread-1] [net](PeerStatusCheck.java:49) Peer /10.60.100.33 not sync for a long time.
these peers are disconnected because unexpect reason because "not sync for a long time" by thread PeerStatusCheck.check.
public void statusCheck() {
long now = System.currentTimeMillis();
tronNetDelegate.getActivePeer().forEach(peer -> {
boolean isDisconnected = false;
if (peer.isNeedSyncFromPeer()
&& peer.getBlockBothHaveUpdateTime() < now - blockUpdateTimeout) {
logger.warn("Peer {} not sync for a long time.", peer.getInetAddress());
isDisconnected = true;
}
...
}
}
So why this peer has not been sync for a long time? It's the same question as why PeerConnection.blockBothHaveUpdateTime has not been updated for long time.
blockBothHaveUpdateTime is update in ChainInventoryMsgHandler.processMessage
This can reduce the total time of lock blockLock in SyncService.handleSyncBlock, so other thread such as ChainInventoryMsgHandler.processMessage can get the lock more esaily.
Simple Summary
We introduce a solution to reduce the log error "Peer xxx not sync for a long time" when fullnode syncs with peer.
Abstract
When one fullnode synchronizes with other peer, there is a
blockLock
in TronNetDelegate. There are three threads to apply for this lock. sometimes, when one peer canot get thisblockLock
in ChainInventoryMsgHandler, this peer may be disconnected by thread PeerStatusCheck.check because it finds that this peer's blockBothHaveUpdateTime is blockUpdateTimeout seconds ago than system time. We Introduce a solution to decrease the time cost of acquireblockLock
in fullnode that synchronizes with other peer.Motivation
Sometimes we see the error in tron.log:
these peers are disconnected because unexpect reason because "not sync for a long time" by thread PeerStatusCheck.check.
So why this peer has not been sync for a long time? It's the same question as why PeerConnection.blockBothHaveUpdateTime has not been updated for long time. blockBothHaveUpdateTime is update in ChainInventoryMsgHandler.processMessage
we can see if this peer cannot get the blockLock, blockBothHaveUpdateTime cannot be update.
blockLock
is also used in SyncService.handleSyncBlockSuppose we get many blocks from one peer, as processSyncBlock need to write disk, this
for
cycle may cost a long time.Implementation
Our solution is to swap the order of getBlockLock and the
for
cycle:This can reduce the total time of lock
blockLock
in SyncService.handleSyncBlock, so other thread such as ChainInventoryMsgHandler.processMessage can get the lock more esaily.