Closed bladedoyle closed 6 years ago
Interesting! looks still has bug here. Could you give the full log file?
Look like I just reproduce it:
Sep 23 11:31:45.537 DEBG sync_state: sync_status: BodySync { current_height: 105911, highest_height: 107339 } -> HeaderSync { current_height: 107338, highest_height: 107339 }
...
Sep 23 11:32:14.867 DEBG sync_state: sync_status: BodySync { current_height: 107293, highest_height: 107339 } -> NoSync
...
(almost 19 minutes later)
...
Sep 23 13:51:03.285 INFO sync: total_difficulty 46767414, peer_difficulty 46769073, threshold 1634 (last 5 blocks), enabling sync
Sep 23 13:51:03.285 DEBG body_sync: body_head - 0b9c81e5, 107494, header_head - 0b9c81e5, 107494, sync_head - 042bc8c6, 107339
Sep 23 13:51:03.285 DEBG sync_state: sync_status: NoSync -> BodySync { current_height: 107494, highest_height: 107499 }
Sep 23 13:51:05.287 DEBG sync_state: sync_status: BodySync { current_height: 107494, highest_height: 107499 } -> HeaderSync { current_height: 107494, highest_height: 107499 }
Sep 23 13:51:05.290 DEBG sync: locator heights : [107339, 107337, 107333, 107325, 107309, 107277, 107213, 107085, 106829, 106317, 105293, 103245, 99149, 90957, 74573, 41805, 0]
<<<< problem here! still using sync_head at 107339 instead of header_head 107494
Then, the remaining thing is almost same as https://github.com/mimblewimble/grin/issues/1524, all received headers are "All known, ignoring".
The root cause is:
We will reset sync_head to header_head on initial transition to HeaderSync, but somehow when we transit from NoSync
, the next first state is BodySync
instead of HeaderSync
!
Sep 23 13:51:03.285 DEBG sync_state: sync_status: NoSync -> BodySync { current_height: 107494, highest_height: 107499 }
Looking into function header_sync_due()
, there was a condition:
let all_headers_received =
header_head.height >= prev_height + (p2p::MAX_BLOCK_HEADERS as u64) - 4;
let stalling = header_head.height == latest_height && now > timeout;
if all_headers_received || stalling {
...
true
}
Which make this header_sync_due()
definitely return false IF all_headers_received
is false, when enabling sync. That's why when we transit from NoSync
the next first state is BodySync
instead of HeaderSync
!
A fix solution is: always enable header sync on initial state transition from NoSync / Initial. will give a PR on this fixing.
Closed by #1582
My archive node often gets "stuck". Restarting the grin server process fixes it. No ERRO messages in the log. But there is this which shows the problem (see current_height vs highest height):
`Sep 23 02:40:01.178 DEBG body_sync: body_head - 0e286867, 106953, header_head - 0ae10456, 106950, sync_head - 0e286867, 106953
Sep 23 02:40:01.179 DEBG sync_state: sync_status: BodySync { current_height: 106953, highest_height: 107266 } -> BodySync { current_height: 106953, highest_height: 107267 }
Sep 23 02:40:06.186 DEBG body_sync: body_head - 0e286867, 106953, header_head - 0ae10456, 106950, sync_head - 0e286867, 106953`
and
`Sep 23 02:39:58.565 DEBG pipe: process_block_header at 107267 [0787e127]
Sep 23 02:39:58.565 DEBG Block header 0787e127 refused by chain: Orphan`