Open jzethar opened 10 months ago
How did you deal with this problem?
Slow node sync
Here I'm using an archive node without validating, only syncing blocks.
Server characteristics
Server has:
- 5 TB disk with zfs pool
- 256GB of RAM
- AMD EPYC 7763 64-Core Processor
- Debian 5.10.149-2
Update flow
After updating from v2023.10 to v2024.01, the node started to lag. The validator-engine is self-compiled from the TON repo, following the repo's build recommendations. The update steps were as follows:
- Compile new version of node
- Stop validator
- Update validator
- Validator crashed (discovered 5 hours later)
- Recompiled validator
- Validator is normally syncing
The node has been running for one to two days without crashing, but it is lagging and can't catch up to the latest blocks. It is syncing (as seen from the logs), but abnormally slowly, and the gap between the latest block and the last synced block keeps growing (8 hours ago it was 15k blocks, now it's 16k blocks).
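To put a number on the growing gap, here is a minimal sketch of the arithmetic, using only the two lag readings quoted above (15k blocks, then 16k blocks 8 hours later). The function name `lag_growth_per_hour` is hypothetical and only for illustration.

```python
def lag_growth_per_hour(lag_before: int, lag_after: int, hours_elapsed: float) -> float:
    """Return how many blocks per hour the node falls further behind.

    A positive result means the node syncs more slowly than the chain
    produces blocks; a negative result means it is catching up.
    """
    return (lag_after - lag_before) / hours_elapsed


# Readings from the report: 15k blocks behind, then 16k blocks behind 8 hours later.
rate = lag_growth_per_hour(15_000, 16_000, 8)
print(f"lag grows by {rate} blocks per hour")
```

With these figures the node would need to sync at least that many more blocks per hour just to stop falling behind, and more than that to catch up.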
Some logs
[ 3][t 7][2024-02-02 14:00:34.416560689][liteserver.cpp:79][!litequery] aborted liteserver query: [Error : -503 : timeout]
[ 3][t 5][2024-02-02 14:00:34.416565629][liteserver.cpp:79][!litequery] aborted liteserver query: [Error : -503 : timeout]
[ 3][t 4][2024-02-02 14:00:34.416572149][liteserver.cpp:79][!litequery] aborted liteserver query: [Error : -503 : timeout]
[ 3][t 7][2024-02-02 14:00:34.416573309][liteserver.cpp:79][!litequery] aborted liteserver query: [Error : -503 : timeout]
[ 3][t 6][2024-02-02 14:00:34.416573199][liteserver.cpp:79][!litequery] aborted liteserver query: [Error : -503 : timeout]
[ 3][t 5][2024-02-02 14:00:34.416581159][liteserver.cpp:79][!litequery] aborted liteserver query: [Error : -503 : timeout]
[ 3][t 7][2024-02-02 14:00:34.416588839][liteserver.cpp:79][!litequery] aborted liteserver query: [Error : -503 : timeout]
[ 3][t 6][2024-02-02 14:00:34.416601799][liteserver.cpp:79][!litequery] aborted liteserver query: [Error : -503 : timeout]
[ 3][t 5][2024-02-02 14:00:34.416602869][liteserver.cpp:79][!litequery] aborted liteserver query: [Error : -503 : timeout]
[ 3][t 7][2024-02-02 14:00:34.416603579][liteserver.cpp:79][!litequery] aborted liteserver query: [Error : -503 : timeout]

[ 3][t 7][2024-02-02 14:04:00.262468356][liteserver.cpp:234][!litequery] started a getMasterchainInfo(-1) liteserver query
[ 3][t 7][2024-02-02 14:04:00.262475156][liteserver.cpp:79][!litequery] aborted liteserver query: [Error : -503 : timeout]
[ 3][t 5][2024-02-02 14:04:00.262483716][liteserver.cpp:234][!litequery] started a getMasterchainInfo(-1) liteserver query
[ 3][t 5][2024-02-02 14:04:00.262489546][liteserver.cpp:79][!litequery] aborted liteserver query: [Error : -503 : timeout]
[ 3][t 5][2024-02-02 14:04:00.262500696][liteserver.cpp:234][!litequery] started a getMasterchainInfo(-1) liteserver query
[ 3][t 5][2024-02-02 14:04:00.262506436][liteserver.cpp:741][!litequery] started a getAccountState((-1,8000000000000000,35828298):EFC498336DA763DB2F9D5E27F59B99DA505297B826A207E8387A02C6F2FA3C52:69E123CD31F5AFB89A32EEAA6D1F7589734E1042ECB8D6A5CC7F795972E5FD9E, 0, 27D28A4C04F71995216C0E7CCA34DA08DCCB20836C3CE5119245B339DF102FDD, -2147483648) liteserver query
[ 3][t 5][2024-02-02 14:04:00.262529016][liteserver.cpp:79][!litequery] aborted liteserver query: [Error : -503 : timeout]
[ 2][t 2][2024-02-02 14:04:00.262663247][adnl-ext-server.cpp:34][!manager] failed ext query: [Error : 651 : node not synced]
[ 2][t 2][2024-02-02 14:04:00.262686567][adnl-ext-server.cpp:34][!manager] failed ext query: [Error : 651 : node not synced]
[ 2][t 2][2024-02-02 14:04:00.262696887][adnl-ext-server.cpp:34][!manager] failed ext query: [Error : 651 : node not synced]
[ 2][t 2][2024-02-02 14:04:00.262705788][adnl-ext-server.cpp:34][!manager] failed ext query: [Error : 651 : node not synced]
[ 3][t 3][2024-02-02 14:04:02.325045744][download-archive-slice.cpp:148][!archive] downloading archive slice #35828238 from yfnIJiL2oWKjJHHg7DzGs6IjxLnqWOxuWbUTcYSwrUw=
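For anyone triaging a larger log file, a rough sketch like the following could tally how often each error appears. It assumes only the `[Error : <code> : <message>]` pattern visible in the log lines above; the helper name `count_errors` and the embedded sample are illustrative, not part of any TON tooling.

```python
import re
from collections import Counter

# Matches the "[Error : <code> : <message>]" fragments seen in the log excerpt.
ERROR_RE = re.compile(r"\[Error : (-?\d+) : ([^\]]+)\]")


def count_errors(log_text: str) -> Counter:
    """Count occurrences of each (error code, message) pair in the log text."""
    return Counter(
        (int(code), message.strip()) for code, message in ERROR_RE.findall(log_text)
    )


# Three entries copied from the excerpt above, used as sample input.
sample = (
    "[ 3][t 7][2024-02-02 14:04:00.262475156][liteserver.cpp:79][!litequery] "
    "aborted liteserver query: [Error : -503 : timeout]\n"
    "[ 2][t 2][2024-02-02 14:04:00.262663247][adnl-ext-server.cpp:34][!manager] "
    "failed ext query: [Error : 651 : node not synced]\n"
    "[ 2][t 2][2024-02-02 14:04:00.262686567][adnl-ext-server.cpp:34][!manager] "
    "failed ext query: [Error : 651 : node not synced]\n"
)

counts = count_errors(sample)
for (code, message), n in counts.most_common():
    print(f"{code:>5}  {message:<16} x{n}")
```

Because the pattern only looks for the error fragment, it works whether the entries are one per line or run together on a single line.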
Are there any recommendations for solving this? @EmelyanenkoK @akifoq @XaBbl4 @aleksej-paschenko
I'm more curious about what kind of drive you are using now. A SATA SSD? An M.2 SSD? Or a regular hard drive?
Running a TON archive node places high demands on I/O speed, so an M.2 SSD is the best choice.