Open nikurt opened 2 years ago
Synced a betanet node. Stopped it.
In 12 hours continued syncing. After downloading headers, the client started syncing state.
The first Ctrl+C was handled, the shutdown was initiated but nothing indicated that the client was actually going to shutdown.
After I minute pressed Ctrl+C multiple times - nothing happened.
Killed from another terminal by doing killall -9 neard
.
$ cargo fmt && cargo build -p neard --release --features nightly_protocol,nightly_protocol_features && RUST_LOG=info ./target/release/neard --home /home/orfest/.near/betanet run
...
...
...
2022-05-12T10:09:43.332151Z INFO stats: #59558457 Downloading headers 99.49% (3277 left; at 59620402) 13 peers ⬇ 511.3kiB/s ⬆ 0.6kiB/s 0.00 bps 0 gas/s CPU: 41%, Mem: 586.3 MiB
2022-05-12T10:09:43.832230Z INFO client: Received 512 block headers from ed25519:6DSjZ8mvsRZDvFqFxo8tCKePG96omXW7eVYVSySmDk8e
2022-05-12T10:09:45.069056Z INFO client: Received 512 block headers from ed25519:9saR916jwNjxPa7RyhesTvmkiYY2iqW6x9Z1eZ3Yc3BB
2022-05-12T10:09:46.748685Z INFO client: Received 512 block headers from ed25519:AtxyywuiieiDCCH7c893rQSx3hrK8X6qr2kkkHTZWkT1
2022-05-12T10:09:47.677098Z INFO client: Received 512 block headers from ed25519:ydgzeXHJ5Xyt7M1gXLxqLBW1Ejx6scNV5Nx2pxFM8su
2022-05-12T10:09:48.546437Z INFO client: Received 512 block headers from ed25519:2fQJ46A2okUhnYG1Zkcgi1kDSPP8fMEmv3c3earyaLL8
2022-05-12T10:09:49.240543Z INFO client: Received 213 block headers from ed25519:BpfDYpNvNTfUDVXp5q2xmvsjsJQwgkP2T6p12aPXKWsv
2022-05-12T10:09:50.729861Z INFO network: Error connecting to addr=3.95.251.187:24567 err=Elapsed(())
^C2022-05-12T10:09:55.226876Z WARN neard: SIGINT, stopping... this may take a few minutes.
2022-05-12T10:09:55.228750Z INFO neard: Waiting for RocksDB to gracefully shutdown
2022-05-12T10:09:55.228766Z INFO db: Waiting for the 1 remaining RocksDB instances to gracefully shutdown
^C^C^C^C^C^C^C^Czsh: killed RUST_LOG=info ./target/release/neard --home /home/user/.near/betanet run
It looks like there’s some kind of leak in Store objects? After getting SIGINT the node stops doing anything it looks like so it seems like actix::System::current().stop();
successfully stopped everything but we’re stuck on RocksDB::block_until_all_instances_are_dropped();
which means that one of the RocksDB instances didn’t get dropped.
Reproduced this issue with lots of extra debug output.
The issue is this loop https://github.com/near/nearcore/blob/master/chain/chain/src/chain.rs#L1225;L1252
During this loop ClientActor
doesn't respond to actix::System::current().stop()
. Iterations in that loop take between 0.01 and 20 milli-seconds.
Resuming a stopped instance may need updating a lot of blocks, which took multiple minutes in my case.
After more testing, state sync itself doesn't seem to be a problem and SIGINTs are properly handled.
neard
handles SIGINT during most of the time, but when the client is downloading state it doesn't stop upon receiving Ctrl+C. Repeated Ctrl+C have no effect. Maybe it's not ignored and will be handled when the state sync finishes, haven't waited that long. Though I waited for more than 1 minute, and nothing happened.Forcing a stop with SIGKILL got me database corruptions.
neard
should respond to signals also during state sync.