nervosnetwork / ckb

The Nervos CKB is a public permissionless blockchain, and the layer 1 of Nervos network.
https://www.nervos.org
MIT License

CKB Process Occasionally Fails to Terminate #4607

Closed by sunchengzhu 2 weeks ago

sunchengzhu commented 3 weeks ago

Bug Report

Current Behavior

I restart CKB every two hours and have noticed that metrics data is occasionally missing.

Terminate command:

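killckb() {
    # Find every process with a socket on the CKB RPC port (8114).
    PIDS=$(sudo lsof -ti:8114)
    for i in $PIDS; do
        # kill sends SIGTERM by default and returns without waiting for the process to exit.
        echo "sending SIGTERM to ckb process $i"
        sudo kill "$i"
    done
}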

killckb

Expected Behavior

Metrics data is not lost.

Environment

Additional context/Screenshots

My ckb.toml configuration:

[metrics.exporter.prometheus]
target = { type = "prometheus", listen_address = "0.0.0.0:8100" }

# # Experimental: Monitor memory changes.
[memory_tracker]
# # Seconds between checking the process, 0 is disable, default is 0.
interval = 5

15168316096 commented 3 weeks ago

This is an intermittent issue.

Other nodes running ckb v118 did not show missing metrics during the same period. If no data had been collected at the source, that would suggest the ckb process had been terminated. However, the ckb logs we reviewed earlier show that the service restarted successfully and continued to sync normally.

Therefore, it is more likely that the issue occurred when Prometheus was scraping data from ckb.

sunchengzhu commented 2 weeks ago

The run.log shows that the CKB process did not exit promptly:

  1. Received the Ctrl-C signal at 15:20.
  2. At 15:25, an attempt to start CKB resulted in an ERROR.
  3. The CKB shutdown log was not seen until 15:26.
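
The timeline above suggests the restart script started a new CKB while the old process was still shutting down. As a script-side workaround (not a fix to CKB itself), the restart script could wait for the old process to exit before starting a new one. This is a minimal sketch: `stop_and_wait` and its timeout argument are hypothetical names, and it assumes the script is allowed to signal the process (same user or root).

```shell
# Sketch only: stop_and_wait is a hypothetical helper, not part of CKB.
# Sends SIGTERM to a PID, then polls until the process is gone or a timeout expires.
stop_and_wait() {
    local pid=$1
    local timeout=${2:-120}   # seconds to wait before giving up (assumed default)
    local waited=0

    # Ask for a graceful shutdown; SIGTERM is kill's default signal.
    # If the signal cannot be delivered, treat the process as already gone.
    kill "$pid" 2>/dev/null || return 0

    # kill -0 delivers no signal; it only checks that the PID can still be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "process $pid still running after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "process $pid exited after ${waited}s"
}
```

In the restart script, this could be called for each PID returned by `lsof -ti:8114`, starting the new CKB only if every call returns 0. The timeout should exceed the longest shutdown observed, which was about six minutes in the log above.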

sunchengzhu commented 2 weeks ago

This issue has been verified and resolved in PR 4615.