SjonHortensius opened this issue 9 months ago
If you have debug logs from this machine during the OOM (check $datadir/beacon/logs), please DM them to me on Discord (@sproul) or email them to me ($surname@sigmaprime.io).
It may be that the message dequeueing isn't happening fast enough, so https://github.com/sigp/lighthouse/pull/5175 will help.
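As a generic illustration (not a description of what that PR changes): a FIFO queue whose consumer drains fewer messages per tick than the producer adds will grow without bound, and every queued message keeps its memory alive. The rates below are made-up numbers, and simulate_backlog is a hypothetical helper chosen only to show the shape of the growth:

```python
from collections import deque


def simulate_backlog(arrivals_per_tick: int, processed_per_tick: int, ticks: int) -> list[int]:
    """Return the queue length after each tick for an unbounded FIFO queue."""
    queue = deque()
    lengths = []
    next_id = 0
    for _ in range(ticks):
        # Producer: enqueue this tick's incoming messages.
        for _ in range(arrivals_per_tick):
            queue.append(next_id)
            next_id += 1
        # Consumer: dequeue at most processed_per_tick messages.
        for _ in range(min(processed_per_tick, len(queue))):
            queue.popleft()
        lengths.append(len(queue))
    return lengths


print(simulate_backlog(arrivals_per_tick=12, processed_per_tick=10, ticks=5))
# → [2, 4, 6, 8, 10]
```

With 12 arrivals and 10 dequeues per tick the backlog grows by 2 messages every tick, so memory use climbs linearly until something gives. Draining faster, or bounding the queue and dropping excess work, are the usual ways to cap it.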
@SjonHortensius I've just noticed that the RSS for all of these crashes is in the 7GB range. You can ignore the higher total-vm number; that's not relevant.
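For anyone else reading OOM-killer output: total-vm is the virtual address space the process had mapped, which can be far larger than physical memory, while anon-rss + file-rss + shmem-rss is what was actually resident when the process was killed (the kernel's "kB" figures are KiB). A minimal sketch of pulling both out of a kernel log line, assuming the "Killed process ... total-vm:...kB, anon-rss:...kB" format printed by recent Linux kernels. The helper name and the sample line with its numbers are made up for illustration:

```python
import re

# Matches fields such as "total-vm:52428800kB" or "anon-rss:7340032kB".
FIELD_RE = re.compile(r"([a-z-]+):(\d+)kB")
KIB_PER_GIB = 1024 * 1024


def oom_sizes_gib(line: str) -> dict[str, float]:
    """Return total-vm and resident set size (in GiB) from an OOM-killer log line."""
    fields = {name: int(kib) for name, kib in FIELD_RE.findall(line)}
    # Resident memory is the sum of the anonymous, file-backed and shared RSS fields.
    rss_kib = sum(fields.get(k, 0) for k in ("anon-rss", "file-rss", "shmem-rss"))
    return {
        "total_vm_gib": fields["total-vm"] / KIB_PER_GIB,
        "rss_gib": rss_kib / KIB_PER_GIB,
    }


# Hypothetical sample line; the numbers are illustrative only.
sample = ("Out of memory: Killed process 4242 (lighthouse) total-vm:52428800kB, "
          "anon-rss:7340032kB, file-rss:0kB, shmem-rss:0kB, UID:1000 "
          "pgtables:20480kB oom_score_adj:0")
print(oom_sizes_gib(sample))
# → {'total_vm_gib': 50.0, 'rss_gib': 7.0}
```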
I think this is probably still a bug on the Lighthouse side; we're looking into it. Logs would be great.
@michaelsproul you're right about the memory usage; I misinterpreted those numbers.
I have relevant logs, but I am unwilling to publish them unscrubbed. I'll send some parts by email.
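A rough sketch of scrubbing logs before sharing them, assuming the sensitive parts are IPv4 addresses (with an optional port) and libp2p peer IDs starting with "16Uiu2" (the usual prefix for secp256k1-based IDs). The patterns and the scrub_line name are assumptions, not anything Lighthouse provides, and the output should still be reviewed by hand: IPv6 addresses, ENRs and validator keys, for example, are not covered.

```python
import re

# IPv4 addresses, optionally followed by ":port".
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?::\d+)?\b")
# Assumption: peer IDs are base58 strings beginning with "16Uiu2".
PEER_ID_RE = re.compile(r"\b16Uiu2[1-9A-HJ-NP-Za-km-z]+\b")


def scrub_line(line: str) -> str:
    """Replace IPv4 addresses and peer IDs in one log line with placeholders."""
    line = IPV4_RE.sub("<ip>", line)
    return PEER_ID_RE.sub("<peer_id>", line)


print(scrub_line("Dialing peer 16Uiu2HAmABCdef at 203.0.113.7:9000"))
# → Dialing peer <peer_id> at <ip>
```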
Execution layer: Erigon
Network: Mainnet
Lighthouse params:
"--debug-level=info",
"--datadir=/beacondata",
"--network=mainnet",
"beacon_node",
"--disable-enr-auto-update",
"--enr-address=127.0.0.1",
"--enr-tcp-port=9000",
"--enr-udp-port=9000",
"--port=9000",
"--discovery-port=9000",
"--eth1",
"--http",
"--http-address=0.0.0.0",
"--http-port=5052",
"--metrics",
"--metrics-address=0.0.0.0",
"--metrics-port=5054",
"--listen-address=0.0.0.0",
"--target-peers=100",
"--http-allow-sync-stalled",
"--disable-packet-filter",
"--execution-endpoint=http://localhost:9545",
"--jwt-secrets=/tmp/jwtsecret",
"--disable-deposit-contract-sync",
"--checkpoint-sync-url=https://beaconstate-mainnet.chainsafe.io"
Adding info from my own case here. Apart from memory spikes, I also see CPU spikes (maybe they are related):
CPU/memory spikes increased a lot after we upgraded to v4.6.0.
We upgraded to v5.0.0 yesterday, but the spikes are still there (I marked the time of the upgrade in the image).
I think the memory/CPU issue is not fixed yet. cc @michaelsproul @AgeManning
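To line spikes up with events in the logs, resident memory can be sampled at fixed intervals. A minimal Linux-only sketch that reads VmRSS from /proc/<pid>/status; the function names are hypothetical, and the PID would come from something like pidof lighthouse:

```python
import time
from pathlib import Path


def parse_vm_rss_kib(status_text: str) -> int:
    """Extract VmRSS (in KiB) from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line format: "VmRSS:\t 7340032 kB"
            return int(line.split()[1])
    raise ValueError("VmRSS not found (kernel thread or non-Linux system?)")


def sample_rss(pid: int, interval_s: float, samples: int) -> list[tuple[float, int]]:
    """Record (unix_time, VmRSS in KiB) pairs for a running process on Linux."""
    status_path = Path(f"/proc/{pid}/status")
    readings = []
    for _ in range(samples):
        readings.append((time.time(), parse_vm_rss_kib(status_path.read_text())))
        time.sleep(interval_s)
    return readings
```

For example, sample_rss(pid, interval_s=5.0, samples=720) records one hour at 5-second intervals; plotting it next to the CPU graph should show whether the two kinds of spike coincide.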
@luarx could you share some of your debug logs? Feel free to ping me on Discord.
@pawanjay176 what's your Discord username?
I'm pawan#7432. You should be able to find me under the SigmaPrime role on our Discord.
Description
I realize 4.6.0 contains #4918, a fix for a previous OOM issue (which I never experienced), but ever since I upgraded I've been getting OOMs with some pretty big numbers (between 20 and 50 GiB used), making my setup highly unstable.
Version
latest stable
Lighthouse v4.6.0-1be5253
Present Behaviour
I don't think my beacon node setup includes anything special, but FWIW:
/usr/bin/lighthouse -d /var/lib/lighthouse/beacon beacon_node --validator-monitor-auto --checkpoint-sync-url http://XXX:5052 --staking --port 9000 --http-port 5052 --http-address 0.0.0.0 --execution-endpoint http://127.0.0.1:8551 --execution-jwt /var/lib/lighthouse/beacon/jwtsecret --builder http://localhost:18550 --builder-profit-threshold XXX
Frequent OOMs, roughly 5-10 per day with varying amounts allocated:
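To get a per-day count like this from the kernel log, a minimal sketch assuming each line starts with an ISO date, as produced by journalctl -k -o short-iso. The helper name and the sample lines are made up for illustration:

```python
from collections import Counter


def oom_kills_per_day(lines: list[str], process: str = "lighthouse") -> Counter[str]:
    """Count OOM-killer events for `process`, keyed by each line's YYYY-MM-DD prefix."""
    counts: Counter[str] = Counter()
    for line in lines:
        if "Killed process" in line and f"({process})" in line:
            counts[line[:10]] += 1  # first 10 characters are the ISO date
    return counts


# Hypothetical kernel log lines; only the date, "Killed process" and "(name)" matter here.
log = [
    "2024-02-01T03:14:15+0000 node kernel: Out of memory: Killed process 4242 (lighthouse)",
    "2024-02-01T11:02:09+0000 node kernel: Out of memory: Killed process 5151 (lighthouse)",
    "2024-02-02T06:40:00+0000 node kernel: Out of memory: Killed process 6060 (erigon)",
]
print(dict(oom_kills_per_day(log)))
# → {'2024-02-01': 2}
```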