Closed. shenyaqi9527 closed this issue 4 years ago.
Please cut and paste the actual log output rather than including a screenshot. Screenshots are unreadable for people with high DPI monitors.
[cardano-sl.*production*:Info:ThreadId 382] [2020-01-17 07:31:07.48 UTC] Trying to apply blocks w/o rollback. First 3: [MainBlockHeader:
hash: 19b1f1eec6f9abb145114bfeda8cad76f2ff9fda4ff3e4ccdaf369f4052f9c3b
previous block: 1b3c31eb41b0d38af18d9cf908a7c1848911d081271fb29b88d5b010932b2eba
slot: 8364th slot of 152nd epoch
difficulty: 3290025
leader: pub:993a8f05
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:993a8f05, dPk = pub:89c29f8c } }
block: v0.2.0
software: cardano-sl:1
, MainBlockHeader:
hash: a7b1b58758880db395796ce8b8cba290de717097a3a1de5b50e0a9923a2941f0
previous block: 19b1f1eec6f9abb145114bfeda8cad76f2ff9fda4ff3e4ccdaf369f4052f9c3b
slot: 8365th slot of 152nd epoch
difficulty: 3290026
leader: pub:0bdb1f5e
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:0bdb1f5e, dPk = pub:5fddeeda } }
block: v0.2.0
software: cardano-sl:1
, MainBlockHeader:
hash: f25b190e0f961f05b111952b72a8cba6b30cffa4caac4c60eda64697471dc606
previous block: a7b1b58758880db395796ce8b8cba290de717097a3a1de5b50e0a9923a2941f0
slot: 8366th slot of 152nd epoch
difficulty: 3290027
leader: pub:1bc97a2f
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:1bc97a2f, dPk = pub:61261a95 } }
block: v0.2.0
software: cardano-sl:1
]
Last 3: [MainBlockHeader:
hash: 197d5cfea25e990f6893e1250ea248ac25c0465db43abef95677b32c0d3ebbff
previous block: 14b798e52f1b215d77b8be0ff1315ca25885f37dc27cacb7ac6db897d877f8a1
slot: 8425th slot of 152nd epoch
difficulty: 3290086
leader: pub:9a6fa343
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:9a6fa343, dPk = pub:8b532076 } }
block: v0.2.0
software: cardano-sl:1
, MainBlockHeader:
hash: 5a0795022c4786191d90eaf83f0a58c927d399a4779b1a61b78a0188de439c3a
previous block: 197d5cfea25e990f6893e1250ea248ac25c0465db43abef95677b32c0d3ebbff
slot: 8426th slot of 152nd epoch
difficulty: 3290087
leader: pub:0bdb1f5e
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:0bdb1f5e, dPk = pub:5fddeeda } }
block: v0.2.0
software: cardano-sl:1
, MainBlockHeader:
hash: d571750aee77c352ae4a3be20b1f229d4e3d6c549668a06b2a19b3b8bc301843
previous block: 5a0795022c4786191d90eaf83f0a58c927d399a4779b1a61b78a0188de439c3a
slot: 8427th slot of 152nd epoch
difficulty: 3290088
leader: pub:0bdb1f5e
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:0bdb1f5e, dPk = pub:5fddeeda } }
block: v0.2.0
software: cardano-sl:1
]
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.48 UTC] MemPool metrics wait: ApplyBlock queue length is 1
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.48 UTC] MemPool metrics acquire: ApplyBlock wait time was 12mcs
[cardano-sl.*production*:Info:ThreadId 382] [2020-01-17 07:31:07.48 UTC] Verifying and applying blocks...
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.48 UTC] Rolling: verifying
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.48 UTC] verifyBlocksPrefix: 64
[cardano-sl.*production*:Info:ThreadId 382] [2020-01-17 07:31:07.48 UTC] slogVerifyBlocks: Consensus era is Original
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.71 UTC] Rolling: Verification done, applying unsafe block
[cardano-sl.node:Debug:ThreadId 316] [2020-01-17 07:31:07.72 UTC] applying some blocks (non-rollback)
[cardano-sl.*production*:Info:ThreadId 382] [2020-01-17 07:31:07.74 UTC] Verifying and applying blocks done
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.74 UTC] MemPool metrics release: ApplyBlock modify time was 258859mcs size is 0
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.74 UTC] Not relaying block in recovery mode
[cardano-sl.*production*:Info:ThreadId 382] [2020-01-17 07:31:07.74 UTC] Blocks have been adopted: [19b1f1eec6f9abb1, a7b1b58758880db3, f25b190e0f961f05, 0e907e6cf3f6ca4d, d4f74c5d312b0549, e4d06fea6f988ce9, ee279270b61135e9, 6c7998c9ba54d17e, 94853e29ce7200e6, f08ed74a4fcfd8e3, 834b772bd7901ffb, fb514730332114d2, c8a4b3a5697044ee, 5c9a27718384810e, 98573eeb0e53dbd4, a3be980a5027b156, 3713b244551e732b, ff9fa80350026e9e, 1704bdb34a16c6be, 229b40d03c219294, dfcbec3a426b4ca3, 860e4a48c46e67fa, 40b75fe17d58e8bf, faddbd98928a5527, 7b1d29bc099b5b19, ab08ff7a02399e92, ae699dcf30f5ef08, a6d99576b5002d9a, 62b679c913e1c9ee, 4d01d5340c42fdb3, 196d33de29e35ae9, 98ee850aac9a1b9d, 4f3037f91ccd71fd, 24fbc16a4479d02f, 53ac78eb4d6ab7f2, d48b6feb41138da5, 676d449b1ca1fb74, 3327651659423dee, 1a3c9a6c2f095cd7, 16b36ed4da3e05ff, 2223acc65da5f5db, e8d4e12a616f4227, 443ff6e4fffeac0b, 22db605e0059c4d8, 643d9645b54e09b5, fdf7bfe14b34aed0, 6a1650622dec4d07, 1af30b82a5ee01a8, 1e585328b2eda019, db55785d3ea21869, 0ac37f9d03244062, cc9d0a297ea9af53, ebffb3fc15d7b62b, e7b81541b23ace3a, a48604d65cef0180, 7af2bd4c948d283f, 013dc0ea380f67a1, de88e7e0c354e93a, a762148ba5fbf0a6, 432652c976cbbdc4, 14b798e52f1b215d, 197d5cfea25e990f, 5a0795022c478619, d571750aee77c352]
^Z[9] Segmentation fault nohup sudo ./mainnet.sh > log.log 2>&1 (wd: /data/ada/cardano-sl)
(wd now: /data/ada/cardano-sl/state-wallet-mainnet/logs/pub)
What version (use the git hash) is this?
Is this repeatable? I.e., if you run it again, do you get the same error?
master 1a792d7cd [origin/master] Merge #4242
Yes, this is repeatable, @erikd.
That git hash is from Sep 2019.
Try checking out tag 3.2.0 which is from Nov 22, 2019.
If that does not fix it, I will look at this on Monday morning. It's currently 9pm Friday here.
Thank you, @erikd.
@erikd I used the command "nix-build -A connectScripts.mainnet.wallet -o mainnet.sh" to build. mainnet.sh:
set -euo pipefail

if [[ "${1-}" == "--delete-state" ]]; then
  echo "Deleting state-wallet-mainnet ... "
  rm -Rf state-wallet-mainnet
  shift
fi
if [[ "${1-}" == "--runtime-args" ]]; then
  RUNTIME_ARGS="${2-}"
  shift 2
else
  RUNTIME_ARGS=""
fi

echo "Keeping state in state-wallet-mainnet"
mkdir -p state-wallet-mainnet/logs

echo "Launching a node connected to 'mainnet' ..."
export LC_ALL=en_GB.UTF-8
export LANG=en_GB.UTF-8

if [ ! -d state-wallet-mainnet/tls ]; then
  mkdir -p state-wallet-mainnet/tls/server && mkdir -p state-wallet-mainnet/tls/client
  /nix/store/ra45xgy1ngy9bpn12h5fib7m81925i80-cardano-sl-tools-3.2.0-exe-cardano-x509-certificates/bin/cardano-x509-certificates \
    --server-out-dir state-wallet-mainnet/tls/server \
    --clients-out-dir state-wallet-mainnet/tls/client \
    --configuration-file /nix/store/r02jsbcld1cmy47y1cxr8c9l6y9z7a8n-tls-config-mainnet.yaml \
    --configuration-key mainnet_full
fi
ln -sf /nix/store/0gzajk6rskv7xigvwhgly1zrn3m75d4r-curl-wallet-mainnet state-wallet-mainnet/curl

exec /nix/store/ya8iqz0l34w9mszd06ir3pchasryqz4a-cardano-wallet-3.2.0-exe-cardano-node/bin/cardano-node \
  --configuration-file /nix/store/j4rz117v3paa3ys3abfkxacvghyd7chn-cardano-sl-config/lib/configuration.yaml --configuration-key mainnet_full \
  --tlscert state-wallet-mainnet/tls/server/server.crt \
  --tlskey state-wallet-mainnet/tls/server/server.key \
  --tlsca state-wallet-mainnet/tls/server/ca.crt \
  --log-config /nix/store/j4rz117v3paa3ys3abfkxacvghyd7chn-cardano-sl-config/log-configs/connect-to-cluster.yaml \
  --topology "/nix/store/kiwxslk8q90j8rrjj4vqnnc9np5a9bhy-topology-mainnet" \
  --logs-prefix "state-wallet-mainnet/logs" \
  --db-path "state-wallet-mainnet/db" \
  --wallet-db-path 'state-wallet-mainnet/wallet-db' \
  --no-client-auth \
  \
  --keyfile state-wallet-mainnet/secret.key \
  --wallet-address 0.0.0.0:8090 \
  --wallet-doc-address 127.0.0.1:8091 \
  --ekg-server 127.0.0.1:8000 --metrics \
  +RTS -N2 -qg -A1m -I0 -T -RTS \
  \
  $RUNTIME_ARGS
I use "nix - build - A connectScripts.mainnet.wallet - o mainnet.sh" command to build.
There seem to be some extra spaces around the - character in that. It should be nix-build, -A and -o.
I just did:
> git checkout 3.2.0 -b tag-3.2.0
> nix-build -A connectScripts.mainnet.wallet -o mainnet.sh
and it worked as expected.
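To double-check which node build a generated mainnet.sh will actually launch (for example, to confirm it came from the 3.2.0 tag and not the older master checkout), one option is to read the Nix store path it execs. This is a minimal sketch with a hypothetical helper name, assuming the store path keeps the "...-cardano-wallet-3.2.0-exe-cardano-node" naming seen in the script above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the package name (including its version) of the
# cardano-node that a generated connect script such as mainnet.sh execs.
# ASSUMPTION: the Nix store path ends in "...-<package>-<version>-exe-cardano-node".
node_package_in_script() {
  grep -oE 'cardano-[a-z-]+-[0-9]+(\.[0-9]+)*-exe-cardano-node' "$1" | head -n 1
}

# Usage (mainnet.sh is the symlink created by nix-build -o):
#   node_package_in_script mainnet.sh
```

For the script posted above this would print cardano-wallet-3.2.0-exe-cardano-node.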
@erikd That's probably because I copied it; the extra spaces were added automatically.
The process ended abruptly with the same error:
[cardano-sl.*production*:Debug:1689] [2020-01-20 02:54:16.92 UTC] Rolling: verifying
[cardano-sl.*production*:Debug:1689] [2020-01-20 02:54:16.92 UTC] verifyBlocksPrefix: 64
[cardano-sl.*production*:Info:1689] [2020-01-20 02:54:16.92 UTC] slogVerifyBlocks: Consensus era is Original
^Z[5] Segmentation fault sudo nohup ./mainnet.sh > log.log 2>&1
[6]+ Stopped tail -200f log.log
Do I need to upgrade my server? It currently has 2 CPU cores and 4 GB of RAM.
4 GB should be enough RAM, especially if nothing else is running on that machine.
I am currently running the ./mainnet.sh script. Any idea what epoch you get to before it segfaults?
And now it segfaults for me too, at the 5046th slot of the 3rd epoch.
I think this error occurs when the process reaches 1 GB of memory. After I restart the process, it can continue to synchronize. I have restarted it once and am now at the 10365th slot of the 13th epoch.
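One way to test the 1 GB hypothesis is to log the node's resident memory while it syncs and compare the last sample with the time of the crash. This is a minimal sketch with hypothetical helper names, assuming Linux: it reads the VmRSS line from /proc/PID/status at a fixed interval until the process disappears.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for logging a process's resident memory (Linux only).

# Print the resident set size of PID in KiB, or nothing if the process is gone.
rss_kb() {
  awk '/^VmRSS:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Print "<UTC time> <RSS in MiB>" every INTERVAL seconds until PID exits.
watch_rss() {
  local pid=$1 interval=${2:-10} kb
  while kb=$(rss_kb "$pid") && [[ -n $kb ]]; do
    printf '%s %d MiB\n' "$(date -u +%FT%TZ)" "$(( kb / 1024 ))"
    sleep "$interval"
  done
}
```

For example, with the node already running: watch_rss "$(pgrep -o cardano-node)" 30 | tee rss.log. If the last line before each segfault is consistently around 1 GB, that supports a memory limit; if it varies, it points elsewhere.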
I am running on a 16 GB VM and was able to reproduce the problem, so the amount of memory is not the cause.
Oh, hang on, you are running on a 64-bit CPU, aren't you?
I also checked out the HEAD of the develop branch, and that synced to epoch 4 without a problem.
Then I switched back to the 3.2.0 tag and synced from scratch to epoch 5, again without a problem.
I wonder if there is a peer somewhere on the network that is serving up corrupted blocks.
admin@ada:/data/ada$ getconf LONG_BIT
64
On Friday I restarted the node countless times and eventually finished synchronizing. On Monday, the process was killed.
What base OS and OS version are you running this on?
Ubuntu 18.04
Ubuntu 18.04 should be fine.
I would try deleting the ./state-wallet-mainnet directory, and then try resyncing. Each time it segfaults, record the slot and epoch number and restart it. When you have about 10 entries, post the list here.
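The restart-and-record procedure can be automated. This is a minimal sketch with hypothetical helper names; it assumes it is run from the directory containing mainnet.sh with the same privileges the node was started with (the runs above used sudo), and it takes the last "Nth slot of Mth epoch" position printed in the log as an approximation of where the node died.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: restart ./mainnet.sh after every crash and record
# roughly where (slot/epoch) the node was when it died.

# Print the last "Nth slot of Mth epoch" position that appears in LOGFILE.
last_position() {
  grep -oE '[0-9]+(st|nd|rd|th) slot of [0-9]+(st|nd|rd|th) epoch' "$1" | tail -n 1
}

# Run the node up to WANTED times, appending one line per exit to OUT.
# Bash reports a process killed by SIGSEGV as exit status 139 (128 + 11).
collect_failures() {
  local log=$1 wanted=${2:-10} out=${3:-failures.txt} n status
  for (( n = 1; n <= wanted; n++ )); do
    ./mainnet.sh >> "$log" 2>&1
    status=$?
    printf 'run %d: exit %d near %s\n' "$n" "$status" "$(last_position "$log")" >> "$out"
    (( status == 0 )) && break   # a clean exit is not a crash; stop collecting
  done
}
```

For example, after sourcing the file: collect_failures log.log 10. Each line of failures.txt then holds the exit status and the approximate position for one run.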
8280th slot of 8th epoch
18274th slot of 18th epoch
20293rd slot of 19th epoch
7566th slot of 31st epoch
1618th slot of 33rd epoch
4089th slot of 36th epoch
15837th slot of 55th epoch
OK, now delete the ./state-wallet-mainnet directory, run it again, and list the first 10 entries here.
What do you mean?
Delete the ./state-wallet-mainnet directory and do the same test again. It would be useful to know whether we get the same results.
Do I need to delete this directory when I run it again?
Yes. That is the state directory where the node stores blocks.
[cardano-sl.*production*:Debug:1689] [2020-01-20 04:56:29.75 UTC] Handling block w/ LCA, which is a07d3104
[cardano-sl.*production*:Info:1689] [2020-01-20 04:56:29.75 UTC] Trying to apply blocks w/o rollback. First 3: [MainBlockHeader:
hash: 92d68c2ba61d115b3c53f1c857c77355062c2ee76680e8afed94980e7fbba239
previous block: a07d310498f2417e1d6ade2dd2e3da8f802995407845d164616053dfc683a44d
slot: 18033rd slot of 26th epoch
difficulty: 579559
leader: pub:9a6fa343
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:9a6fa343, dPk = pub:8b532076 } }
block: v0.1.0
software: cardano-sl:0
, MainBlockHeader:
hash: c5342a48f472640576a177d2992b77f7939e406e463d00bc480ddb339747e85f
previous block: 92d68c2ba61d115b3c53f1c857c77355062c2ee76680e8afed94980e7fbba239
slot: 18034th slot of 26th epoch
difficulty: 579560
leader: pub:0bdb1f5e
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:0bdb1f5e, dPk = pub:5fddeeda } }
block: v0.1.0
software: cardano-sl:0
, MainBlockHeader:
hash: d8462f486a46f0688786e3880c7982a6f69ce8f0adcf18c1e1089bb9480b7490
previous block: c5342a48f472640576a177d2992b77f7939e406e463d00bc480ddb339747e85f
slot: 18035th slot of 26th epoch
difficulty: 579561
leader: pub:9a6fa343
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:9a6fa343, dPk = pub:8b532076 } }
block: v0.1.0
software: cardano-sl:0
The log stops there. The process was killed, but this time no error was reported.
@shenyaqi9527 I need a list of where (i.e. epoch and slot number) the process gets killed when starting from scratch. Please run it again so I can compare it with the last list.
Do I need to delete "./state-wallet-mainnet"?
[cardano-sl.*production*:Debug:1686] [2020-01-20 05:11:51.96 UTC] verifyBlocksPrefix: 64
[cardano-sl.*production*:Info:1686] [2020-01-20 05:11:51.96 UTC] slogVerifyBlocks: Consensus era is Original
cardano-node: internal error: evacuate: strange closure type 0
(GHC version 8.4.4 for x86_64_unknown_linux)
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
I am really beginning to suspect that your machine is having hardware issues. Can you run some form of diagnostic on it?
How do I diagnose it? This is an Aliyun (Alibaba Cloud) server.
I would try memtest86+ first and then contact your provider.
As a point of reference, I have seen this issue exactly once. I have since restarted and synced to the 135th epoch (and it's still going) without a recurrence of the segfault.
I switched to a different server and got the same error.
What is the git hash?
tag-3.2.0 5d0a227fb Merge #4252
Are there any commands that need to be run before building?
No commands are required other than what you have been running.
Mine is running quite happily, using up to about 15% of the RAM on a 16 GB VM.
Maybe try running it on an 8G or 16G machine.
Does Nix limit memory usage? I tried with 8 GB of memory and got the same result, at the 7832nd slot of the 8th epoch.
Does nix limit memory usage?
I don't think so.
I would still like a list of the epoch/slot info for the first 10 failures.
I tried a few times:
18035th slot of 26th epoch
6841st slot of 3rd epoch
16250th slot of 3rd epoch
15769th slot of 4th epoch
20023rd slot of 1st epoch
7832nd slot of 8th epoch
So it's not deterministic.
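Comparing failure lists like these is easier if each position is turned into a single slot number and sorted. This is a minimal sketch with hypothetical helper names, assuming 21600 slots per epoch (Byron mainnet: 10k slots with k = 2160). The logs above are consistent with that assumption: the 18035th slot of the 26th epoch maps to 579635, just above the difficulty 579561 logged for that block.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for ordering crash positions.
# ASSUMPTION: 21600 slots per epoch (Byron mainnet: 10 * k, k = 2160).
SLOTS_PER_EPOCH=21600

# Convert "Nth slot of Mth epoch" to M * SLOTS_PER_EPOCH + N.
flat_slot() {
  local re='^([0-9]+)(st|nd|rd|th) slot of ([0-9]+)(st|nd|rd|th) epoch$'
  [[ $1 =~ $re ]] || return 1
  echo $(( 10#${BASH_REMATCH[3]} * SLOTS_PER_EPOCH + 10#${BASH_REMATCH[1]} ))
}

# Read positions (one per line) on stdin; print "<flat slot><TAB><position>",
# sorted numerically, skipping lines that do not parse.
sort_positions() {
  local pos flat
  while IFS= read -r pos; do
    flat=$(flat_slot "$pos") || continue
    printf '%s\t%s\n' "$flat" "$pos"
  done | sort -n
}
```

Feeding both lists through sort_positions puts the crash points on one axis, which makes it easy to see whether they cluster or are scattered across the chain.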
When you moved machines, did you keep the same disk image or reinstall from scratch?
Reinstalled from scratch.
I have just about run out of ideas :cry:.
I can't do anything about it now.
@shenyaqi9527 Does the dmesg output on your machine list any segfaults?
How do I use this command?
sudo dmesg | grep segfault
admin@ada:/$ sudo dmesg | grep segfault
[ 1547.424449] cardano-node:w[5554]: segfault at 840bffe2a0 ip 00007fae70921c55 sp 00007fae6d919a98 error 6 in libc-2.27.so[7fae707d0000+1aa000]
[ 1910.998425] cardano-node:w[5786]: segfault at 8402c16940 ip 00007fca29cefc55 sp 00007fca05bd4a98 error 6 in libc-2.27.so[7fca29b9e000+1aa000]
[ 1991.020628] cardano-node:w[5883]: segfault at 84044bfac0 ip 00007fc7aee9ec55 sp 00007fc77e7f7a98 error 6 in libc-2.27.so[7fc7aed4d000+1aa000]
[ 2427.408873] cardano-node:w[5942]: segfault at 840c2d55c0 ip 00007ff7f4bcfc55 sp 00007ff7dbffaa98 error 6 in libc-2.27.so[7ff7f4a7e000+1aa000]
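A quick way to compare kernel messages like these is to subtract the library's load address (the number after the "[") from the instruction pointer (ip), which removes the effect of address randomization. This is a minimal sketch with a hypothetical helper name. For all four lines above it gives the same result, libc-2.27.so+0x151c55, i.e. every crash happened at the same instruction inside libc.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a kernel "segfault at ... ip IP ... in LIB[BASE+SIZE]"
# line into LIB+OFFSET, where OFFSET = IP - BASE is the faulting instruction's
# position relative to the start of that library mapping.
segfault_offset() {
  local re='ip ([0-9a-f]+) .* in ([^ ]+)\[([0-9a-f]+)\+'
  [[ $1 =~ $re ]] || return 1
  local ip=${BASH_REMATCH[1]} lib=${BASH_REMATCH[2]} base=${BASH_REMATCH[3]}
  printf '%s+0x%x\n' "$lib" "$(( 16#$ip - 16#$base ))"
}

# Usage:
#   sudo dmesg | grep segfault | while IFS= read -r l; do segfault_offset "$l"; done
```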
I got two more in the last 30 minutes. I am seeing something similar to you:
[6124226.626857] cardano-node:w[13755]: segfault at 840554dfc0 ip 00007fa4abdd9d6e sp 00007fa49bffaa98 error 6 in libc-2.27.so[7fa4abc88000+1aa000]
[6138577.260721] cardano-node:w[24086]: segfault at 84079fd1d0 ip 00007f5a12efeb24 sp 00007f5a0a7f7a98 error 6 in libc-2.27.so[7f5a12dad000+1aa000]
[6139714.043880] cardano-node:w[24190]: segfault at 84009cd480 ip 00007fb987390d6e sp 00007fb97e7f7a98 error 6 in libc-2.27.so[7fb98723f000+1aa000]
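Both sets of messages above also report "error 6". On x86 this field is the page-fault error code, a bit field that the kernel prints in hexadecimal. This is a minimal sketch with a hypothetical helper name that decodes its low bits. Decoding 6 gives a user-mode write to a page that was not present, which is consistent with code writing through an invalid pointer.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode the "error N" field of an x86 kernel segfault line.
# The kernel prints N in hexadecimal.
# Bit 0: 0 = page not present, 1 = protection violation
# Bit 1: 0 = read, 1 = write
# Bit 2: 0 = kernel mode, 1 = user mode
# Bit 4: 1 = instruction fetch
decode_pf_error() {
  local code=$(( 16#$1 )) parts=()
  (( code & 1 )) && parts+=("protection-violation") || parts+=("page-not-present")
  (( code & 2 )) && parts+=("write") || parts+=("read")
  (( code & 4 )) && parts+=("user-mode") || parts+=("kernel-mode")
  (( code & 16 )) && parts+=("instruction-fetch")
  echo "${parts[*]}"
}
```

For example, decode_pf_error 6 prints "page-not-present write user-mode".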
I'm running this on Debian and I have just noticed there is a libc-2.29 available, so I am going to try upgrading to that.
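Before and after upgrading, it may be worth checking which libc the running node has actually mapped: a Nix-built binary can load its libc from /nix/store rather than from the system, in which case a system libc upgrade would not change it. This is a minimal sketch with a hypothetical helper name, assuming Linux; it reads the pathname column of /proc/PID/maps.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the distinct files matching PATTERN (an awk regex)
# that are mapped into process PID. Field 6 of /proc/PID/maps is the pathname.
mapped_files() {
  local pid=$1 pattern=${2:-libc}
  awk -v pat="$pattern" '$6 ~ pat { print $6 }' "/proc/$pid/maps" | sort -u
}
```

For example, from a root shell (the node was started with sudo, so its maps file is not readable by other users): mapped_files "$(pgrep -o cardano-node)" libc. A path under /nix/store would mean the system libc is not the one in use.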
What should I do?
Wait for me to report back after I do a complete upgrade of my system, reboot and retest?
What causes the error? After running for a while, the error is reported and the process is killed. When I run it again it continues to synchronize, but then the error occurs again. Is this a memory limit? The error occurs when memory usage reaches 1 GB.