Closed tczee36 closed 8 months ago
Please see https://github.com/monero-project/monero/issues/9139
Also getting a segmentation fault after auto-switching to this longer chain.
How do you know the segfault is related to this longer chain? Can you share a backtrace?
No segfault issues during sync until it switched to the longer chain. Had to run a while loop to keep monerod from stopping.
Tell me what to do to get the backtrace
It didn't switch to this longer chain, it just logged that a node has sent a new top block candidate. So far I haven't seen another person report that this crashed their node, so I'm not sure if it's related to your issue.
Which OS are you using?
This is on Alpine Linux, I'm new on this distro.
What kind of hardware do you use?
3900X / 16 GB RAM / 2.5-inch SSD / ASRock B550
For a backtrace you need gdb installed, and then execute gdb with the monerod binary:
gdb /path/to/monerod
then wait for it to load and enter
start
then monerod should start to sync, wait for it to segfault and enter
thread apply all bt
and share the output.
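Condensed, the whole session would look roughly like this (the monerod path is whatever your extracted release directory is; installing gdb via apk and turning off pagination are my additions, the latter just keeps gdb from pausing the long backtrace with --Type <RET>-- prompts):
# apk add gdb
$ gdb /path/to/monerod
(gdb) set pagination off
(gdb) start
(gdb) continue        # only needed if gdb actually stops at the temporary breakpoint in main
... let monerod sync until it segfaults ...
(gdb) thread apply all bt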
I blocked the IP with the longer chain and the problem went away. Unable to reproduce the problem now; everything is synced and working.
Thanks for the replies.
*edit: never mind, seeing the segmentation fault again, will try to reproduce the problem.
alpine:~/monero-x86_64-linux-gnu-v0.18.3.1$ gdb monerod
GNU gdb (GDB) 14.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from monerod...
(gdb) start
Temporary breakpoint 1 at 0xed784
Starting program: /home/xmr/monero-x86_64-linux-gnu-v0.18.3.1/monerod
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
process 26557 is executing new program: /lib/ld-musl-x86_64.so.1
Error in re-setting breakpoint 1: Function "main" not defined.
2024-01-30 08:06:15.327 I Monero 'Fluorine Fermi' (v0.18.3.1-release)
2024-01-30 08:06:15.327 I Initializing cryptonote protocol...
2024-01-30 08:06:15.327 I Cryptonote protocol initialized OK
2024-01-30 08:06:15.327 I Initializing core...
2024-01-30 08:06:15.328 I Loading blockchain from folder /home/xmr/.bitmonero/lmdb ...
[New LWP 26560]
[New LWP 26561]
[New LWP 26562]
[New LWP 26563]
[New LWP 26564]
[New LWP 26565]
[New LWP 26566]
[New LWP 26567]
[New LWP 26568]
[New LWP 26569]
[New LWP 26570]
[New LWP 26571]
[New LWP 26572]
[New LWP 26573]
[New LWP 26574]
[New LWP 26575]
[New LWP 26576]
[New LWP 26577]
[New LWP 26578]
[New LWP 26579]
[New LWP 26580]
[New LWP 26581]
[New LWP 26582]
[New LWP 26583]
[New LWP 26584]
2024-01-30 08:06:15.866 I Loading checkpoints
2024-01-30 08:06:15.866 I Core initialized OK
2024-01-30 08:06:15.866 I Initializing p2p server...
2024-01-30 08:06:15.874 I p2p server initialized OK
2024-01-30 08:06:15.874 I Initializing core RPC server...
2024-01-30 08:06:15.874 I Binding on 127.0.0.1 (IPv4):18081
[LWP 26561 exited]
2024-01-30 08:06:16.121 I core RPC server initialized OK on port: 18081
[New LWP 26585]
[New LWP 26586]
[New LWP 26587]
2024-01-30 08:06:16.122 I Starting core RPC server...
[New LWP 26588]
[New LWP 26589]
2024-01-30 08:06:16.123 I core RPC server started ok
[New LWP 26590]
[New LWP 26591]
[New LWP 26592]
2024-01-30 08:06:16.124 I Starting p2p net loop...
[New LWP 26593]
[New LWP 26594]
[New LWP 26595]
[New LWP 26596]
[New LWP 26597]
[New LWP 26598]
[New LWP 26599]
[New LWP 26600]
[New LWP 26601]
[New LWP 26602]
[New LWP 26603]
2024-01-30 08:06:17.125 I
2024-01-30 08:06:17.125 I **
2024-01-30 08:06:17.125 I The daemon will start synchronizing with the network. This may take a long time to complete.
2024-01-30 08:06:17.125 I
2024-01-30 08:06:17.125 I You can set the level of process detailization through "set_log <level|categories>" command,
2024-01-30 08:06:17.125 I where
Thread 39 "ld-musl-x86_64." received signal SIGSEGV, Segmentation fault.
[Switching to LWP 26597]
0x00007ffff7226e65 in ?? ()
(gdb)
[thread apply all bt output, truncated in this comment: threads 1-52 (mostly named "ld-musl-x86_64.", plus ZMQbg/IO/0 and ZMQbg/Reaper) are parked in src/thread/__timedwait.c:52 / pthread_cond_timedwait.c:100 or src/select/select.c:39; the frames for thread 39, the one that received the SIGSEGV, were cut off. Full output: http://paste.debian.net/1305799/]
(gdb)
Can you use paste.debian.net to share the backtrace? Also is this the full log? I'm specifically looking for thread 39, it seems to be missing from your comment.
Sorry about that, some parts of the log got cut off. Full log posted here: http://paste.debian.net/1305799/ (also updated the log comment above). Thanks!
Can you make sure everything is updated in your Alpine? I have seen a similar error caused by ABI compatibility issues.
And what version of Alpine are you using?
$ cat /etc/alpine-release
3.19.1
$ uname -r
6.6.14-0-lts
Great. I downloaded the exact version and synced monerod with my local node, but was not able to reproduce. How familiar are you with package compilation in Alpine? Can you build a debug monero package? I am not familiar with Alpine at all; I did a quick search and saw this [1].
1. https://che-adrian.medium.com/how-to-cross-compile-alpine-linux-apk-packages-fae8a75aee88
Hi, thanks for the response. I'm not familiar at all with package compilation in Alpine. The Medium link is lacking some details on creating a debug monero package. Please provide a few more tips, and I'll try to get it done.
If we change this line [1]:
-DCMAKE_BUILD_TYPE=None \
to
-DCMAKE_BUILD_TYPE=Debug \
that would at least generate better debugging information when you are debugging it.
It seems the official link for how to build APKBUILD packages is here [2]. The overall steps are quite simple IMHO. Since we are building just for the sake of debugging, I believe you can skip most of the signature/verification steps.
In the meantime, I left my Alpine VM running, but so far no luck. If you are using any specific flags or config to run monerod, please let me know.
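For anyone following along, the edit itself is a one-line change in the APKBUILD (shown here as an illustrative diff; the surrounding cmake options stay as the aports file already has them), after which abuild rebuilds the package:
-        -DCMAKE_BUILD_TYPE=None \
+        -DCMAKE_BUILD_TYPE=Debug \
$ abuild-keygen -a -i    # one-time packager key setup, skip if already configured
$ abuild -r              # fetches build dependencies and builds the apk
This is only a sketch of the usual Alpine packaging workflow, not the exact aports file.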
Strange, I see the same behavior.
I [165.232.190.164:50514 INC] Sync data returned a new top block candidate: 3076846 -> 3493679 [Your node is 416833 blocks (1.6 years) behind]
? 3493679
@Haraade you can ignore it, it's just a node sending false data. It's harmless.
Switched to Debian, problem solved itself. I'd avoid Alpine Linux for now.
I am closing this issue as it looks like it is an Alpine issue.
My guess is the pthread stack size is too small as musl (not Alpine) has a much lower default than glibc.
The (presumed) fix is for Monero to explicitly increase its thread stack size when the system default is too low.
This is monerod failing to run on a widely used environment. While we can declare the environment at fault (an entire libc that plenty of people have good reasons to use), I'm affected and would like monerod + musl to work as expected.
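To illustrate what that fix would mean: musl's default thread stack is on the order of 128 KiB, versus roughly 8 MiB under glibc, so a thread that grows a deep stack can run fine on glibc and fault on musl. The sketch below is plain pthreads with a hypothetical worker function, not monerod's actual thread-creation code (which goes through Boost/epee); it just shows requesting a glibc-sized stack explicitly instead of relying on the libc default.

/* Illustrative only: create a thread with an explicit 8 MiB stack instead of
   relying on the libc default, which is much smaller under musl. */
#include <pthread.h>
#include <stdio.h>

#define WORKER_STACK_SIZE (8u * 1024 * 1024)  /* match glibc's usual default */

static void *worker(void *arg)
{
    (void)arg;
    /* stack-hungry work (deep recursion, large local buffers) is safe here */
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, WORKER_STACK_SIZE);

    if (pthread_create(&tid, &attr, worker, NULL) != 0) {
        perror("pthread_create");
        return 1;
    }
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}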
What are the steps that reproduce this bug?
Run monerod on Alpine.
If you have a rootless Docker and Rust toolchain, the following will do that:
git clone https://github.com/serai-dex/serai
cd serai
git checkout f0694172ef2cdf7dfde0d286e693243e4bdcacca
cargo run -p serai-orchestrator -- key_gen testnet
cargo run -p serai-orchestrator -- setup testnet
cargo run -p serai-orchestrator -- start testnet monero-daemon
This will create a key in a file under ~/.serai, generate some Dockerfiles within the directory, and spawn a container for monerod. It will start no other services nor run any binaries other than serai-orchestrator and docker.
The container should SIGSEGV, presumably due to the pthread stack size, within a few minutes (<30, I'd expect, yet I think likely as soon as 5-10).
Effectively all of our users complained of this, and @j-berman can confirm trivial replication. While we've moved to Debian, that has an increased surface, increased memory requirements, and slower bootup times. This isn't specific to Serai either, as Alpine is largely preferred for Docker containers.
Alpine is also a Linux distro not exclusive to Docker, so this potentially has an impact on personal machines. If it is the theorized issue (pthread stack size defaults), this actually affects all musl systems.
Thanks, I ran and synced the entire mainnet blockchain on Alpine and didn't have this issue [1]. If you have specific steps that reproduce this issue I am happy to take a look at it.
Given that effectively every participant I've had has reported the SIGSEGV, that's my current recommendation. I'll also note that configuration doesn't sync the mainnet blockchain and does have a variety of CLI flags.
Hi,
trying to sync a pruned node, getting some weird log output.
2024-01-29 20:01:36.247 I [207.244.240.82:18080 OUT] Sync data returned a new top block candidate: 3067647 -> 3072795 [Your node is 5148 blocks (7.2 days) behind]
2024-01-29 20:01:36.248 I SYNCHRONIZATION started
2024-01-29 20:01:37.555 I [110.40.229.103:18080 OUT] Sync data returned a new top block candidate: 3067647 -> 3344842 [Your node is 277195 blocks (1.1 years) behind]
There seems to be another chain, and it's ahead by 1.1 years? (3344842 blocks)
I know this cannot be right, because the current chain is only 30676XX blocks.
Feels like an attacker trying to disrupt network stability.
Also getting a segmentation fault after auto-switching to this longer chain.
Is the DNS-blacklist not working properly?