hyperledger / besu

An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu
https://www.hyperledger.org/projects/besu
Apache License 2.0

besu --help hangs, SIGSEGV in child process #7267

Open thorstenhirsch opened 4 months ago

thorstenhirsch commented 4 months ago

Running besu 24.6.0 on RHEL 8 doesn't seem to work reliably. I only want to call besu --help, but it hangs most of the time (no output at all). Only occasionally does the call run successfully.

I've traced the problem with strace, and whenever the call hangs, it keeps doing these things in a loop:

[pid  8217] getrusage(RUSAGE_THREAD, {ru_utime={tv_sec=0, tv_usec=106916}, ru_stime={tv_sec=0, tv_usec=1916}, ...}) = 0
[pid  8217] futex(0x7fb440115efc, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=49999413} <unfinished ...>
[pid  8219] <... futex resumed>)        = -1 ETIMEDOUT (Connection timed out)
[pid  8219] futex(0x7fb447918ca8, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  8219] futex(0x7fb447918cf8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=49999813} <unfinished ...>
[pid  8217] <... futex resumed>)        = -1 ETIMEDOUT (Connection timed out)
[pid  8217] futex(0x7fb440115ea8, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  8217] getrusage(RUSAGE_THREAD, {ru_utime={tv_sec=0, tv_usec=106948}, ru_stime={tv_sec=0, tv_usec=1916}, ...}) = 0
[pid  8219] <... futex resumed>)        = -1 ETIMEDOUT (Connection timed out)
[pid  8217] futex(0x7fb440115efc, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=49999618} <unfinished ...>
[pid  8219] futex(0x7fb447918ca8, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  8219] futex(0x7fb447918cf8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=49999928}) = -1 ETIMEDOUT (Connection timed out)
[pid  8217] <... futex resumed>)        = -1 ETIMEDOUT (Connection timed out)
[pid  8219] futex(0x7fb447918ca8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid  8217] futex(0x7fb440115ea8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>

It seems like besu wants to connect to something that is not there. And the cause might be:

[pid  3979] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x14} ---

So there's one child process that ran into a SIGSEGV. Maybe this child would have provided the server to which the looping besu process wants to connect.
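A note on reading the trace above: strace prints the symbolic errno name followed by the generic libc description of that errno. The text "Connection timed out" is simply the standard description of ETIMEDOUT, so on a futex call it does not by itself indicate a network connection. FUTEX_WAIT with a timeout returns ETIMEDOUT whenever the timeout (here roughly 50 ms) expires without a wake-up. The following is a minimal Python sketch, not part of besu, that shows where the text comes from; `describe_errno` is a hypothetical helper name chosen for illustration.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Format an errno the way strace does: 'NAME (libc description)'."""
    name = errno.errorcode.get(code, str(code))
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    # The description is a fixed libc string for this errno. It is the same
    # whether the timeout came from a socket or from a timed futex wait.
    print(describe_errno(errno.ETIMEDOUT))
```

Read this way, the loop in the trace looks like two threads repeatedly sleeping on timed waits, which is consistent with idle polling rather than with a failed connection attempt.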

The system is an up-to-date RHEL 8.10 (x64) with kernel 4.18 running as a virtual machine with 2 vCPU and 16GB RAM. Java version is OpenJDK 21.0.3+9.

thorstenhirsch commented 4 months ago

The SIGSEGV does not seem to cause besu to hang, because I also see the SIGSEGV on other Linux machines where besu runs successfully. So the question is: What else is causing besu to hang on this machine?

edit: Older versions of besu also keep hanging on this machine; I reproduced it with all versions down to 24.1.2. Downgrading to OpenJDK 17 had no effect. I increased "open files" (ulimit, hard+soft) from 1024 to 4096, but the problem remains the same.

thorstenhirsch commented 4 months ago

No solution, yet, but at least a bit more info:

So is it a problem accessing the sys fs that keeps besu in the loop for ~5min?

siladu commented 4 months ago

Hi @thorstenhirsch thanks for the report.

I just tested this on a fresh Azure box with 2 CPU and 8GB running RHEL 8, java 21 and besu 24.6.0 and don't get the hang.

uname -a
Linux besu-rhel8 4.18.0-425.3.1.el8.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux
java -version
openjdk version "21.0.3" 2024-04-16 LTS
OpenJDK Runtime Environment (Red_Hat-21.0.3.0.9-1) (build 21.0.3+9-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-21.0.3.0.9-1) (build 21.0.3+9-LTS, mixed mode, sharing)
time ./bin/besu --help
...
real    0m4.832s
user    0m8.690s
sys     0m0.297s

Are you seeing a similar hang for any other besu commands? Another one you could try is besu --version

time ./bin/besu --version
besu/v24.6.0/linux-x86_64/openjdk-java-21

real    0m4.222s
user    0m7.597s
sys     0m0.296s

Any problems running besu itself?

siladu commented 4 months ago

It might be interesting to try the docker install as well to compare https://besu.hyperledger.org/public-networks/get-started/install/run-docker-image

Can you provide any details on how you installed besu? Also, anything else that is running on the machine that might be relevant? Any custom Java/JVM settings?
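One way to answer the question about custom Java/JVM settings is to list the environment variables that can inject JVM options into the process. The sketch below is only an illustration. The variable list is an assumption about what is commonly relevant (the launcher script shipped with your besu version is the authority on what it actually reads), and `jvm_related_env` is a hypothetical helper name.

```python
import os

# Assumed list of variables that can influence how the JVM is started.
# JAVA_TOOL_OPTIONS, JDK_JAVA_OPTIONS and _JAVA_OPTIONS are read by the JDK;
# JAVA_OPTS and BESU_OPTS are assumed to be read by the besu start script.
CANDIDATE_VARS = (
    "JAVA_HOME",
    "JAVA_OPTS",
    "BESU_OPTS",
    "JAVA_TOOL_OPTIONS",
    "JDK_JAVA_OPTIONS",
    "_JAVA_OPTIONS",
)


def jvm_related_env(environ=None) -> dict:
    """Return the subset of CANDIDATE_VARS that is set in the environment."""
    env = os.environ if environ is None else environ
    return {name: env[name] for name in CANDIDATE_VARS if name in env}


if __name__ == "__main__":
    found = jvm_related_env()
    if not found:
        print("no JVM-related environment variables set")
    for name, value in found.items():
        print(f"{name}={value}")
```

Running it in the same shell that launches besu and pasting the output into the issue would rule these settings in or out.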

siladu commented 4 months ago

Hi @thorstenhirsch, since we couldn't recreate and haven't heard back, we will close this issue in a couple of days.

thorstenhirsch commented 4 months ago

I was on vacation.

It's really weird that the problem doesn't always occur; sometimes (not very often) besu just runs as it should. For example: I just ran besu --help in 4 seconds (just like you), but then I called besu --version and it has now been hanging for 3 minutes.
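Because the hang is intermittent, it may help to quantify it by running the same command several times with a timeout and counting how often it completes. This is a minimal Python sketch under stated assumptions: the default command `./bin/besu --version`, the 30-second timeout, and the run count of 10 are placeholders, and `timed_run` and `summarize` are hypothetical helper names.

```python
import subprocess
import sys
import time


def timed_run(cmd, timeout_s):
    """Run cmd once and return (status, seconds).

    status is 'ok' (exit code 0), 'failed' (non-zero exit code) or
    'hang' (no exit within timeout_s; the child is then killed).
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        status = "ok" if proc.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        # Only the direct child is killed here. If the launcher script spawns
        # the JVM as a separate process, that process may need manual cleanup.
        status = "hang"
    return status, time.monotonic() - start


def summarize(cmd, runs, timeout_s):
    """Run cmd repeatedly; return (counts per status, list of results)."""
    results = [timed_run(cmd, timeout_s) for _ in range(runs)]
    counts = {}
    for status, _ in results:
        counts[status] = counts.get(status, 0) + 1
    return counts, results


if __name__ == "__main__":
    # Placeholder default; pass another command as arguments to override it.
    cmd = sys.argv[1:] or ["./bin/besu", "--version"]
    counts, results = summarize(cmd, runs=10, timeout_s=30.0)
    for i, (status, seconds) in enumerate(results, start=1):
        print(f"run {i}: {status} after {seconds:.1f}s")
    print(counts)
```

A ratio such as "7 of 10 runs hang" makes it easier to compare machines, besu versions, and JDK versions. For a run that hangs, a thread dump taken with the JDK's jstack tool while the process is still alive would show where the JVM is waiting.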

dragoonduel commented 3 months ago

I am having the same issue as well. However, in my case besu just hangs forever.

Installation was extracting the tar.gz and pointing Java to the OpenJDK 22 path. I have tried both 24.6.0 and 24.7.1, with the same issue on both. No custom Java settings. I am using an AWS EC2 t2.micro with RHEL 8. I started besu with a custom config, and the process just hangs without any logs. I can't even use --help or --version.