redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

Redpanda crashes upon start on arm64+Debian Bullseye 11.5 (Raspberry Pi) #8121

Open sloppycoder opened 1 year ago

sloppycoder commented 1 year ago

Version & Environment

Redpanda version: v22.3.10 (rev 1f78ad9)

I was running on a Raspberry Pi CM4 with 8GB RAM running Pi OS Debian 11.6, kernel version Linux pie9 5.15.84-v8+ #1613 SMP PREEMPT Thu Jan 5 12:03:08 GMT 2023 aarch64 GNU/Linux

Curiously, on the same board, it works fine if I use Ubuntu Linux 20.04. Kernel version Linux pie1 5.4.0-1078-raspi #89-Ubuntu SMP PREEMPT Mon Dec 5 08:38:35 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

What went wrong?

The Redpanda process crashes immediately on start. journalctl -u redpanda shows:


Jan 09 22:08:47 pie9 rpk[15633]: System check 'Transparent huge pages active' failed. Required: true, Current
Jan 09 22:08:47 pie9 rpk[15633]: System check - PASSED
Jan 09 22:08:47 pie9 rpk[15633]: We'd love to hear about your experience with redpanda:
Jan 09 22:08:47 pie9 rpk[15633]: https://redpanda.com/feedback
Jan 09 22:08:47 pie9 rpk[15633]: Starting redpanda...
Jan 09 22:08:47 pie9 rpk[15633]: Running:
Jan 09 22:08:47 pie9 rpk[15633]: PWD=/ LOGNAME=redpanda HOME=/var/lib/redpanda LANG=en_GB.UTF-8 START_ARGS=--check=true>
Jan 09 22:08:47 pie9 systemd[1]: redpanda.service: Main process exited, code=killed, status=6/ABRT
Jan 09 22:08:47 pie9 systemd[1]: redpanda.service: Failed with result 'signal'.

What should have happened instead?

It should run without crashing.

How to reproduce the issue?

Follow the installation steps for Debian; the problem appears immediately on the first start after apt install.

Additional information

More info below

# coredumpctl -1 list

TIME                            PID   UID   GID SIG COREFILE  EXE
Mon 2023-01-09 22:24:03 +08   27227   110   114   6 present   /opt/redpanda/libexec/redpanda

# coredumpctl -1 debug redpanda
           PID: 27468 (redpanda)
           UID: 110 (redpanda)
           GID: 114 (redpanda)
        Signal: 6 (ABRT)
     Timestamp: Mon 2023-01-09 22:24:21 +08 (2s ago)
  Command Line: /opt/redpanda/bin/redpanda --redpanda-cfg /etc/redpanda/redpanda.yaml --unsafe-bypass-fsync=true --lock-memory=false --overprovisioned --reserve-memory=0M
    Executable: /opt/redpanda/libexec/redpanda
 Control Group: /redpanda.slice/redpanda.service
          Unit: redpanda.service
         Slice: redpanda.slice
       Boot ID: ef0c43641c08469eb20490f89437af2c
    Machine ID: 4036f15c2b294942a3ae33a6dbfb7210
      Hostname: pie9
       Storage: /var/lib/systemd/coredump/core.redpanda.110.ef0c43641c08469eb20490f89437af2c.27468.1673274261000000.zst
       Message: Process 27468 (redpanda) of user 110 dumped core.

                Stack trace of thread 27468:
                #0  0x0000007fb15c6068 n/a (libc.so.6 + 0x86068)
                #1  0x0000007fb157e880 raise (libc.so.6 + 0x3e880)
                #2  0x0000007fb156aef8 abort (libc.so.6 + 0x2aef8)
                #3  0x0000005583f36e94 n/a (redpanda + 0x4846e94)
                #4  0x0000007fb1895040 _ZNSt3__111__call_onceERVmPvPFvS2_E (libc++.so.1 + 0x55040)
                #5  0x0000005583f21a20 n/a (redpanda + 0x4831a20)
                #6  0x0000005583f2fa1c calloc (redpanda + 0x483fa1c)
                #7  0x0000007fb15817e8 __cxa_thread_atexit_impl (libc.so.6 + 0x417e8)
                #8  0x0000007fb180e6f0 __cxa_thread_atexit (libc++abi.so.1 + 0x2e6f0)
                #9  0x0000005583f22834 n/a (redpanda + 0x4832834)
                #10 0x0000005583f311f4 _Znwm (redpanda + 0x48411f4)
                #11 0x0000007fb1956c74 n/a (libboost_filesystem.so.1.75.0 + 0x6c74)
                #12 0x0000007fb27417c8 n/a (/opt/redpanda/lib/ld.so + 0x57c8)
                #13 0x0000007fb27418cc n/a (/opt/redpanda/lib/ld.so + 0x58cc)
                #14 0x0000007fb27541d8 n/a (/opt/redpanda/lib/ld.so + 0x181d8)
                #15 0x0000007fb27541d8 n/a (/opt/redpanda/lib/ld.so + 0x181d8)

...snip...

Reading symbols from /opt/redpanda/libexec/redpanda...
(No debugging symbols found in /opt/redpanda/libexec/redpanda)
[New LWP 27468]
Core was generated by `/opt/redpanda/bin/redpanda --redpanda-cfg /etc/redpanda/redpanda.yaml --unsafe-'.
Program terminated with signal SIGABRT, Aborted.
#0  0x0000007fb15c6068 in ?? () from /opt/redpanda/lib/libc.so.6

(gdb) thread apply all bt

Thread 1 (LWP 27468):
#0  0x0000007fb15c6068 in ?? () from /opt/redpanda/lib/libc.so.6
#1  0x0000007fb157e880 in raise () from /opt/redpanda/lib/libc.so.6
#2  0x0000007fb156aef8 in abort () from /opt/redpanda/lib/libc.so.6
#3  0x0000005583f36e94 in ?? ()
#4  0x0000007fb1895040 in std::__1::__call_once(unsigned long volatile&, void*, void (*)(void*)) () from /opt/redpanda/lib/libc++.so.1
#5  0x0000005583f21a20 in ?? ()
#6  0x0000005583f2fa1c in ?? ()
#7  0x0000007fb15817e8 in __cxa_thread_atexit_impl () from /opt/redpanda/lib/libc.so.6
#8  0x0000007fb180e6f0 in __cxa_thread_atexit () from /opt/redpanda/lib/libc++abi.so.1
#9  0x0000005583f22834 in ?? ()
#10 0x0000005583f311f4 in ?? ()
#11 0x0000007fb1956c74 in ?? () from /opt/redpanda/lib/libboost_filesystem.so.1.75.0
#12 0x0000007fb27417c8 in ?? () from /opt/redpanda/lib/ld.so
#13 0x0000007fb27418cc in ?? () from /opt/redpanda/lib/ld.so
#14 0x0000007fb27541d8 in ?? () from /opt/redpanda/lib/ld.so
Backtrace stopped: not enough registers or memory available to unwind further
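
One detail worth noting in the backtrace above: every frame address fits in 39 bits. As a rough sketch (an inference from the addresses, not something confirmed in this thread), the snippet below computes the bit width of a few addresses copied from the trace. Shared libraries mapped just below 0x8000000000 (2^39) would be consistent with a kernel configured for a 39-bit user address space, which is worth verifying on the affected system. The function name is hypothetical.

```python
# Frame addresses copied from the backtrace above.
FRAME_ADDRESSES = [
    0x0000007FB15C6068,  # libc.so.6
    0x0000005583F36E94,  # redpanda
    0x0000007FB1895040,  # libc++.so.1
    0x0000007FB1956C74,  # libboost_filesystem.so.1.75.0
    0x0000007FB27541D8,  # ld.so
]


def address_bits(addresses):
    """Return the number of bits needed to represent the highest address."""
    return max(addr.bit_length() for addr in addresses)


if __name__ == "__main__":
    bits = address_bits(FRAME_ADDRESSES)
    print(f"highest address needs {bits} bits")
    print(f"all addresses below 2**39: {max(FRAME_ADDRESSES) < 2**39}")
```

Running the same check against a backtrace from the working Ubuntu install would show whether the two kernels really lay out the address space differently.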

Trying to start the process manually results in a core dump too:

$ /usr/bin/redpanda --redpanda-cfg /etc/redpanda/redpanda.yaml
Aborted (core dumped)

Switching the OS on this board currently has other implications, so I hope this issue can be fixed some other way.

JIRA Link: CORE-1134

sloppycoder commented 1 year ago

Downgraded the kernel to Linux pie9 5.10.103-v8+ #1529 SMP PREEMPT Tue Mar 8 12:26:46 GMT 2022 aarch64 GNU/Linux: same problem.

I tried it on an AWS Graviton instance running Debian with a 5.10 kernel, and redpanda works fine there....

:(

dehuszar commented 1 year ago

I am also experiencing this. I have run the same version of the redpanda docker container on my amd64 laptop, and it runs fine. On my Raspberry Pi 4 with 4GB RAM running Debian Buster-based Raspberry Pi OS (kernel 5.10.92-v8+), every command I attempt just returns Aborted, most importantly rpk redpanda start --overprovisioned. I don't see any message about a core dump in my output.

I also attempted to install the deb file directly onto the Pi's host OS using these instructions and see the same behavior: Aborted after every command.

It does not appear to be specific to the docker container.

sloppycoder commented 1 year ago

I think it's related to the kernel being used. On the same raspberry pi board, it doesn't work with

5.10.103-v8+ #1529 SMP PREEMPT (Raspberry Pi OS)

but works fine with

5.4.0-1078-raspi #89-Ubuntu SMP PREEMPT (Ubuntu 20.04 64 bit)

txdv commented 8 months ago

I am experiencing the same with:

Linux raspberrypi 6.6.20+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.20-1+rpt1 (2024-03-07) aarch64 GNU/Linux

Stracing reveals that mmap is failing(?)

prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
gettid()                                = 31589
gettid()                                = 31589
gettid()                                = 31589
gettid()                                = 31589
openat(AT_FDCWD, "/dev/urandom", O_RDONLY) = 3
read(3, "\202y\210\256", 4)             = 4
mmap(NULL, 35184372088832, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
gettid()                                = 31589
getpid()                                = 31589
tgkill(31589, 31589, SIGABRT)           = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=31589, si_uid=1000} ---
+++ killed by SIGABRT (core dumped) +++
Aborted (core dumped)

Can someone confirm that they are seeing the same?
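
For reference, the length passed to the failing mmap call above is exactly 2^45 bytes, i.e. 32 TiB of address space requested with PROT_NONE and MAP_NORESERVE. A minimal sketch of the arithmetic follows. The helper names are hypothetical, and the idea that the affected kernels offer a smaller virtual address space than the working ones is an assumption to verify, not something established in this thread. Treating the user address space as 2^va_bits bytes is also a simplification.

```python
MMAP_LENGTH = 35184372088832  # length argument from the strace output above


def human_size(n_bytes):
    """Format a byte count using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(n_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:g} {unit}"
        value /= 1024


def fits_in_address_space(n_bytes, va_bits):
    """True if n_bytes could fit in a 2**va_bits address space.

    This is only an upper bound: existing mappings shrink the usable range.
    """
    return n_bytes < 2 ** va_bits


if __name__ == "__main__":
    print(human_size(MMAP_LENGTH))
    for va_bits in (39, 48):  # hypothetical kernel configurations to compare
        print(va_bits, fits_in_address_space(MMAP_LENGTH, va_bits))
```

Under these assumptions a 32 TiB reservation cannot fit in a 39-bit space (512 GiB) but fits easily in a 48-bit one (256 TiB).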

dotnwat commented 8 months ago

@txdv how are you starting redpanda, and what kind of resources are available (cores/memory) on the pi? when redpanda first starts it'll print out some information about the hardware it sees. that could be useful to help debug.
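
If the broker aborts before it prints its hardware summary, the basic facts asked for here can be gathered by hand. The sketch below is not Redpanda's startup output. It is a small standalone script (function name hypothetical) that reads the core count, physical memory, page size, and kernel release through Python's standard library, assuming Linux.

```python
import os
import platform


def collect_system_info():
    """Gather basic hardware and kernel facts useful in a bug report (Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    uname = platform.uname()
    return {
        "cores": os.cpu_count(),
        "memory_bytes": page_size * phys_pages,
        "page_size": page_size,
        "kernel": uname.release,
        "machine": uname.machine,
    }


if __name__ == "__main__":
    for key, value in collect_system_info().items():
        print(f"{key}: {value}")
```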

txdv commented 8 months ago

It's an 8GB rpi 4 https://www.raspberrypi.com/products/raspberry-pi-4-model-b/

Currently when I start it I see nothing:

$ /opt/redpanda/bin/redpanda redpanda --redpanda-cfg /etc/redpanda/redpanda.yaml --overprovisioned --unsafe-bypass-fsync=true --reserve-memory=0M --lock-memory=false --default-log-level=debug
Aborted (core dumped)

Via docker:

WARN[0000] /home/bentkus/Docker/redpanda/docker-compose.yml: `version` is obsolete 
[+] Running 2/0
 ✔ Container redpanda-0        Created    0.0s
 ✔ Container redpanda-console  Created    0.0s
Attaching to redpanda-0, redpanda-console
redpanda-0        | + '[' '' = true ']'
redpanda-0        | + exec /usr/bin/rpk redpanda start --verbose --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092 --advertise-kafka-addr internal://redpanda-0:9092,external://localhost:19092 --pandaproxy-addr internal://0.0.0.0:8082,external://0.0.0.0:18082 --advertise-pandaproxy-addr internal://redpanda-0:8082,external://localhost:18082 --schema-registry-addr internal://0.0.0.0:8081,external://0.0.0.0:18081 --rpc-addr redpanda-0:33145 --advertise-rpc-addr redpanda-0:33145 --mode dev-container --default-log-level=debug
redpanda-0        | WARNING: This is a setup for development purposes only; in this mode your clusters may run unrealistically fast and data can be corrupted any time your computer shuts down uncleanly.
redpanda-0        | 17:51:03.611  DEBUG  Looking for redpanda install directory
redpanda-0        | 17:51:03.615  DEBUG  Checking if path '/opt/redpanda/bin/rpk' exists
redpanda-0        | 17:51:03.615  DEBUG  Checking if path '/opt/redpanda/bin/redpanda' exists
redpanda-0        | 17:51:03.615  DEBUG  Redpanda is installed in '/opt/redpanda'
redpanda-0        | We'd love to hear about your experience with Redpanda:
redpanda-0        | https://redpanda.com/feedback
redpanda-0        | Starting redpanda...
redpanda-0        | Running:
redpanda-0        | /opt/redpanda/bin/redpanda redpanda --redpanda-cfg /etc/redpanda/redpanda.yaml --overprovisioned --unsafe-bypass-fsync=true --reserve-memory=0M --lock-memory=false --default-log-level=debug
redpanda-console  | {"level":"info","ts":"2024-03-26T17:51:04.076Z","msg":"started Redpanda Console","version":"v2.4.5","built_at":"1709758506"}
redpanda-console  | {"level":"info","ts":"2024-03-26T17:51:04.077Z","msg":"testing admin client connectivity","urls":["http://redpanda-0:9644"]}
redpanda-0 exited with code 133
redpanda-console  | {"level":"fatal","ts":"2024-03-26T17:51:07.219Z","msg":"failed to create Redpanda service","error":"failed to test admin client connectivity: Get \"http://redpanda-0:9644/v1/brokers\": dial tcp: lookup redpanda-0 on 127.0.0.11:53: no such host"}
redpanda-console exited with code 1

Do I need to enable something to make it print the information you want?
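
A note on the exit status in the log above: by shell convention, which Docker also follows, a status above 128 means the process was terminated by signal number (status - 128). The sketch below decodes the values seen in this thread with Python's signal module. The function name is hypothetical, and signal numbers are platform-specific, so the mapping assumes Linux.

```python
import signal


def describe_exit_status(status):
    """Translate a shell-style exit status into a signal name when status > 128."""
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            return f"unknown signal {status - 128}"
    return f"exit code {status}"


if __name__ == "__main__":
    print(describe_exit_status(133))      # container exit code reported above
    print(describe_exit_status(128 + 6))  # systemd reported status=6/ABRT
```

On Linux, 133 decodes to SIGTRAP rather than the SIGABRT seen in the native runs. The thread does not explain the difference.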

I am using the standard docker template on my rpi4:

bentkus@raspberrypi:~/Docker/redpanda $ cat docker-compose.yml 
version: "3.7"
name: redpanda-quickstart-one-broker
networks:
  redpanda_network:
    driver: bridge
volumes:
  redpanda-0: null
services:
  redpanda-0:
    command:
      - redpanda
      - start
      - --verbose
      - --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092
      # Address the broker advertises to clients that connect to the Kafka API.
      # Use the internal addresses to connect to the Redpanda brokers'
      # from inside the same Docker network.
      # Use the external addresses to connect to the Redpanda brokers'
      # from outside the Docker network.
      - --advertise-kafka-addr internal://redpanda-0:9092,external://localhost:19092
      - --pandaproxy-addr internal://0.0.0.0:8082,external://0.0.0.0:18082
      # Address the broker advertises to clients that connect to the HTTP Proxy.
      - --advertise-pandaproxy-addr internal://redpanda-0:8082,external://localhost:18082
      - --schema-registry-addr internal://0.0.0.0:8081,external://0.0.0.0:18081
      # Redpanda brokers use the RPC API to communicate with each other internally.
      - --rpc-addr redpanda-0:33145
      - --advertise-rpc-addr redpanda-0:33145
      # Mode dev-container uses well-known configuration properties for development in containers.
      - --mode dev-container
      # Tells Seastar (the framework Redpanda uses under the hood) to use 1 core on the system.
      - --default-log-level=debug
    image: docker.redpanda.com/redpandadata/redpanda:v23.3.9-arm64
    container_name: redpanda-0
    volumes:
      - redpanda-0:/var/lib/redpanda/data
    networks:
      - redpanda_network
    ports:
      - 18081:18081
      - 18082:18082
      - 19092:19092
      - 19644:9644
  console:
    container_name: redpanda-console
    image: docker.redpanda.com/redpandadata/console:v2.4.5
    networks:
      - redpanda_network
    entrypoint: /bin/sh
    command: -c 'echo "$$CONSOLE_CONFIG_FILE" > /tmp/config.yml; /app/console'
    environment:
      CONFIG_FILEPATH: /tmp/config.yml
      CONSOLE_CONFIG_FILE: |
        kafka:
          brokers: ["redpanda-0:9092"]
          schemaRegistry:
            enabled: true
            urls: ["http://redpanda-0:8081"]
        redpanda:
          adminApi:
            enabled: true
            urls: ["http://redpanda-0:9644"]
    ports:
      - 8080:8080
    depends_on:
      - redpanda-0
dotnwat commented 8 months ago

Did you have a look at https://github.com/redpanda-data/redpanda/issues/12144? It looks like it might be related.

txdv commented 8 months ago

I saw it, but that one is an assertion failure; mine fails on the mmap call.
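
To check whether the mmap failure depends on the kernel rather than on Redpanda itself, the reservation from the strace output earlier in the thread can be attempted in isolation. This is a rough sketch that calls the C library's mmap through ctypes. The function name is hypothetical, and the MAP_NORESERVE value is hardcoded on the assumption of Linux on x86_64 or arm64. It only reserves address space (PROT_NONE) and releases it immediately.

```python
import ctypes
import mmap

# Load the symbols of the running process, which include libc on Linux.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.mmap.restype = ctypes.c_void_p
_libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                       ctypes.c_int, ctypes.c_int, ctypes.c_long]
_libc.munmap.restype = ctypes.c_int
_libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

PROT_NONE = 0
MAP_NORESERVE = 0x4000  # assumption: Linux value on x86_64 and arm64
MAP_FAILED = ctypes.c_void_p(-1).value


def try_reserve(length):
    """Try to reserve `length` bytes of address space; return True on success."""
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | MAP_NORESERVE
    addr = _libc.mmap(None, length, PROT_NONE, flags, -1, 0)
    if addr == MAP_FAILED:
        return False
    _libc.munmap(addr, length)
    return True


if __name__ == "__main__":
    length = 35184372088832  # length from the strace output in this thread
    print("reservation succeeded" if try_reserve(length) else "reservation failed")
```

If the script reports a failure on the Raspberry Pi OS kernel and a success on the Ubuntu kernel on the same board, that would point at the kernel's address space configuration rather than at the Redpanda packages.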

m-idriss commented 8 months ago

Same issue for me with an 8GB rpi 4 (https://www.raspberrypi.com/products/raspberry-pi-4-model-b/), the latest default image for the Pi 4, and the standard docker template, using image: docker.redpanda.com/redpandadata/redpanda:v23.3.10-arm64


Starting redpanda-console ... done
Attaching to redpanda-0, redpanda-console
redpanda-console | {"level":"info","ts":"2024-03-29T06:08:54.812Z","msg":"started Redpanda Console","version":"v2.4.5","built_at":"1709758506"}
redpanda-0    | + '[' '' = true ']'
redpanda-0    | + exec /usr/bin/rpk redpanda start --verbose --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092 --advertise-kafka-addr internal://redpanda-0:9092,external://localhost:19092 --pandaproxy-addr internal://0.0.0.0:8082,external://0.0.0.0:18082 --advertise-pandaproxy-addr internal://redpanda-0:8082,external://localhost:18082 --schema-registry-addr internal://0.0.0.0:8081,external://0.0.0.0:18081 --rpc-addr redpanda-0:33145 --advertise-rpc-addr redpanda-0:33145 --mode dev-container --default-log-level=info
redpanda-0    | WARNING: This is a setup for development purposes only; in this mode your clusters may run unrealistically fast and data can be corrupted any time your computer shuts down uncleanly.
redpanda-0    | 06:08:54.014  DEBUG  Looking for redpanda install directory
redpanda-0    | 06:08:54.015  DEBUG  Checking if path '/opt/redpanda/bin/rpk' exists
redpanda-0    | 06:08:54.015  DEBUG  Checking if path '/opt/redpanda/bin/redpanda' exists
redpanda-0    | 06:08:54.015  DEBUG  Redpanda is installed in '/opt/redpanda'
redpanda-console | {"level":"info","ts":"2024-03-29T06:08:54.812Z","msg":"testing admin client connectivity","urls":["http://redpanda-0:9644"]}
redpanda-0    | We'd love to hear about your experience with Redpanda:
redpanda-0    | https://redpanda.com/feedback
redpanda-0    | Starting redpanda...
redpanda-0    | Running:
redpanda-0    | /opt/redpanda/bin/redpanda redpanda --redpanda-cfg /etc/redpanda/redpanda.yaml --default-log-level=info --lock-memory=false --unsafe-bypass-fsync=true --reserve-memory=0M --overprovisioned
redpanda-0 exited with code 133
redpanda-console | {"level":"fatal","ts":"2024-03-29T06:08:58.170Z","msg":"failed to create Redpanda service","error":"failed to test admin client connectivity: Get \"http://redpanda-0:9644/v1/brokers\": dial tcp: lookup redpanda-0 on 127.0.0.11:53: no such host"}
redpanda-console exited with code 1
github-actions[bot] commented 2 months ago

This issue hasn't seen activity in 3 months. If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.