opencomputeproject / onie

Open Network Install Environment
https://opencomputeproject.github.io/onie

Segmentation fault in new lvm2 commands (since 2016.11) #480

Closed david56 closed 7 years ago

david56 commented 7 years ago

The lvm2 tools have been upgraded as of v2016.11. The Accton AOS team found that some of the commands now fail with a segmentation fault. Please see the test log from an AS7716_32X:

ONIE:/ # pvcreate /dev/sda4
  Physical volume "/dev/sda4" successfully created.
ONIE:/ # vgcreate ACCTON /dev/sda4
  Volume group "ACCTON" successfully created
ONIE:/ # lvcreate -l 20 -n sysroot ACCTON
Segmentation fault
ONIE:/ # lvcreate -l 20 -n sysroot ACCTON
  Logical Volume "sysroot" already exists in volume group "ACCTON"
ONIE:/ # pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda4
  VG Name               ACCTON
  PV Size               256.00 MiB / not usable 4.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              63
  Free PE               43
  Allocated PE          20
  PV UUID               7qLYZH-jYwT-eGxI-z3Ig-xwkT-zQdZ-uSYTeT

ONIE:/ # vgdisplay
  --- Volume group ---
  VG Name               ACCTON
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
Segmentation fault
ONIE:/ # lvdisplay
Segmentation fault
ONIE:/ #
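The reproduction steps above could be turned into an automated check along these lines. This is only a sketch: it assumes a Python interpreter is available on the test host (ONIE itself may not ship one), and the helper name `died_with_segv` is made up for illustration. It relies on the POSIX convention that a child killed by signal N is reported with return code -N.

```python
import signal
import subprocess
import sys


def died_with_segv(argv):
    """Run argv and report whether the process was killed by SIGSEGV."""
    try:
        proc = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except FileNotFoundError:
        # The tool is not installed; that is not a crash.
        return False
    # On POSIX, a negative return code -N means "killed by signal N".
    return proc.returncode == -signal.SIGSEGV


# Commands copied from the log above. They need root and a real volume
# group, so they are listed here but not executed automatically.
REPRO_COMMANDS = [
    ["lvcreate", "-l", "20", "-n", "sysroot", "ACCTON"],
    ["vgdisplay"],
    ["lvdisplay"],
]
```

On a target where the volume group from the log exists, `[died_with_segv(c) for c in REPRO_COMMANDS]` would give one boolean per command.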

We found a discussion thread online that says:

http://unix.stackexchange.com/questions/294582/segmentation-fault-when-running-lvcreate

I seem to have solved it by changing libc to glibc instead of uclibc. I also enabled DM uevents in the kernel.

I've checked that CONFIG_DM_UEVENT is enabled in the kernel config. However, switching from uclibc to glibc is a big change that would affect all platforms. I would like to know whether the issue can be solved without changing libc.

cbrune commented 7 years ago

Interesting. Just typing lvdisplay causes the segfault:

ONIE:/ # lvdisplay 
Segmentation fault

dmesg mentions a problem in libdevmapper.so.

ONIE:/ # dmesg | tail
...
lvdisplay[942]: segfault at 7fff43763fd8 ip 00007f92b421ce44 sp 00007fff43763fe0 error 6 in libdevmapper.so.1.02[7f92b41e5000+48000]
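The fields in that dmesg line can be decoded mechanically. The sketch below assumes the usual x86 page-fault error-code layout (bit 0: page was present, bit 1: write access, bit 2: user mode). The function name `decode_segfault` is hypothetical.

```python
import re

# The dmesg line quoted above, verbatim.
DMESG_LINE = (
    "lvdisplay[942]: segfault at 7fff43763fd8 ip 00007f92b421ce44 "
    "sp 00007fff43763fe0 error 6 in libdevmapper.so.1.02[7f92b41e5000+48000]"
)

PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Extract offsets and access flags from a kernel segfault message."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    addr, ip, sp, base, err = (
        int(m[k], 16) for k in ("addr", "ip", "sp", "base", "err")
    )
    return {
        "object": m["obj"],
        "ip_offset": ip - base,        # faulting instruction, relative to the mapping
        "sp_minus_addr": sp - addr,    # distance of the fault below the stack pointer
        "page_present": bool(err & 1),  # assumed x86 error-code bit 0
        "write": bool(err & 2),         # assumed x86 error-code bit 1
        "user_mode": bool(err & 4),     # assumed x86 error-code bit 2
    }
```

For the line above this gives an instruction offset of 0x37e44 into the libdevmapper mapping, and a user-mode write to an unmapped page 8 bytes below the stack pointer. That pattern would be consistent with the stack running out (for example from deep recursion or a large on-stack buffer), though this is only a hypothesis; the offset could be checked with a tool such as `addr2line` against an unstripped build of the library.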

strace lvdisplay shows this:

ONIE:/ # strace lvdisplay
...
stat("/dev/mapper/control", {st_mode=S_IFCHR|0660, st_rdev=makedev(10, 236), ...}) = 0
open("/dev/mapper/control", O_RDWR)     = 5
ioctl(5, DM_VERSION, 0x55efb7e7ffe0)    = 0
ioctl(5, DM_DEV_STATUS, 0x55efb7e7ffe0) = -1 ENXIO (No such device or address)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7fffd3315fd8} ---
+++ killed by SIGSEGV +++
Segmentation fault

Some debugging to do.

Switching to glibc is not really an option.

david56 commented 7 years ago

The previous lvm2 version (2.02.105), compiled with the latest ONIE source, does not show the problem. It looks like the problem is not related to the toolchain in ONIE.

cbrune commented 7 years ago

I agree. It is unclear why moving ahead to 2_02_155 was necessary. I have a PR open to revert the default version back to 2_02_105, which works fine.

Looking at the upstream:
https://sourceware.org/lvm2/
ftp://sources.redhat.com/pub/lvm2/

Version 2_02_155 was never officially released, so I'm not sure why we were using it.

I think the thing to do is revert back to 2_02_105.

david56 commented 7 years ago

The change looks OK to me. But it seems that armv8a.make needs to override the lvm2 version to 2_02_155, right?

cbrune commented 7 years ago

That sounds reasonable. Good idea.