Closed amoskong closed 5 years ago
The kernel can only be updated to 4.4.0-145-generic by apt-get upgrade.
I tried to upgrade the kernel manually: look up the latest available kernel version, install it, and verify the running kernel after rebooting:
# apt-cache search linux-image
# apt-get install linux-image-4.15.0-47-generic
# uname -a
Linux ubuntu1604 4.15.0-47-generic #50~16.04.1-Ubuntu SMP Fri Mar 15 16:06:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
The kernel check problem still exists after upgrading the Ubuntu 16.04 kernel to 4.15.0-47.
I found the problem:
root@ubuntu1604:/tmp/new# mkdir /var/tmp/mnt
root@ubuntu1604:/tmp/new# dd if=/dev/zero of=/var/tmp/kernel-check.img bs=1M count=128
root@ubuntu1604:/tmp/new# mkfs.xfs /var/tmp/kernel-check.img
root@ubuntu1604:/tmp/new# sudo mount /var/tmp/kernel-check.img /var/tmp/mnt -o loop
root@ubuntu1604:/tmp/new# sudo iotune --fs-check --evaluation-directory /var/tmp/mnt
Illegal instruction (core dumped)
root@ubuntu1604:/tmp/new# ls /var/crash/
_opt_scylladb_libreloc_ld.so.0.crash
root@ubuntu1604:/tmp/new# apport-unpack /var/crash/_opt_scylladb_libreloc_ld.so.0.crash /tmp/new
root@ubuntu1604:/tmp/new# ls -l /tmp/new
total 2176
-rw-r--r-- 1 root root 5 Apr 3 15:37 Architecture
-rw-r--r-- 1 root root 2154496 Apr 3 15:37 CoreDump
-rw-r--r-- 1 root root 24 Apr 3 15:37 Date
-rw-r--r-- 1 root root 12 Apr 3 15:37 DistroRelease
-rw-r--r-- 1 root root 28 Apr 3 15:37 ExecutablePath
-rw-r--r-- 1 root root 10 Apr 3 15:37 ExecutableTimestamp
-rw-r--r-- 1 root root 1 Apr 3 15:37 _LogindSession
-rw-r--r-- 1 root root 5 Apr 3 15:37 ProblemType
-rw-r--r-- 1 root root 102 Apr 3 15:37 ProcCmdline
-rw-r--r-- 1 root root 5 Apr 3 15:37 ProcCwd
-rw-r--r-- 1 root root 264 Apr 3 15:37 ProcEnviron
-rw-r--r-- 1 root root 14117 Apr 3 15:37 ProcMaps
-rw-r--r-- 1 root root 1288 Apr 3 15:37 ProcStatus
-rw-r--r-- 1 root root 1 Apr 3 15:37 Signal
-rw-r--r-- 1 root root 30 Apr 3 15:37 Uname
-rw-r--r-- 1 root root 4 Apr 3 15:37 UserGroups
I failed to get a backtrace from the coredump; I will update later.
scylla --version will also cause a coredump:
root@ubuntu1604:/tmp/new# rm /var/crash/*
root@ubuntu1604:/tmp/new# scylla --version
Illegal instruction (core dumped)
root@ubuntu1604:/tmp/new# ls /var/crash
_opt_scylladb_libreloc_ld.so.0.crash
iotune coredump:
I have installed the scylla-server-dbg package. Which debug packages should be installed to decode the backtrace?
root@ubuntu1604:/tmp/new# gdb /opt/scylladb/libexec/iotune.bin CoreDump
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /opt/scylladb/libexec/iotune.bin...BFD: /usr/lib/debug/.build-id/0f/0c145706eb5335fb5e41dc0fab393550b0a222.debug: unable to initialize decompress status for section .debug_aranges
BFD: /usr/lib/debug/.build-id/0f/0c145706eb5335fb5e41dc0fab393550b0a222.debug: unable to initialize decompress status for section .debug_aranges
warning: File "/usr/lib/debug/.build-id/0f/0c145706eb5335fb5e41dc0fab393550b0a222.debug" has no build-id, file skipped
(no debugging symbols found)...done.
warning: core file may not match specified executable file.
[New LWP 5454]
Error while mapping shared library sections:
`/usr/bin/iotune': not in executable format: File format not recognized
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Core was generated by `/usr/bin/iotune /opt/scylladb/bin/../libexec/iotune.bin --fs-check --evaluation'.
Program terminated with signal SIGILL, Illegal instruction.
#0 0x000000000050a5de in ?? ()
(gdb) bt
#0 0x000000000050a5de in ?? ()
#1 0x000000000050f12e in malloc ()
#2 0x00007fd1db8f2a8a in ?? () from /opt/scylladb/bin/../libreloc/libstdc++.so.6
#3 0x00007fd1dc0c2e0a in ?? ()
#4 0x000000000000000e in ?? ()
#5 0x0000000000000004 in ?? ()
#6 0x00007ffc07c19bb0 in ?? ()
#7 0x00007ffc07c19bd8 in ?? ()
#8 0x00007fd1dc0df180 in ?? ()
#9 0x00007fd1dc0c2f0a in ?? ()
#10 0x0000000000000000 in ?? ()
scylla --version coredump:
scylla-test@amos-ubuntu16:~/ubuntu16-crash$ gdb /opt/scylladb/libexec/scylla.bin CoreDump
GNU gdb (Ubuntu 8.2-0ubuntu1~16.04.1) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /opt/scylladb/libexec/scylla.bin...Reading symbols from /usr/lib/debug/.build-id/68/f5c63a2e409bb94fd894e5fd9845baab65a243.debug...done.
done.
warning: core file may not match specified executable file.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Core was generated by `/usr/bin/scylla /opt/scylladb/bin/../libexec/scylla.bin --version'.
Program terminated with signal SIGILL, Illegal instruction.
#0 0x0000000004080013 in seastar::memory::allocate (size=37) at ../../src/core/memory.cc:1254
1254 ../../src/core/memory.cc: No such file or directory.
(gdb) bt
#0 0x0000000004080013 in seastar::memory::allocate (size=37) at ../../src/core/memory.cc:1254
#1 malloc (n=37) at ../../src/core/memory.cc:1504
#2 0x00007f509b2e2160 in set_binding_values.part () from /opt/scylladb/bin/../libreloc/libc.so.6
#3 0x00007f509b2e2405 in bindtextdomain () from /opt/scylladb/bin/../libreloc/libc.so.6
#4 0x00007f509acdc22b in ?? () from /opt/scylladb/bin/../libreloc/libgpg-error.so.0
#5 0x00007f509c9b8e0a in ?? ()
#6 0x000000000000002e in ?? ()
#7 0x0000000000000002 in ?? ()
#8 0x00007ffce4be2bb0 in ?? ()
#9 0x00007ffce4be2bc8 in ?? ()
#10 0x00007f509c9d5180 in ?? ()
#11 0x00007f509c9b8f0a in ?? ()
#12 0x0000000000000000 in ?? ()
(gdb)
/CC @glommer @avikivity
ubuntu16-crash-scylla.tar.gz ubuntu16-crash-iotune.tar.gz
$ ls ubuntu16-crash-iotune
Architecture DistroRelease ProblemType ProcEnviron Signal _LogindSession
CoreDump ExecutablePath ProcCmdline ProcMaps Uname
Date ExecutableTimestamp ProcCwd ProcStatus UserGroups
$ ls ubuntu16-crash-scylla
Architecture DistroRelease ProblemType ProcEnviron Signal _LogindSession
CoreDump ExecutablePath ProcCmdline ProcMaps Uname
Date ExecutableTimestamp ProcCwd ProcStatus UserGroups
@amoskong does this issue happen on 3.0 as well?
This issue doesn't exist in the latest 3.0.
These kernels should be good enough for fsqual, so something else happened.
I was unable to reproduce (building seastar's iotune in dbuild and running it on an xfs filesystem). Can you provide access to a scylla.deb that fails?
The VM has already been reclaimed for new testing. I will reproduce with a GCE instance and provide you the IP soon.
It's enough for me to get the .deb you used.
I failed to reproduce this issue (coredump) with a new GCE instance, but I can still reproduce both the scylla --version coredump and the iotune coredump 100% of the time on the Ubuntu 16.04 VM used by artifact-test.
I will send you the VM details (IP/password) over Slack.
Not able to reproduce on my local Ubuntu 18.04 bare-metal environment or in an Ubuntu 16.04 Docker container. It likely occurs only on the VM that artifact-test uses, not in all environments.
root@3af6f67f5074:/etc/apt/sources.list.d# scylla_kernel_check
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
libreadline5
Suggested packages:
xfsdump acl attr quota
The following NEW packages will be installed:
libreadline5 xfsprogs
0 upgraded, 2 newly installed, 0 to remove and 6 not upgraded.
Need to get 696 kB of archives.
After this operation, 3744 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu xenial/main amd64 libreadline5 amd64 5.2+dfsg-3build1 [99.5 kB]
Get:2 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 xfsprogs amd64 4.3.0+nmu1ubuntu1.1 [597 kB]
Fetched 696 kB in 1s (364 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libreadline5:amd64.
(Reading database ... 13356 files and directories currently installed.)
Preparing to unpack .../libreadline5_5.2+dfsg-3build1_amd64.deb ...
Unpacking libreadline5:amd64 (5.2+dfsg-3build1) ...
Selecting previously unselected package xfsprogs.
Preparing to unpack .../xfsprogs_4.3.0+nmu1ubuntu1.1_amd64.deb ...
Unpacking xfsprogs (4.3.0+nmu1ubuntu1.1) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
Setting up libreadline5:amd64 (5.2+dfsg-3build1) ...
Setting up xfsprogs (4.3.0+nmu1ubuntu1.1) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
WARN 2019-04-05 08:27:42,856 [shard 0] iotune - Available space on filesystem at /var/tmp/mnt: 124 MB: is less than recommended: 10 GB
INFO 2019-04-05 08:27:42,856 [shard 0] iotune - /var/tmp/mnt passed sanity checks
This is a supported kernel version.
root@3af6f67f5074:/etc/apt/sources.list.d#
Not able to reproduce on an Ubuntu 16.04 VM on VirtualBox (via Vagrant) either.
$ sudo scylla_kernel_check
WARN 2019-04-05 08:44:32,603 [shard 0] iotune - Available space on filesystem at /var/tmp/mnt: 124 MB: is less than recommended: 10 GB
INFO 2019-04-05 08:44:32,604 [shard 0] iotune - /var/tmp/mnt passed sanity checks
This is a supported kernel version.
@amoskong is there a way to launch the artifact-test VM locally to reproduce the problem? I can see the error in the Jenkins log, but I can't find a way to reproduce it.
Avi already reproduced the problem on the artifact-test VM (Ubuntu 16.04); I don't know if he found the root cause.
I can prepare a single Ubuntu 16.04 VM for you on Monday.
The problem is that scylla gets compiled with newer instructions (I saw a BMI2 instruction, but others are also present), so running on an older machine fails. The idea was to build with -march=westmere, but this was probably lost in the Great CMake Translation.
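One quick way to tell whether a given machine is affected is to check whether its CPU advertises the instruction set extensions the binary uses. The following is a minimal sketch, not part of Scylla: the function names are hypothetical, and it assumes an x86 Linux host where /proc/cpuinfo has a "flags" line and BMI2 shows up as the flag "bmi2".

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the "flags" key exactly so lines such as "vmx flags" are ignored.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=("bmi2",)):
    """List the required features that the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [feature for feature in required if feature not in flags]
```

On the failing VM, calling missing_features(open("/proc/cpuinfo").read()) would be expected to return a non-empty list if the theory above holds. The list of required features is an assumption; only BMI2 is named in this thread.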
@hakuch any ideas? ./configure.py --enable-dpdk was supposed to limit dpdk to westmere, but likely this got lost, and dpdk added flags to build for the host machine. Once we ran a test on an older machine, the problem showed up.
Here's a stanza from a seastar build.ninja:
build CMakeFiles/seastar.dir/src/core/prometheus.cc.o: CXX_COMPILER__seastar ../../src/core/prometheus.cc || cmake_object_order_depends_target_seastar
DEFINES = -DFMT_SHARED -DSEASTAR_HAS_MEMBARRIER -DSEASTAR_HAVE_ASAN_FIBER_SUPPORT -DSEASTAR_HAVE_DPDK -DSEASTAR_HAVE_GCC6_CONCEPTS -DSEASTAR_HAVE_HWLOC -DSEASTAR_HAVE_LZ4_COMPRESS_DEFAULT -DSEASTAR_HAVE_NUMA -DSEASTAR_TYPE_ERASE_MORE -DSEASTAR_USE_STD_OPTIONAL_VARIANT_STRINGVIEW
DEP_FILE = CMakeFiles/seastar.dir/src/core/prometheus.cc.o.d
FLAGS = -O1 -std=gnu++17 -U_FORTIFY_SOURCE -fvisibility=hidden -UNDEBUG -Wall -Werror -Wno-error=deprecated-declarations -gz -Wno-error -march=westmere -fconcepts -march=native
INCLUDES = -I../../include -Igen/include -I../../src -Igen/src -isystem _cooking/installed/include/dpdk
OBJECT_DIR = CMakeFiles/seastar.dir
OBJECT_FILE_DIR = CMakeFiles/seastar.dir/src/core
-march=native comes after -march=westmere on the command line and overrides it, so the executables fail on any machine with a CPU older than the build machine's.
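To spot this kind of override mechanically, one can extract the -march value that actually takes effect from a FLAGS line. This is a small sketch under the assumption that the compiler honours the last -march option given, which is the behaviour described above; the helper name effective_march is hypothetical and the FLAGS string is copied from the stanza.

```python
import shlex


def effective_march(flags):
    """Return the value of the last -march= option, or None if there is none."""
    march = None
    for token in shlex.split(flags):
        if token.startswith("-march="):
            # Later options replace earlier ones, so keep overwriting.
            march = token[len("-march="):]
    return march


# FLAGS line from the build.ninja stanza above.
FLAGS = (
    "-O1 -std=gnu++17 -U_FORTIFY_SOURCE -fvisibility=hidden -UNDEBUG -Wall "
    "-Werror -Wno-error=deprecated-declarations -gz -Wno-error "
    "-march=westmere -fconcepts -march=native"
)
```

Running effective_march over every FLAGS line in build.ninja and flagging any result other than "westmere" would catch a regression like this one before the packages reach an older test machine.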
@avikivity, I see the issue. Finddpdk.cmake transitively applies -march=native. I'll change it to -march=westmere.
I opened https://github.com/scylladb/seastar/issues/630 and sent a patch to the Seastar mailing list.
Fixed by 704600f829ab34b3a290acc3c66798d0abc1fabe.
Installation details
Scylla version (or git commit hash): 666.development-0.20190403.0dc0a6025-1
Cluster size: 1
OS (RHEL/CentOS/Ubuntu/AWS AMI): Debian 9, Ubuntu 16.04, Ubuntu 18.04
ERROR:
Test job:
Kernel version:
4.9.0-3-amd64
Linux ubuntu1604 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux ubuntu18 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:16:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
@syuu1228 @roydahan