scylladb / scylla-jmx


scylla-jmx coredump on db node during stress memory test #142

Closed aleksbykov closed 3 years ago

aleksbykov commented 3 years ago

Scylla version: 4.2.rc4-0.20200923.2893f6e43 with build-id 35882560db8cf11bd2758115c78cb9be2bd1e49b
timestamp: 2020-09-23T16:13:41Z
git@github.com:scylladb/scylla.git-sha: 2893f6e43b2862e4a3e761bad6f5d8d1b5b0c45f
git@github.com:scylladb/scylla-jmx.git-sha: ade4edaa20349c84cacd70e8caec1764986ed96d
git@github.com:scylladb/scylla-tools-java.git-sha: cd0987292d82d78c4702e8300ff0b7db07e67c7f

During the job https://jenkins.scylladb.com/job/scylla-4.2/job/longevity/job/longevity-100gb-4h-test/65/ the memory stress nemesis started on node 9. This nemesis tries to allocate memory with two commands:

Try to allocate 120% of available memory; the allocated memory will be swapped out
stress-ng --vm-bytes $(awk '/MemAvailable/{printf "%d\n", $2 * 1.2;}' < /proc/meminfo)k --vm-keep -m 1 -t 100
Try to allocate 90% of total memory; the allocated memory will be swapped out
stress-ng --vm-bytes $(awk '/MemTotal/{printf "%d\n", $2 * 0.9;}' < /proc/meminfo)k --vm-keep -m 1 -t 100
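
As a rough illustration of what the awk expressions above compute, here is a minimal Python sketch. The function name `vm_bytes_arg` and the sample meminfo values are hypothetical and not part of the nemesis code; the sketch assumes the second column of /proc/meminfo is a size in kB and mirrors awk's `%d` truncation.

```python
def vm_bytes_arg(meminfo_text: str, field: str, factor: float) -> str:
    """Build the value passed to `stress-ng --vm-bytes` (hypothetical helper).

    Finds the first line containing `field`, multiplies its second column
    by `factor`, truncates to an integer like awk's "%d", and appends the
    "k" suffix used in the commands above.
    """
    for line in meminfo_text.splitlines():
        if field in line:
            size_kb = int(line.split()[1])
            return f"{int(size_kb * factor)}k"
    raise KeyError(f"{field} not found in meminfo text")


# Made-up sample in the /proc/meminfo layout (not values from this run).
SAMPLE_MEMINFO = """\
MemTotal:        2000005 kB
MemFree:          300000 kB
MemAvailable:    1000001 kB
"""

if __name__ == "__main__":
    print(vm_bytes_arg(SAMPLE_MEMINFO, "MemAvailable", 1.2))
    print(vm_bytes_arg(SAMPLE_MEMINFO, "MemTotal", 0.9))
```

With the first command the request exceeds what is currently available, and with the second it approaches the total, which is why both comments expect the allocation to be swapped out.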

1st command executed without errors:

Running command "stress-ng --vm-bytes $(awk '/MemAvailable/{printf "%d\n", $2 * 1.2;}' < /proc/meminfo)k --vm-keep -m 1 -t 100"...
< t:2020-09-27 09:52:31,956 f:base.py         l:187  c:RemoteCmdRunner      p:DEBUG > stress-ng: info:  [50145] dispatching hogs: 1 vm
< t:2020-09-27 09:54:16,033 f:base.py         l:187  c:RemoteCmdRunner      p:DEBUG > stress-ng: info:  [50145] successful run completed in 104.02s (1 min, 44.02 secs)
< t:2020-09-27 09:54:16,042 f:base.py         l:105  c:RemoteCmdRunner      p:DEBUG > STDERR: stress-ng: info:  [50145] dispatching hogs: 1 vm
< t:2020-09-27 09:54:16,042 f:base.py         l:105  c:RemoteCmdRunner      p:DEBUG > stress-ng: info:  [50145] successful run completed in 104.02s (1 min, 44.02 secs)
< t:2020-09-27 09:54:16,043 f:base.py         l:107  c:RemoteCmdRunner      p:DEBUG > Command "stress-ng --vm-bytes $(awk '/MemAvailable/{printf "%d\n", $2 * 1.2;}' < /proc/meminfo)k --vm-keep -m 1 -t 100" finished with status 0

2nd command execution triggered the scylla-jmx coredump:

< t:2020-09-27 09:54:16,043 f:remote_base.py  l:519  c:RemoteCmdRunner      p:DEBUG > Running command "stress-ng --vm-bytes $(awk '/MemTotal/{printf "%d\n", $2 * 0.9;}' < /proc/meminfo)k --vm-keep -m 1 -t 100"...
< t:2020-09-27 09:54:16,077 f:base.py         l:187  c:RemoteCmdRunner      p:DEBUG > stress-ng: info:  [50717] dispatching hogs: 1 vm
0.00665,      0,      0,       0,       0,       0,       0
< t:2020-09-27 09:56:05,278 f:base.py         l:187  c:RemoteCmdRunner      p:DEBUG > stress-ng: info:  [50717] successful run completed in 109.18s (1 min, 49.18 secs)
< t:2020-09-27 09:56:05,278 f:base.py         l:105  c:RemoteCmdRunner      p:DEBUG > STDERR: stress-ng: info:  [50717] dispatching hogs: 1 vm
< t:2020-09-27 09:56:05,278 f:base.py         l:105  c:RemoteCmdRunner      p:DEBUG > stress-ng: info:  [50717] successful run completed in 109.18s (1 min, 49.18 secs)
< t:2020-09-27 09:56:05,278 f:base.py         l:107  c:RemoteCmdRunner      p:DEBUG > Command "stress-ng --vm-bytes $(awk '/MemTotal/{printf "%d\n", $2 * 0.9;}' < /proc/meminfo)k --vm-keep -m 1 -t 100" finished with status 0

JMX coredump:

2020-09-27 09:54:10.000: (CoreDumpEvent Severity.ERROR): node=Node longevity-100gb-4h-4-2-db-node-74fd669a-9 [13.48.67.179 | 10.0.0.94] (seed: False)
corefile_url=
https://storage.cloud.google.com/upload.scylladb.com/core.scylla-jmx.996.47d5309670374dda890c17744429da08.29100.1601200450000/core.scylla-jmx.996.47d5309670374dda890c17744429da08.29100.1601200450000000
backtrace=           PID: 29100 (scylla-jmx)
           UID: 996 (scylla)
           GID: 1001 (scylla)
        Signal: 6 (ABRT)
     Timestamp: Sun 2020-09-27 09:54:10 UTC (15s ago)
  Command Line: /opt/scylladb/jmx/symlinks/scylla-jmx -Xmx256m -XX:+UseSerialGC -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.host=localhost -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=7199 -Djava.rmi.server.hostname=localhost -Dcom.sun.management.jmxremote.rmi.port=7199 -Djavax.management.builder.initial=com.scylladb.jmx.utils.APIBuilder -jar /opt/scylladb/jmx/scylla-jmx-1.0.jar
    Executable: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.262.b10-0.el7_8.x86_64/jre/bin/java
 Control Group: /
       Boot ID: 47d5309670374dda890c17744429da08
    Machine ID: df877a200226bc47d06f26dae0736ec9
      Hostname: longevity-100gb-4h-4-2-db-node-74fd669a-9
      Coredump: /var/lib/systemd/coredump/core.scylla-jmx.996.47d5309670374dda890c17744429da08.29100.1601200450000000
       Message: Process 29100 (scylla-jmx) of user 996 dumped core.

                Stack trace of thread 29669:
                #0  0x00007f150a0e4387 raise (libc.so.6)
                #1  0x00007f150a0e5a78 abort (libc.so.6)
                #2  0x00007f150994f439 _ZN2os5abortEb (libjvm.so)
                #3  0x00007f1509b6b64a _ZN7VMError14report_and_dieEv (libjvm.so)
                #4  0x00007f1509b6bcb7 _ZL13crash_handleriP9siginfo_tPv (libjvm.so)
                #5  0x00007f150aab4630 __restore_rt (libpthread.so.0)
                #6  0x00007f1509a89e42 _ZNK15ContiguousSpace10block_sizeEPK8HeapWord (libjvm.so)
                #7  0x00007f1509364a93 _ZNK27BlockOffsetArrayContigSpace18block_start_unsafeEPKv (libjvm.so)
                #8  0x00007f150961172c _ZN27GenerationBlockStartClosure8do_spaceEP5Space (libjvm.so)
                #9  0x00007f150960f9fd _ZNK10Generation11block_startEPKv (libjvm.so)
                #10 0x00007f1509946c10 _ZN2os14print_locationEP12outputStreamlb (libjvm.so)
                #11 0x00007f150995a038 _ZN2os19print_register_infoEP12outputStreamPv (libjvm.so)
                #12 0x00007f1509b6a3a8 _ZN7VMError6reportEP12outputStream (libjvm.so)
                #13 0x00007f1509b6b18d _ZN7VMError14report_and_dieEv (libjvm.so)
                #14 0x00007f15099595e5 JVM_handle_linux_signal (libjvm.so)
                #15 0x00007f150994c5f8 _Z13signalHandleriP9siginfo_tPv (libjvm.so)
                #16 0x00007f150aab4630 __restore_rt (libpthread.so.0)
                #17 0x00007f15097f5153 _ZNK5Klass13external_nameEv (libjvm.so)
                #18 0x00007f15096b6556 _ZN27java_lang_StackTraceElement6createE6HandleiiiiP6Thread (libjvm.so)
                #19 0x00007f15096b6acc _ZN19java_lang_Throwable23get_stack_trace_elementEP7oopDesciP6Thread (libjvm.so)
                #20 0x00007f1509745fd5 JVM_GetStackTraceElement (libjvm.so)
                #21 0x00007f14f5084028 n/a (n/a)
                #22 0x00007f14f4bebd80 n/a (n/a)
                #23 0x00007f14f4bebd80 n/a (n/a)
                #24 0x00007f14f4bebffd n/a (n/a)
                #25 0x00007f14f4bebffd n/a (n/a)
                #26 0x00007f14f4bec042 n/a (n/a)
                #27 0x00007f14f4be44e7 n/a (n/a)
                #28 0x00007f15096b003e _ZN9JavaCalls11call_helperEP9JavaValueP12methodHandleP17JavaCallArgumentsP6Thread (libjvm.so)
                #29 0x00007f15096ad404 _ZN9JavaCalls12call_virtualEP9JavaValue11KlassHandleP6SymbolS4_P17JavaCallArgumentsP6Thread (libjvm.so)
                #30 0x00007f15096adaa5 _ZN9JavaCalls12call_virtualEP9JavaValue6Handle11KlassHandleP6SymbolS5_S2_P6Thread (libjvm.so)
                #31 0x00007f1509b11bd5 _ZN10JavaThread4exitEbNS_8ExitTypeE (libjvm.so)
                #32 0x00007f1509b127e1 _ZN10JavaThread17thread_main_innerEv (libjvm.so)
                #33 0x00007f150994e382 _ZL10java_startP6Thread (libjvm.so)
                #34 0x00007f150aaacea5 start_thread (libpthread.so.0)
                #35 0x00007f150a1ac8dd __clone (libc.so.6)

                Stack trace of thread 29119:
                #0  0x00007f150aab0de2 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007f1509956dd2 _ZN2os13PlatformEvent4parkEl (libjvm.so)
                #2  0x00007f150990365f _ZN7Monitor5IWaitEP6Threadl (libjvm.so)
                #3  0x00007f15099041ee _ZN7Monitor4waitEblb (libjvm.so)
                #4  0x00007f15094c93d2 _ZN12CompileQueue3getEv (libjvm.so)
                #5  0x00007f15094ce61e _ZN13CompileBroker20compiler_thread_loopEv (libjvm.so)
                #6  0x00007f1509b12902 _ZN10JavaThread17thread_main_innerEv (libjvm.so)
                #7  0x00007f150994e382 _ZL10java_startP6Thread (libjvm.so)
                #8  0x00007f150aaacea5 start_thread (libpthread.so.0)
                #9  0x00007f150a1ac8dd __clone (libc.so.6)

                Stack trace of thread 29114:
                #0  0x00007f150aab0de2 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007f1509956dd2 _ZN2os13PlatformEvent4parkEl (libjvm.so)
                #2  0x00007f150990365f _ZN7Monitor5IWaitEP6Threadl (libjvm.so)
                #3  0x00007f150990400f _ZN7Monitor4waitEblb (libjvm.so)
                #4  0x00007f1509b708d0 _ZN8VMThread4loopEv (libjvm.so)
                #5  0x00007f1509b70bc2 _ZN8VMThread3runEv (libjvm.so)
                #6  0x00007f150994e382 _ZL10java_startP6Thread (libjvm.so)
                #7  0x00007f150aaacea5 start_thread (libpthread.so.0)
                #8  0x00007f150a1ac8dd __clone (libc.so.6)

                Stack trace of thread 29118:
                #0  0x00007f150aab0de2 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007f1509956dd2 _ZN2os13PlatformEvent4parkEl (libjvm.so)
                #2  0x00007f150990365f _ZN7Monitor5IWaitEP6Threadl (libjvm.so)
                #3  0x00007f15099041ee _ZN7Monitor4waitEblb (libjvm.so)
                #4  0x00007f15094c93d2 _ZN12CompileQueue3getEv (libjvm.so)
                #5  0x00007f15094ce61e _ZN13CompileBroker20compiler_thread_loopEv (libjvm.so)
                #6  0x00007f1509b12902 _ZN10JavaThrea

download_instructions=
gsutil cp gs://upload.scylladb.com/core.scylla-jmx.996.47d5309670374dda890c17744429da08.29100.1601200450000/core.scylla-jmx.996.47d5309670374dda890c17744429da08.29100.1601200450000000 .
gunzip /var/lib/systemd/coredump/core.scylla-jmx.996.47d5309670374dda890c17744429da08.29100.1601200450000000
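
The libjvm.so frames in the backtrace above are C++ mangled names. Tools such as `c++filt` decode them fully; as a minimal sketch for reading them at a glance, the hypothetical helper below decodes only the qualified name of simple Itanium-ABI symbols (plain or nested names, no templates or substitutions) and ignores argument types. It is an assumption-laden illustration, not a complete demangler.

```python
import re


def demangle_simple(symbol: str) -> str:
    """Return the qualified name of a simple Itanium-ABI mangled symbol.

    Only handles `_Z[L]<len><name>` and `_Z[L]N[cv]<len><name>...E` forms.
    Anything else (C symbols, templates, operators) is returned unchanged.
    """
    if not symbol.startswith("_Z"):
        return symbol
    pos = 2
    if symbol[pos:pos + 1] == "L":  # internal-linkage marker used by GCC
        pos += 1
    nested = symbol[pos:pos + 1] == "N"
    if nested:
        pos += 1
        while symbol[pos:pos + 1] in ("K", "V", "r"):  # cv-qualifiers
            pos += 1
    parts = []
    while True:
        match = re.match(r"\d+", symbol[pos:])
        if not match:
            break  # reached 'E' or the argument-type encoding
        length = int(match.group())
        pos += len(match.group())
        parts.append(symbol[pos:pos + length])
        pos += length
        if not nested:
            break  # a non-nested name has exactly one component
    return "::".join(parts) if parts else symbol


if __name__ == "__main__":
    for sym in ("_ZNK5Klass13external_nameEv",
                "_ZNK15ContiguousSpace10block_sizeEPK8HeapWord",
                "_ZL13crash_handleriP9siginfo_tPv",
                "raise"):
        print(sym, "->", demangle_simple(sym))
```

Applied to frame #17 of thread 29669 this gives `Klass::external_name`, the same function the JVM names as the problematic frame in its own error banner.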

Messages from log:

2020-09-27T09:09:28+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: Starting the JMX server
2020-09-27T09:09:28+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: JMX is enabled to receive remote connections on port: 7199
2020-09-27T09:54:09+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: Exception in thread "RMI Scheduler(0)" java.lang.NullPointerException
2020-09-27T09:54:09+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: #
2020-09-27T09:54:09+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: # A fatal error has been detected by the Java Runtime Environment:
2020-09-27T09:54:09+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: #
2020-09-27T09:54:09+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: #  SIGSEGV (0xb) at pc=0x00007f15097f5153, pid=29100, tid=0x00007f14f2564700
2020-09-27T09:54:09+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: #
2020-09-27T09:54:09+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: # JRE version: OpenJDK Runtime Environment (8.0_262-b10) (build 1.8.0_262-b10)
2020-09-27T09:54:09+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: # Java VM: OpenJDK 64-Bit Server VM (25.262-b10 mixed mode linux-amd64 compressed oops)
2020-09-27T09:54:09+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: # Problematic frame:
2020-09-27T09:54:09+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: # V  [libjvm.so+0x7e1153]  Klass::external_name() const+0x23
2020-09-27T09:54:09+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: #
2020-09-27T09:54:09+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
2020-09-27T09:54:09+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: #
2020-09-27T09:54:09+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: # An error report file with more information is saved as:
2020-09-27T09:54:09+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: # /tmp/hs_err_pid29100.log
2020-09-27T09:54:10+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: #
2020-09-27T09:54:10+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: # If you would like to submit a bug report, please visit:
2020-09-27T09:54:10+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: #   http://bugreport.java.com/bugreport/crash.jsp
2020-09-27T09:54:10+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: #
2020-09-27T09:54:15+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !NOTICE  | systemd: scylla-jmx.service: main process exited, code=killed, status=6/ABRT
2020-09-27T09:54:15+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !NOTICE  | systemd: Unit scylla-jmx.service entered failed state.
2020-09-27T09:54:15+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !WARNING | systemd: scylla-jmx.service failed.
2020-09-27T09:54:15+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !CRIT    | systemd-coredump: Process 29100 (scylla-jmx) of user 996 dumped core.#012#012Stack trace of thread 29669:#012#0  0x00007f150a0e4387 raise (libc.so.6)#012#1  0x00007f150a0e5a78 abort (libc.so.6)#012#2  0x00007f150994f439 _ZN2os5abortEb (libjvm.so)#012#3  0x00007f1509b6b64a _ZN7VMError14report_and_dieEv (libjvm.so)#012#4  0x00007f1509b6bcb7 _ZL13crash_handleriP9siginfo_tPv (libjvm.so)#012#5  0x00007f150aab4630 __restore_rt (libpthread.so.0)#012#6  0x00007f1509a89e42 _ZNK15ContiguousSpace10block_sizeEPK8HeapWord (libjvm.so)#012#7  0x00007f1509364a93 _ZNK27BlockOffsetArrayContigSpace18block_start_unsafeEPKv (libjvm.so)#012#8  0x00007f150961172c _ZN27GenerationBlockStartClosure8do_spaceEP5Space (libjvm.so)#012#9  0x00007f150960f9fd _ZNK10Generation11block_startEPKv (libjvm.so)#012#10 0x00007f1509946c10 _ZN2os14print_locationEP12outputStreamlb (libjvm.so)#012#11 0x00007f150995a038 _ZN2os19print_register_infoEP12outputStreamPv (libjvm.so)#012#12 0x00007f1509b6a3a8 _ZN7VMError6reportEP12outputStream (libjvm.so)#012#13 0x00007f1509b6b18d _ZN7VMError14report_and_dieEv (libjvm.so)#012#14 0x00007f15099595e5 JVM_handle_linux_signal (libjvm.so)#012#15 0x00007f150994c5f8 _Z13signalHandleriP9siginfo_tPv (libjvm.so)#012#16 0x00007f150aab4630 __restore_rt (libpthread.so.0)#012#17 0x00007f15097f5153 _ZNK5Klass13external_nameEv (libjvm.so)#012#18 0x00007f15096b6556 _ZN27java_lang_StackTraceElement6createE6HandleiiiiP6Thread (libjvm.so)#012#19 0x00007f15096b6acc _ZN19java_lang_Throwable23get_stack_trace_elementEP7oopDesciP6Thread (libjvm.so)#012#20 0x00007f1509745fd5 JVM_GetStackTraceElement (libjvm.so)#012#21 0x00007f14f5084028 n/a (n/a)#012#22 0x00007f14f4bebd80 n/a (n/a)#012#23 0x00007f14f4bebd80 n/a (n/a)#012#24 0x00007f14f4bebffd n/a (n/a)#012#25 0x00007f14f4bebffd n/a (n/a)#012#26 0x00007f14f4bec042 n/a (n/a)#012#27 0x00007f14f4be44e7 n/a (n/a)#012#28 0x00007f15096b003e 
_ZN9JavaCalls11call_helperEP9JavaValueP12methodHandleP17JavaCallArgumentsP6Thread (libjvm.so)#012#29 0x00007f15096ad404 _ZN9JavaCalls12call_virtualEP9JavaValue11KlassHandleP6SymbolS4_P17JavaCallArgumentsP6Thread (libjvm.so)#012#30 0x00007f15096adaa5 _ZN9JavaCalls12call_virtualEP9JavaValue6Handle11KlassHandleP6SymbolS5_S2_P6Thread (libjvm.so)#012#31 0x00007f1509b11bd5 _ZN10JavaThread4exitEbNS_8ExitTypeE (libjvm.so)#012#32 0x00007f1509b127e1 _ZN10JavaThread17thread_main_innerEv (libjvm.so)#012#33 0x00007f150994e382 _ZL10java_startP6Thread (libjvm.so)#012#34 0x00007f150aaacea5 start_thread (libpthread.so.0)#012#35 0x00007f150a1ac8dd __clone (libc.so.6)#012#012Stack trace of thread 29119:#012#0  0x00007f150aab0de2 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)#012#1  0x00007f1509956dd2 _ZN2os13PlatformEvent4parkEl (libjvm.so)#012#2  0x00007f150990365f _ZN7Monitor5IWaitEP6Threadl (libjvm.so)#012#3  0x00007f15099041ee _ZN7Monitor4waitEblb (libjvm.so)#012#4  0x00007f15094c93d2 _ZN12CompileQueue3getEv (libjvm.so)#012#5  0x00007f15094ce61e _ZN13CompileBroker20compiler_thread_loopEv (libjvm.so)#012#6  0x00007f1509b12902 _ZN10JavaThread17thread_main_innerEv (libjvm.so)#012#7  0x00007f150994e382 _ZL10java_startP6Thread (libjvm.so)#012#8  0x00007f150aaacea5 start_thread (libpthread.so.0)#012#9  0x00007f150a1ac8dd __clone (libc.so.6)#012#012Stack trace of thread 29114:#012#0  0x00007f150aab0de2 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)#012#1  0x00007f1509956dd2 _ZN2os13PlatformEvent4parkEl (libjvm.so)#012#2  0x00007f150990365f _ZN7Monitor5IWaitEP6Threadl (libjvm.so)#012#3  0x00007f150990400f _ZN7Monitor4waitEblb (libjvm.so)#012#4  0x00007f1509b708d0 _ZN8VMThread4loopEv (libjvm.so)#012#5  0x00007f1509b70bc2 _ZN8VMThread3runEv (libjvm.so)#012#6  0x00007f150994e382 _ZL10java_startP6Thread (libjvm.so)#012#7  0x00007f150aaacea5 start_thread (libpthread.so.0)#012#8  0x00007f150a1ac8dd __clone (libc.so.6)#012#012Stack trace of thread 29118:#012#0  0x00007f150aab0de2 
pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)#012#1  0x00007f1509956dd2 _ZN2os13PlatformEvent4parkEl (libjvm.so)#012#2  0x00007f150990365f _ZN7Monitor5IWaitEP6Threadl (libjvm.so)#012#3  0x00007f15099041ee _ZN7Monitor4waitEblb (libjvm.so)#012#4  0x00007f15094c93d2 _ZN12CompileQueue3getEv (libjvm.so)#012#5  0x00007f15094ce61e _ZN13CompileBroker20compiler_thread_loopEv (libjvm.so)#012#6  0x00007f1509b12902 _ZN10JavaThread17thread_main_innerEv (libjvm.so)#012#7  0x00007f150994e382 _ZL10java_startP6Thread (libjvm.so)#012#8  0x00007f150aaacea5 start_thread (libpthread.so.0)#012#9  0x00007f150a1ac8dd __clone (libc.so.6)#012#012Stack trace of thread 29132:#012#0  0x00007f150aab0de2 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)#012#1  0x00007f1509956dd2 _ZN2os13PlatformEvent4parkEl (libjvm.so)#012#2  0x00007f150995724c _ZN2os5sleepEP6Threadlb (libjvm.so)#012#3  0x00007f1509b0cf9f _ZN13WatcherThread3runEv (libjvm.so)#012#4  0x00007f150994e382 _ZL10java_startP6Thread (libjvm.so)#012#5  0x00007f150aaacea5 start_thread (libpthread.so.0)#012#6  0x00007f150a1ac8dd __clone (libc.so.6)#012#012Stack trace of thread 29117:#012#0  0x00007f150aab2b3b do_futex_wait.constprop.1 (libpthread.so.0)#012#1  0x00007f150aab2bcf __new_sem_wait_slow.constprop.0 (libpthread.so.0)#012#2  0x00007f150aab2c6b sem_wait@@GLIBC_2.2.5 (libpthread.so.0)#012#3  0x00007f150994c811 _ZL21check_pending_signalsb (libjvm.so)#012#4  0x00007f1509944b71 _ZL19signal_thread_entryP10JavaThreadP6Thread (libjvm.so)#012#5  0x00007f1509b12902 _ZN10JavaThread17thread_main_innerEv (libjvm.so)#012#6  0x00007f150994e382 _ZL10java_startP6Thread (libjvm.so)#012#7  0x00007f150aaacea5 start_thread (libpthread.so.0)#012#8  0x00007f150a1ac8dd __clone (libc.so.6)#012#012Stack trace of thread 29120:#012#0  0x00007f150aab0a35 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)#012#1  0x00007f1509956b2b _ZN2os13PlatformEvent4parkEv (libjvm.so)#012#2  0x00007f15099034c7 _ZN7Monitor5IWaitEP6Threadl (libjvm.so)#012#3  
0x00007f150990400f _ZN7Monitor4waitEblb (libjvm.so)#012#4  0x00007f15099fbb48 _ZN13ServiceThread20service_thread_entryEP10JavaThreadP6Thread (libjvm.so)#012#5  0x00007f1509b12902 _ZN10JavaThread17thread_main_innerEv (libjvm.so)#012#6  0x00007f150994e382 _ZL10java_startP6Thread (libjvm.so)#012#7  0x00007f150aaacea5 start_thread (libpthread.so.0)#012#8  0x00007f150a1ac8dd __clone (libc.so.6)#012#012Stack trace of thread 29109:#012#0  0x00007f150aab0de2 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)#012#1  0x00007f1509956dd2 _ZN2os13PlatformEvent4parkEl (libjvm.so)#012#2  0x00007f15099570f2 _ZN2os5sleepEP6Threadlb (libjvm.so)#012#3  0x00007f150975833a JVM_Sleep (libjvm.so)#012#4  0x00007f14f4bfc427 n/a (n/a)#012#5  0x00007f14f4bebffd n/a (n/a)#012#6  0x00007f14f4be44e7 n/a (n/a)#012#7  0x00007f15096b003e _ZN9JavaCalls11call_helperEP9JavaValueP12methodHandleP17JavaCallArgumentsP6Thread (libjvm.so)#012#8  0x00007f15097120ea _ZL17jni_invoke_staticP7JNIEnv_P9JavaValueP8_jobject11JNICallTypeP10_jmethodIDP18JNI_ArgumentPusherP6Thread.isra.197 (libjvm.so)#012#9  0x00007f1509726116 jni_CallStaticVoidMethod (libjvm.so)#012#10 0x00007f150a683a39 JavaMain (libjli.so)#012#11 0x00007f150aaacea5 start_thread (libpthread.so.0)#012#12 0x00007f150a1ac8dd __clone (libc.so.6)#012#012Stack trace of thread 29100:#012#0  0x00007f150aaae017 pthread_join (libpthread.so.0)#012#1  0x00007f150a688565 ContinueInNewThread0 (libjli.so)#012#2  0x00007f150a684c12 ContinueInNewThread (libjli.so)#012#3  0x00007f150a685992 JLI_Launch (libjli.so)#012#4  0x000055b923fbe79c main (java)#012#5  0x00007f150a0d0555 __libc_start_main (libc.so.6)#012#6  0x000055b923fbe7c7 _start (java)#012#012Stack trace of thread 29115:#012#0  0x00007f150aab0a35 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)#012#1  0x00007f1509956b2b _ZN2os13PlatformEvent4parkEv (libjvm.so)#012#2  0x00007f1509933605 _ZN13ObjectMonitor4waitElbP6Thread (libjvm.so)#012#3  0x00007f1509ad463e _ZN18ObjectSynchronizer4waitE6HandlelP6Thread 
(libjvm.so)#012#4  0x00007f1509746260 JVM_MonitorWait (libjvm.so)#012#5  0x00007f14f51f7a68 n/a (n/a)#012#6  0x00007f14f4d70624 n/a (n/a)#012#7  0x00007f14f4b
2020-09-27T09:54:15+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | systemd: scylla-jmx.service holdoff time over, scheduling restart.
2020-09-27T09:54:15+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: Picked up JAVA_TOOL_OPTIONS:
2020-09-27T09:54:16+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: Using config file: /etc/scylla/scylla.yaml
2020-09-27T09:54:18+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: Connecting to http://127.0.0.1:10000
2020-09-27T09:54:18+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: Starting the JMX server
2020-09-27T09:54:25+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !INFO    | scylla-jmx: JMX is enabled to receive remote connections on port: 7199
2020-09-27T09:54:27+00:00  longevity-100gb-4h-4-2-db-node-74fd669a-9 !NOTICE  | sudo:  centos : TTY=unknown ; PWD=/home/centos ; USER=root ; COMMAND=/bin/curl --request PUT --upload-file /var/lib/systemd/coredump/core.scylla-jmx.996.47d5309670374dda890c17744429da08.29100.1601200450000000 upload.scylladb.com/core.scylla-jmx.996.47d5309670374dda890c17744429da08.29100.1601200450000/core.scylla-jmx.996.47d5309670374dda890c17744429da08.29100.1601200450000000
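
The SIGSEGV line in the log reports pc=0x00007f15097f5153, and the "Problematic frame" line gives the same location as libjvm.so+0x7e1153. That pc is also the address of frame #17 (`_ZNK5Klass13external_nameEv`) in thread 29669. Subtracting the offset from the pc gives the address at which libjvm.so was loaded in this process, assuming the offset is relative to the library's load base. A small worked check (the function name is hypothetical):

```python
def load_base(pc: int, offset_in_library: int) -> int:
    """Return the library load address implied by a pc and its in-library offset."""
    return pc - offset_in_library


# Values copied from the log lines above.
PC = 0x00007F15097F5153          # "SIGSEGV (0xb) at pc=..."
LIBJVM_OFFSET = 0x7E1153         # "[libjvm.so+0x7e1153]"

if __name__ == "__main__":
    base = load_base(PC, LIBJVM_OFFSET)
    print(hex(base))             # prints 0x7f1509014000
    print(base % 0x1000 == 0)    # prints True: the result is 4 KiB aligned
```

The same subtraction can be applied to any other libjvm.so frame address in the backtrace to get its offset inside the library.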

db log: messages.zip
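
The systemd-coredump entry in the messages log is a single line in which each newline appears as `#012`, the octal escape that syslog daemons such as rsyslog commonly substitute for control characters. A minimal sketch for turning such a line back into a readable backtrace is below; the function name is hypothetical, and the pattern is limited to control characters (octal 000 to 037) so that frame markers like `#12` are left alone.

```python
import re

# "#" followed by three octal digits in the control-character range 000-037.
_ESCAPE = re.compile(r"#(0[0-3][0-7])")


def expand_syslog_escapes(message: str) -> str:
    """Replace '#ooo' control-character escapes with the characters they encode."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), message)


if __name__ == "__main__":
    # Shortened excerpt of the systemd-coredump line from the log above.
    sample = ("Process 29100 (scylla-jmx) of user 996 dumped core."
              "#012#012Stack trace of thread 29669:"
              "#012#0  0x00007f150a0e4387 raise (libc.so.6)"
              "#012#1  0x00007f150a0e5a78 abort (libc.so.6)")
    print(expand_syslog_escapes(sample))
```

This only restores line breaks; it cannot recover the part of the entry that was cut off when the log line was truncated.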

roydahan commented 3 years ago

This is a fairly new nemesis (introduced in 4.2), added to help test the swap and memory-locking configuration. It tries to consume all the free memory in the system.

slivne commented 3 years ago

We will not chase this: the coredump is a bug in the JVM, which is old.