luben / zstd-jni

JNI binding for Zstd
Other
809 stars 165 forks source link

SIGSEGV in trainFromBufferDirect #253

Closed m-nagarajan closed 1 year ago

m-nagarajan commented 1 year ago

Added two tests in ZstdDict.scala as shown below. One of them passes, but the other crashes with SIGSEGV.

Test 1: Passes

"Zstd" should "successfully build a dictionary with 10 integers" in {
    val trainer = new ZstdDictTrainer(1024, 100 * 1024)
    for (i <- 0 until 10) {
      trainer.addSample(i.toString.getBytes);
    }

    val dict = trainer.trainSamples(false)
    assert(dict.length != 0)
  }

Test 2: Fails. Added the Error details below. It crashes if the values of i is 7,8 or 9.

"Zstd" should "successfully build a dictionary with 9 integers" in {
    val trainer = new ZstdDictTrainer(1024, 100 * 1024)
    for (i <- 0 until 9) {
      trainer.addSample(i.toString.getBytes);
    }

    val dict = trainer.trainSamples(false)
    assert(dict.length != 0)
  }

Test 2 failed with below:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000104c867b8, pid=45597, tid=5123
#
# JRE version: OpenJDK Runtime Environment Microsoft-29150 (11.0.13+8) (build 11.0.13-experimental+8-202111300049-29150)
# Java VM: OpenJDK 64-Bit Server VM Microsoft-29150 (11.0.13-experimental+8-202111300049-29150, mixed mode, tiered, compressed oops, g1 gc, bsd-aarch64)
# Problematic frame:
# C  [libzstd-jni-1.5.5-111180709278127975837.dylib+0x7a7b8]  FASTCOVER_tryParameters+0x23c
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/mnagaraj/code/zstd-jni/hs_err_pid45597.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/microsoft/openjdk/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
m-nagarajan commented 1 year ago

Attaching the complete error file here

luben commented 1 year ago

Thanks for reporting it. Looks like a bug in upstream - filed an issue for them.

m-nagarajan commented 1 year ago

Thanks @luben. I also understand that this is not a practical case, but my intention is to find out whether there is an arbitrary bug in the code that might fail for cases with big sample size as well.

These are the different tests I tried and whether it passed/failed. Why is (i <-0 until 10) not throwing Src size is incorrect when it just added 10 bytes but I configured 1024 bytes as the sample size in new ZstdDictTrainer(1024, 100 * 1024). Reference to this being discussed by you on a previous issue.

(i <- 0 until 1)  until (i <- 0 until 7) =>  Src size is incorrect
(i <- 0 until 8) and (i <- 0 until 9) => SIGSEGV
from (i <-0 until 10) and above => Passed     <-- why doesn't this throw "Src size is incorrect" error.
luben commented 1 year ago

I looked at the code, the 1024 is the max samples size - the sum of the samples lengths can be up to that size. My previous comment was wrong.

m-nagarajan commented 1 year ago

Hi @luben, what would be the reliable way to add an additional check to avoid calling trainSamples if the collected samples doesn't meet the minimum size requirements (min number of samples required by zstd + to avoid this crash). Would a check on number of samples or the size of the samples or both work reliably here?