m-nagarajan closed this issue 1 year ago
Attaching the complete error file here
Thanks for reporting it. Looks like a bug in upstream - filed an issue for them.
Thanks @luben. I understand that this is not a practical case, but my intention is to find out whether there is an underlying bug in the code that might also fail for cases with larger sample sizes.
These are the different tests I tried and whether each passed or failed. Why is (i <- 0 until 10) not throwing "Src size is incorrect" when it added just 10 bytes, but I configured 1024 bytes as the sample size in new ZstdDictTrainer(1024, 100 * 1024)? Reference to this being discussed by you on a previous issue.
(i <- 0 until 1) through (i <- 0 until 7) => Src size is incorrect
(i <- 0 until 8) and (i <- 0 until 9) => SIGSEGV
(i <- 0 until 10) and above => Passed <-- why doesn't this throw the "Src size is incorrect" error?
I looked at the code: 1024 is the maximum total sample size, i.e. the sum of the sample lengths can be up to that size. My previous comment was wrong.
Hi @luben, what would be a reliable way to add a check that avoids calling trainSamples if the collected samples don't meet the minimum size requirements (the minimum number of samples required by zstd, plus whatever is needed to avoid this crash)? Would a check on the number of samples, the total size of the samples, or both work reliably here?
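For illustration only, here is a minimal sketch of a guard that checks both the sample count and the total sample size before training. The SampleGuard class and both thresholds are hypothetical: they are not part of zstd-jni, and the minimums that zstd actually requires are not established in this thread, so the values passed in are placeholders to be tuned.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of zstd-jni): collects samples and reports
 * whether enough data has been gathered to attempt dictionary training.
 * Both thresholds are assumptions supplied by the caller.
 */
final class SampleGuard {
    private final int minSamples;       // assumed minimum number of samples
    private final long minTotalBytes;   // assumed minimum sum of sample lengths
    private final List<byte[]> samples = new ArrayList<>();
    private long totalBytes = 0;

    SampleGuard(int minSamples, long minTotalBytes) {
        if (minSamples < 1 || minTotalBytes < 1) {
            throw new IllegalArgumentException("thresholds must be positive");
        }
        this.minSamples = minSamples;
        this.minTotalBytes = minTotalBytes;
    }

    /** Records one sample; empty samples are ignored because they add no data. */
    void add(byte[] sample) {
        if (sample == null || sample.length == 0) {
            return;
        }
        samples.add(sample);
        totalBytes += sample.length;
    }

    int sampleCount() {
        return samples.size();
    }

    long totalBytes() {
        return totalBytes;
    }

    /** True only when BOTH the count and the total size meet the thresholds. */
    boolean readyToTrain() {
        return samples.size() >= minSamples && totalBytes >= minTotalBytes;
    }
}
```

With something like this in place, the caller would feed the same samples to the trainer and invoke trainSamples only when readyToTrain() returns true, otherwise falling back to compression without a dictionary. Checking both conditions is the conservative choice here, since the results above suggest the failures depend on how little data was collected.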
Added two tests in ZstdDict.scala as shown below. One of them passes, but the other crashes with SIGSEGV.
Test 1: Passes
Test 2: Fails. Error details are added below. It crashes if the value of i is 7, 8 or 9. Test 2 failed with the error below: