scylladb / scylla-bench


panic: runtime error: invalid memory address or nil pointer dereference #107

Open yarongilor opened 2 years ago

yarongilor commented 2 years ago

Installation details

Kernel Version: 5.15.0-1019-aws
Scylla version (or git commit hash): 5.0.3-20220907.b9a61c8e9 with build-id 7be266d2954825cdf843c744de04a0443a8f156c
Relocatable Package: http://downloads.scylladb.com/downloads/scylla/relocatable/scylladb-5.0/scylla-x86_64-package-5.0.3.0.20220907.b9a61c8e9.tar.gz
Cluster size: 4 nodes (i3en.3xlarge)

Scylla Nodes used in this run:

OS / Image: ami-03fc0de751a0b3314 (aws: eu-north-1)

Test: longevity-large-partition-4days-test-rq
Test id: e83f2364-eaf5-4e99-8c28-433f76c2a24e
Test name: scylla-staging/Longevity_yaron/longevity-large-partition-4days-test-rq
Test config file(s):

Issue description

>>>>>>>

  1. Started 3 read stress commands (ASC, DESC, ASC/DESC/None) that ran "ok" for 10 hours.
  2. Throughput was relatively low: 5k.
  3. After ~8.5 hours the stress commands started getting quorum timeouts.
  4. After 10 hours one of the stress commands (ASC) failed because of the timeouts, and the loader panicked as well: invalid memory address or nil pointer dereference

Error from the scylla-bench log:

yarongilor@yarongilor:~/Downloads/logs/loader-set-e83f2364$ tail scylla-bench-l0-34379347-f65c-4327-808c-554732ff449c.log -n 40

10h32m19.7s    7718   77180      0 1.9s   1.5s   706ms  495ms  260ms  3.8ms  65ms   
panic: runtime error: invalid memory address or nil pointer dereference
10h32m20.8s    7616   76160      0 1.9s   1.6s   703ms  502ms  287ms  3.9ms  67ms   
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x5fb33c]

goroutine 1 [running]:
github.com/HdrHistogram/hdrhistogram-go.(*iterator).next(0xc000171448)
    /go/pkg/mod/github.com/!hdr!histogram/hdrhistogram-go@v1.1.2/hdr.go:670 +0x1c
github.com/HdrHistogram/hdrhistogram-go.(*rIterator).next(...)
    /go/pkg/mod/github.com/!hdr!histogram/hdrhistogram-go@v1.1.2/hdr.go:683
github.com/HdrHistogram/hdrhistogram-go.(*Histogram).Merge(0xf0000000e?, 0x4000000000a?)
    /go/pkg/mod/github.com/!hdr!histogram/hdrhistogram-go@v1.1.2/hdr.go:177 +0x8d
github.com/scylladb/scylla-bench/pkg/results.(*MergedResult).AddResult(0xc2abffef60, {0x0, 0x0, 0x0, 0x0, 0x0, {0x0, 0x0, 0x0}, 0x0, ...})
    /go/scylla-bench-0.1.11/pkg/results/merged_result.go:53 +0x1b0
github.com/scylladb/scylla-bench/pkg/results.(*TestResults).GetResultsFromThreadsAndMerge(0xc000691380)
    /go/scylla-bench-0.1.11/pkg/results/thread_results.go:60 +0x89
github.com/scylladb/scylla-bench/pkg/results.(*TestResults).GetTotalResults(0xc000691380)
    /go/scylla-bench-0.1.11/pkg/results/thread_results.go:82 +0xcc
main.main()
    /go/scylla-bench-0.1.11/main.go:596 +0x355d

Screenshot from 2022-09-28 13-19-26

<<<<<<<

Logs:

No logs captured during this run.

Jenkins job URL

roydahan commented 2 years ago

@vponomaryov do you see what's causing this issue?

vponomaryov commented 2 years ago

> @vponomaryov do you see what's causing this issue?

General info: such errors appear when code tries to use an uninitialized Go object (a nil pointer). This may be caused either by a race condition or by an unhandled error.

I haven't investigated it yet, so I don't know the actual cause.
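As a rough illustration of the general point above, here is a minimal sketch of how merging a nil object panics and how a guard avoids it. The `hist` type and the `merge`, `safeMerge`, and `tryMerge` functions are hypothetical stand-ins, not the hdrhistogram-go or scylla-bench code, and this is not a claim about the actual cause of this issue.

```go
package main

import "fmt"

// hist is a hypothetical stand-in for a latency histogram. It is NOT the
// hdrhistogram-go type that appears in the stack trace.
type hist struct {
	counts []int64
}

// merge adds the counts of other into h. It dereferences other without a
// check, so a nil other causes a nil pointer dereference panic.
// Assumes both histograms have the same number of buckets.
func (h *hist) merge(other *hist) {
	for i, c := range other.counts {
		h.counts[i] += c
	}
}

// safeMerge is the guarded variant: it returns an error instead of panicking.
func (h *hist) safeMerge(other *hist) error {
	if other == nil {
		return fmt.Errorf("cannot merge a nil histogram")
	}
	h.merge(other)
	return nil
}

// tryMerge runs the unguarded merge and converts a runtime panic into an error.
func tryMerge(h, other *hist) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	h.merge(other)
	return nil
}

func main() {
	total := &hist{counts: make([]int64, 4)}
	fmt.Println(tryMerge(total, nil))   // unguarded merge panics, recovered here
	fmt.Println(total.safeMerge(nil))   // guarded merge returns an error
	fmt.Println(total.safeMerge(&hist{counts: []int64{1, 2, 3, 4}}), total.counts)
}
```

The guard only hides the symptom. If the nil object comes from a race or a swallowed error, that upstream problem would still need to be found.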

roydahan commented 2 years ago

It seems that when working with the reverse-query feature, we hit this issue after a few hours of running.

vponomaryov commented 2 years ago

@yarongilor I created a PR with a possible fix here: https://github.com/scylladb/scylla-bench/pull/109 It may fix this issue, but that is not guaranteed. It needs to be tested.

Upd: I created a docker image with it here: vponomarovatscylladb/hydra-loaders:scylla-bench-v0.1.12--fix-issue-107-candidate So, just update your configuration to use it.

vponomaryov commented 1 year ago

Hit it once again using the same scylla-bench config file (changed to some extent since then).

Installation details

Kernel Version: 5.15.0-1030-aws
Scylla version (or git commit hash): 5.2.0~rc2-20230228.908a82bea064 with build-id 2d8e1ab089ec69c36323037d66b1a72accfae399

Cluster size: 4 nodes (is4gen.4xlarge)

Scylla Nodes used in this run:

OS / Image: ami-074d26a74b8f73dba (aws: eu-west-1)

Test: longevity-large-partition-4days-arm-test
Test id: c3260702-5b50-4389-8303-7464c8d5e384
Test name: scylla-5.2/longevity/longevity-large-partition-4days-arm-test
Test config file(s):

Details:

It had 3 loaders. Pre-load finished without errors. Then the main read stress commands failed on 2 of the 3 loaders. One of the loader failures is the same as in this bug report:

2023/03/02 22:13:31 Operation timed out for scylla_bench.test - received only 1 responses from 2 CL=QUORUM.
2023/03/02 22:13:31 Operation timed out for scylla_bench.test - received only 1 responses from 2 CL=QUORUM.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x5fc7dc]

goroutine 1 [running]:
github.com/HdrHistogram/hdrhistogram-go.(*iterator).next(0xc000119038)
    /go/pkg/mod/github.com/!hdr!histogram/hdrhistogram-go@v1.1.2/hdr.go:670 +0x1c
github.com/HdrHistogram/hdrhistogram-go.(*rIterator).next(...)
    /go/pkg/mod/github.com/!hdr!histogram/hdrhistogram-go@v1.1.2/hdr.go:683
github.com/HdrHistogram/hdrhistogram-go.(*Histogram).Merge(0xf0000000e?, 0x4000000000a?)
    /go/pkg/mod/github.com/!hdr!histogram/hdrhistogram-go@v1.1.2/hdr.go:177 +0x8d
github.com/scylladb/scylla-bench/pkg/results.(*MergedResult).AddResult(0xc1e0defb60, {0x0, 0x0, 0x0, 0x0, 0x0, {0x0, 0x0, 0x0}, 0x0, ...})
    /go/scylla-bench-0.1.15/pkg/results/merged_result.go:53 +0x1b0
github.com/scylladb/scylla-bench/pkg/results.(*TestResults).GetResultsFromThreadsAndMerge(0xc000413b80)
    /go/scylla-bench-0.1.15/pkg/results/thread_results.go:60 +0x89
github.com/scylladb/scylla-bench/pkg/results.(*TestResults).GetTotalResults(0xc000413b80)
    /go/scylla-bench-0.1.15/pkg/results/thread_results.go:82 +0xcc
main.main()
    /go/scylla-bench-0.1.15/main.go:631 +0x39bd

It failed after 35 minutes of running.

Logs:

Jenkins job URL

vponomaryov commented 1 year ago

@roydahan @fgelcer @fruch JFYI: the proposed fix in https://github.com/scylladb/scylla-bench/pull/109 hasn't had any attention since October 2022.

fruch commented 1 year ago

> @roydahan @fgelcer @fruch JFYI: the proposed fix in https://github.com/scylladb/scylla-bench/pull/109 hasn't had any attention since October 2022.

I assumed it was a side effect of running out of memory, but we can get it merged, regardless.