scylladb / scylla-bench

42 stars 34 forks source link

runtime: unexpected return pc for runtime.gopark called from 0x0 #121

Open vponomaryov opened 1 year ago

vponomaryov commented 1 year ago

Issue description

Describe your issue in detail and steps it took to produce it.

Impact

Stress process exists not providing a user with a clear enough picture of what went wrong.

How frequently does it reproduce?

It is first time.

Installation details

Kernel Version: 5.15.0-1030-aws Scylla version (or git commit hash): 5.2.0~rc2-20230228.908a82bea064 with build-id 2d8e1ab089ec69c36323037d66b1a72accfae399

Cluster size: 4 nodes (is4gen.4xlarge)

Scylla Nodes used in this run:

OS / Image: ami-074d26a74b8f73dba (aws: eu-west-1)

Test: longevity-large-partition-4days-arm-test Test id: c3260702-5b50-4389-8303-7464c8d5e384 Test name: scylla-5.2/longevity/longevity-large-partition-4days-arm-test Test config file(s):

Details:

It had 3 loaders. Pre-load finished without errors. Then, the main read stress commands failed on 2 loaders from 3. One of the loader failures is the same as in the another existing open bug (https://github.com/scylladb/scylla-bench/issues/107) and second one failed after 43m of running with following errror:

2023/03/02 22:14:37 Operation timed out for scylla_bench.test - received only 1 responses from 2 CL=QUORUM.
2023/03/02 22:14:38 Operation timed out for scylla_bench.test - received only 1 responses from 2 CL=QUORUM.
runtime: unexpected return pc for runtime.gopark called from 0x0
stack: frame={sp:0xc2d2581138, fp:0xc2d2581158} stack=[0xc2d257a000,0xc2d2582000)
0x000000c2d2581038:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581048:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581058:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581068:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581078:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581088:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581098:  0x0000000000000000  0x0000000000000000 
0x000000c2d25810a8:  0x0000000000000000  0x0000000000000000 
0x000000c2d25810b8:  0x0000000000000000  0x0000000000000000 
0x000000c2d25810c8:  0x0000000000000000  0x0000000000000000 
0x000000c2d25810d8:  0x0000000000000000  0x0000000000000000 
0x000000c2d25810e8:  0x0000000000000000  0x0000000000000000 
0x000000c2d25810f8:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581108:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581118:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581128:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581138: <0x0000000000000000  0x0000000000000000 
0x000000c2d2581148:  0x0000000000000000 !0x0000000000000000 
0x000000c2d2581158: >0x0000000000000000  0x0000000000000000 
0x000000c2d2581168:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581178:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581188:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581198:  0x0000000000000000  0x0000000000000000 
0x000000c2d25811a8:  0x0000000000000000  0x0000000000000000 
0x000000c2d25811b8:  0x0000000000000000  0x0000000000000000 
0x000000c2d25811c8:  0x0000000000000000  0x0000000000000000 
0x000000c2d25811d8:  0x0000000000000000  0x0000000000000000 
0x000000c2d25811e8:  0x0000000000000000  0x0000000000000000 
0x000000c2d25811f8:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581208:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581218:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581228:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581238:  0x0000000000000000  0x0000000000000000 
0x000000c2d2581248:  0x0000000000000000  0x0000000000000000 
fatal error: unknown caller pc

runtime stack:
runtime.throw({0x755d50?, 0x970940?})
    /usr/local/go/src/runtime/panic.go:992 +0x71
runtime.gentraceback(0xec8?, 0x7fd543593978?, 0x0?, 0x0?, 0x0, 0x0, 0x7fffffff, 0x781068, 0x6cdaca?, 0x0)
    /usr/local/go/src/runtime/traceback.go:269 +0x1c2a
runtime.copystack(0xc0005bf380, 0x7fd543593c70?)
    /usr/local/go/src/runtime/stack.go:930 +0x2f5
runtime.shrinkstack(0xc0005bf380)
    /usr/local/go/src/runtime/stack.go:1212 +0x126
runtime.scanstack(0xc0005bf380, 0xc000023238)
    /usr/local/go/src/runtime/mgcmark.go:754 +0xc7
runtime.markroot.func1()
    /usr/local/go/src/runtime/mgcmark.go:241 +0xc5
runtime.markroot(0xc000023238, 0x1814, 0x1)
    /usr/local/go/src/runtime/mgcmark.go:214 +0x1a5
runtime.gcDrain(0xc000023238, 0x2)
    /usr/local/go/src/runtime/mgcmark.go:1047 +0x39f
runtime.gcBgMarkWorker.func2()
    /usr/local/go/src/runtime/mgc.go:1291 +0x154
runtime.systemstack()
    /usr/local/go/src/runtime/asm_amd64.s:469 +0x49

goroutine 612 [GC worker (idle)]:
runtime.systemstack_switch()
    /usr/local/go/src/runtime/asm_amd64.s:436 fp=0xc0003de758 sp=0xc0003de750 pc=0x4621e0
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1263 +0x1b1 fp=0xc0003de7e0 sp=0xc0003de758 pc=0x41c571
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0003de7e8 sp=0xc0003de7e0 pc=0x464401
created by runtime.gcBgMarkStartWorkers
    /usr/local/go/src/runtime/mgc.go:1131 +0x25

goroutine 1 [chan receive]:
github.com/scylladb/scylla-bench/pkg/results.(*TestResults).GetResultsFromThreadsAndMerge(0xc00038b8c0)
    /go/scylla-bench-0.1.15/pkg/results/thread_results.go:45 +0x105
github.com/scylladb/scylla-bench/pkg/results.(*TestResults).GetTotalResults(0xc00038b8c0)
    /go/scylla-bench-0.1.15/pkg/results/thread_results.go:82 +0xcc
main.main()
    /go/scylla-bench-0.1.15/main.go:631 +0x39bd

goroutine 698 [select]:
github.com/gocql/gocql.(*writeCoalescer).writeFlusher(0xc0007047e0, 0xc00027a788?)
    /go/pkg/mod/github.com/scylladb/gocql@v1.7.3/conn.go:880 +0x138
created by github.com/gocql/gocql.newWriteCoalescer
    /go/pkg/mod/github.com/scylladb/gocql@v1.7.3/conn.go:785 +0x173

goroutine 326703 [semacquire]:
sync.runtime_SemacquireMutex(0xc2381ae000?, 0x28?, 0x685fd5?)
    /usr/local/go/src/runtime/sema.go:71 +0x25
sync.(*Mutex).lockSlow(0xc23deaaf20)
    /usr/local/go/src/sync/mutex.go:162 +0x165
sync.(*Mutex).Lock(...)
    /usr/local/go/src/sync/mutex.go:81
sync.(*Once).doSlow(0xc0809e01b0?, 0xc082526fa8?)
    /usr/local/go/src/sync/once.go:64 +0x53
sync.(*Once).Do(...)
    /usr/local/go/src/sync/once.go:59
github.com/gocql/gocql.(*nextIter).fetch(0xc23deaaf00?)
    /go/pkg/mod/github.com/scylladb/gocql@v1.7.3/session.go:1689 +0x4a
created by github.com/gocql/gocql.(*nextIter).fetchAsync.func1
    /go/pkg/mod/github.com/scylladb/gocql@v1.7.3/session.go:1684 +0x5a

goroutine 1018 [select]:
github.com/gocql/gocql.(*Conn).exec(0xc00052ac00, {0x7df300, 0xc000018090}, {0x7dd640, 0xc26a4bcb00}, {0x0, 0x0?})
    /go/pkg/mod/github.com/scylladb/gocql@v1.7.3/conn.go:972 +0x4d7
github.com/gocql/gocql.(*Conn).executeQuery(0xc00052ac00, {0x7df300, 0xc000018090}, 0x4b6e37?)
    /go/pkg/mod/github.com/scylladb/gocql@v1.7.3/conn.go:1200 +0x585
github.com/gocql/gocql.(*Query).execute(0xc2750b76c0?, {0x7df300?, 0xc000018090?}, 0xc0001d5ba0?)
    /go/pkg/mod/github.com/scylladb/gocql@v1.7.3/session.go:1107 +0x2c
github.com/gocql/gocql.(*queryExecutor).attemptQuery(0xc00031a7b0, {0x7df300, 0xc000018090}, {0x7e0be8, 0xc203646dc0}, 0xc00052ac00)
    /go/pkg/mod/github.com/scylladb/gocql@v1.7.3/query_executor.go:32 +0x8a
github.com/gocql/gocql.(*queryExecutor).do(0xc00031a7b0, {0x7df300, 0xc000018090}, {0x7e0be8?, 0xc203646dc0?}, 0xc1f7cdd730)
    /go/pkg/mod/github.com/scylladb/gocql@v1.7.3/query_executor.go:127 +0x1a5
github.com/gocql/gocql.(*queryExecutor).executeQuery(0xc00031a7b0, {0x7e0be8, 0xc203646dc0})
    /go/pkg/mod/github.com/scylladb/gocql@v1.7.3/query_executor.go:66 +0x10d
github.com/gocql/gocql.(*Session).executeQuery(0xc000378000, 0x7df5d8?)
    /go/pkg/mod/github.com/scylladb/gocql@v1.7.3/session.go:524 +0x8c
github.com/gocql/gocql.(*nextIter).fetch.func1()
    /go/pkg/mod/github.com/scylladb/gocql@v1.7.3/session.go:1695 +0x95
sync.(*Once).doSlow(0x0?, 0xc080b539d8?)
    /usr/local/go/src/sync/once.go:68 +0xc2
sync.(*Once).Do(...)
    /usr/local/go/src/sync/once.go:59
github.com/gocql/gocql.(*nextIter).fetch(0xc2057203c0?)
    /go/pkg/mod/github.com/scylladb/gocql@v1.7.3/session.go:1689 +0x4a
github.com/gocql/gocql.(*Iter).Scan(0xc202951170, {0xc2059e6240?, 0x3, 0x3})
    /go/pkg/mod/github.com/scylladb/gocql@v1.7.3/session.go:1571 +0x7f
main.DoReadsFromTable.func1(0xc0001600c0)
    /go/scylla-bench-0.1.15/modes.go:466 +0xd45
main.RunTest(0xc0001600c0, {0x7dfaa8, 0xc00031a090}, {0x7de8b0, 0xc00036a240}, 0xc0b4699e50)
    /go/scylla-bench-0.1.15/modes.go:132 +0x1ac
main.DoReadsFromTable({0x751d13?, 0x4b5245?}, 0x30?, 0xc0000456a0?, {0x7dfaa8?, 0xc00031a090?}, {0x7de8b0?, 0xc00036a240?})
    /go/scylla-bench-0.1.15/modes.go:432 +0xb5
main.DoReads(0x7fff41e2cdb4?, 0xc00036a240?, {0x7dfaa8?, 0xc00031a090?}, {0x7de8b0?, 0xc00036a240?})
    /go/scylla-bench-0.1.15/modes.go:392 +0x53
main.main.func4(0x0?, 0xc0000aefc0?, {0x7de8b0, 0xc00036a240})
    /go/scylla-bench-0.1.15/main.go:628 +0x105
main.RunConcurrently.func1(0xb)
    /go/scylla-bench-0.1.15/modes.go:81 +0x66
created by main.RunConcurrently
    /go/scylla-bench-0.1.15/modes.go:79 +0x19c

... thousands of lines 

To view full error, open the loader-2 logs in the loader-set-c3260702.tar.gz archive.

Logs:

Jenkins job URL

roydahan commented 1 year ago

@dkropachev any chance you'll have time to help with this?

vponomaryov commented 1 year ago

The issue was successfully workarounded by using more powerful instance type for loader here: https://jenkins.scylladb.com/job/scylla-5.2/job/longevity/job/longevity-large-partition-4days-arm-test/3

So, this one occurs only on close-to-overload loader nodes. Hence, this issue is low-priority.

dkropachev commented 1 year ago

It is general golang issue, not specific to scylla-bench. It looks like there is race condition between gargage collector stack strinking/growing and parking on the channel, which has high probability under high load, will try to figure out on golang repo. to mitigate this issue we can reduce channel using, in particular we can make histogram atomic, which is pretty easy and make all workloads submit stats to the one histogram.