scylladb / gocql

Package gocql implements a fast and robust ScyllaDB client for the Go programming language.
https://docs.scylladb.com/stable/using-scylla/drivers/cql-drivers/scylla-go-driver.html
BSD 3-Clause "New" or "Revised" License
189 stars 59 forks source link

Fix `debounce` fails #344

Open illia-li opened 1 week ago

illia-li commented 1 week ago

Reason - sometimes this test panics:

panic: test timed out after 10m0s
    running tests:
        TestSimpleDebouncer (10m0s)
        TestSimpleDebouncer/Case_2 (10m0s)

goroutine 5 [running]:
testing.(*M).startAlarm.func1()
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:2373 +0x[26](https://github.com/scylladb/gocql/actions/runs/11838031730/job/32986270755?pr=334#step:6:27)5
created by time.goFunc
    /opt/hostedtoolcache/go/1.23.2/x64/src/time/sleep.go:215 +0x45

goroutine 1 [chan receive]:
testing.(*T).Run(0xc0000b04e0, {0x60af2c, 0x13}, 0x614530)
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:1751 +0x851
testing.runTests.func1(0xc0000b04e0)
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:2168 +0x86
testing.tRunner(0xc0000b04e0, 0xc0000dbae0)
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:1690 +0x2[27](https://github.com/scylladb/gocql/actions/runs/11838031730/job/32986270755?pr=334#step:6:28)
testing.runTests(0xc0000a8198, {0x749320, 0x1, 0x1}, {0x7fe94b02a5b8?, 0x40?, 0x752ea0?})
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:2166 +0x8bf
testing.(*M).Run(0xc0000b60a0)
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:2034 +0xf18
main.main()
    _testmain.go:45 +0x165

goroutine 21 [chan receive]:
testing.(*T).Run(0xc0000b0680, {0x607cad, 0x6}, 0xc0000aa7b0)
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:1751 +0x851
github.com/gocql/gocql/debounce.TestSimpleDebouncer(0xc0000b0680)
    /home/runner/work/gocql/gocql/debounce/simple_debouncer_test.go:33 +0x4bb
testing.tRunner(0xc0000b0680, 0x6145[30](https://github.com/scylladb/gocql/actions/runs/11838031730/job/32986270755?pr=334#step:6:31))
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:1690 +0x227
created by testing.(*T).Run in goroutine 1
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:1743 +0x826

goroutine 23 [runnable]:
runtime.Gosched()
    /opt/hostedtoolcache/go/1.23.2/x64/src/runtime/proc.go:[35](https://github.com/scylladb/gocql/actions/runs/11838031730/job/32986270755?pr=334#step:6:36)3 +0x14
github.com/gocql/gocql/debounce.waitTillChannelIsEmpty(...)
    /home/runner/work/gocql/gocql/debounce/simple_debouncer_test.go:60
github.com/gocql/gocql/debounce.TestSimpleDebouncer.func3(0xc0000b09c0)
    /home/runner/work/gocql/gocql/debounce/simple_debouncer_test.go:48 +0x2bc
testing.tRunner(0xc0000b09c0, 0xc0000aa7b0)
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:1690 +0x227
created by testing.(*T).Run in goroutine 21
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:1743 +0x826
FAIL    github.com/gocql/gocql/debounce 600.013s
dkropachev commented 1 week ago

Have to run it enough time to see that it fixes the issue ?

It got me thinking that channel is a bad mechanism for given problem, due to the inability to wait on it for empty state. Let's find better solution.

illia-li commented 1 week ago

Have to run it enough time to see that it fixes the issue ?

It got me thinking that channel is a bad mechanism for given problem, due to the inability to wait on it for empty state. Let's find better solution.

This problem is not due to the use of channels, but due to the switching caused by runtime.Gosched. To replace the channels with something else, I need more context, what exactly needs to be tested.

dkropachev commented 6 days ago

Have to run it enough time to see that it fixes the issue ? It got me thinking that channel is a bad mechanism for given problem, due to the inability to wait on it for empty state. Let's find better solution.

This problem is not due to the use of channels, but due to the switching caused by runtime.Gosched. To replace the channels with something else, I need more context, what exactly needs to be tested.

How switching causes problems here?

illia-li commented 6 days ago

How switching causes problems here?

To understand this, need to study the work of runtime.Gosched very deeply and compare it with the configuration of the environment and everything that runs in it. This is a very big job. In my opinion, there is no need for it now. After examining the code, I did not find any other possible sources of the problem, except for runtime.Gosched. I replaced it and everything worked.

My guess is that for some reason the runtime.Gosched execution is switched to some kind of long-playing goroutine, which leads to a 10-minute awaiting and subsequent exit via panic.