oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0
241 stars 36 forks source link

cockroach: `panic: interface conversion: interface {} is ... , not *[]uintptr` #2043

Open davepacheco opened 1 year ago

davepacheco commented 1 year ago

While trying to reproduce the various CockroachDB/Go issues, I've been running the Omicron test suite in a loop, and with a bit of extra instrumentation. I got a new failure mode here:

failures:

---- db::queries::external_ip::tests::test_next_external_ip_allocation_and_exhaustion stdout ----
log file: "/dangerzone/omicron_tmp/omicron_nexus-557a25e4ac0d4b04-test_next_external_ip_allocation_and_exhaustion.24686.53.log"
note: configured to log to "/dangerzone/omicron_tmp/omicron_nexus-557a25e4ac0d4b04-test_next_external_ip_allocation_and_exhaustion.24686.53.log"
thread 'db::queries::external_ip::tests::test_next_external_ip_allocation_and_exhaustion' panicked at 'called `Result::unwrap()` on an `Err` value: InternalError { internal_message: "unexpected database error: server closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request.\n" }', nexus/src/db/datastore/mod.rs:220:48
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
WARN: dropped CockroachInstance without cleaning it up first (there may still be a child process running and a temporary directory leaked)
WARN: temporary directory leaked: /dangerzone/omicron_tmp/.tmp1D3W8h

failures:
    db::queries::external_ip::tests::test_next_external_ip_allocation_and_exhaustion

test result: FAILED. 84 passed; 1 failed; 0 ignored; 0 measured; 50 filtered out; finished in 88.27s

error: test failed, to rerun pass `-p omicron-nexus --lib`

Usually when this happens, Cockroach bails out before opening its listening socket. This time, it bailed out some time later:

$ bunyan "/dangerzone/omicron_tmp/omicron_nexus-557a25e4ac0d4b04-test_next_external_ip_allocation_and_exhaustion.24686.53.log"
[2022-12-10T22:24:07.731960499Z]  INFO: test_next_external_ip_allocation_and_exhaustion/24686 on ivanova: cockroach temporary directory: /dangerzone/omicron_tmp/.tmp1D3W8h
[2022-12-10T22:24:07.732288237Z]  INFO: test_next_external_ip_allocation_and_exhaustion/24686 on ivanova: cockroach: copying from seed directory (/home/dap/omicron-cockroachdb/target/debug/build/nexus-test-utils-0291f77f02e9a7d1/out/crdb-base) to storage directory (/dangerzone/omicron_tmp/.tmp1D3W8h/data)
[2022-12-10T22:24:07.760692976Z]  INFO: test_next_external_ip_allocation_and_exhaustion/24686 on ivanova: cockroach command line: cockroach start-single-node --insecure --http-addr=:0 --store=path=/dangerzone/omicron_tmp/.tmp1D3W8h/data,ballast-size=0 --listen-addr 127.0.0.1:0 --listening-url-file /dangerzone/omicron_tmp/.tmp1D3W8h/listen-url
[2022-12-10T22:24:23.051528731Z]  INFO: test_next_external_ip_allocation_and_exhaustion/24686 on ivanova: cockroach pid: 25837
[2022-12-10T22:24:23.052423478Z]  INFO: test_next_external_ip_allocation_and_exhaustion/24686 on ivanova: cockroach listen URL: postgresql://root@127.0.0.1:51730/omicron?sslmode=disable
...

We have to log in the stderr log file to find the problem:

$ cat /dangerzone/omicron_tmp/.tmp1D3W8h/data/logs/cockroach-stderr.log 
I221210 22:24:21.514261 1 util/log/file_sync_buffer.go:238 ⋮ [config]   file created at: 2022/12/10 22:24:21
I221210 22:24:21.514277 1 util/log/file_sync_buffer.go:238 ⋮ [config]   running on machine: ‹ivanova›
I221210 22:24:21.514282 1 util/log/file_sync_buffer.go:238 ⋮ [config]   binary: CockroachDB OSS v22.1.9-dirty (x86_64-pc-solaris2.11, built 2022/10/26 21:17:46, go1.17.13)
I221210 22:24:21.514287 1 util/log/file_sync_buffer.go:238 ⋮ [config]   arguments: [‹/home/dap/omicron-cockroachdb/out/cockroachdb/bin/cockroach› ‹start-single-node› ‹--insecure› ‹--http-addr=:0› ‹--store=path=/dangerzone/omicron_tmp/.tmp1D3W8h/data,ballast-size=0› ‹--listen-addr› ‹127.0.0.1:0› ‹--listening-url-file› ‹/dangerzone/omicron_tmp/.tmp1D3W8h/listen-url›]
I221210 22:24:21.514302 1 util/log/file_sync_buffer.go:238 ⋮ [config]   log format (utf8=✓): crdb-v2
I221210 22:24:21.514305 1 util/log/file_sync_buffer.go:238 ⋮ [config]   line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid [chan@]file:line redactionmark \[tags\] [counter] msg
I221210 22:24:21.514051 1 util/log/flags.go:198  [-] 1  stderr capture started
panic: interface conversion: interface {} is Lj???d????f?OL_???f?OL???Ef?, not *[]uintptr

goroutine 974 [running]:
github.com/cockroachdb/cockroach/pkg/util/caller.(*CallResolver).Lookup(0xc00021e270, 0x6)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/caller/resolver.go:218 +0x638
github.com/cockroachdb/cockroach/pkg/util/caller.Lookup(...)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/caller/resolver.go:191
github.com/cockroachdb/cockroach/pkg/util/log.makeEntry({0x8621460, 0xc001120000}, 0x3, 0x0, 0x4)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/log_entry.go:205 +0x20b
github.com/cockroachdb/cockroach/pkg/util/log.makeUnstructuredEntry({0x8621460, 0xc001120000}, 0x3, 0x0, 0x3, 0x1, {0x60876cd, 0x19}, {0xc0030f9410, 0x1, ...})
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/log_entry.go:232 +0x99
github.com/cockroachdb/cockroach/pkg/util/log.logfDepthInternal({0x8621460, 0xc001120000}, 0x2, 0x3, 0x0, 0x0, {0x60876cd, 0x19}, {0xc0030f9410, 0x1, ...})
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/channels.go:98 +0x5e5
github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(...)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/channels.go:39
github.com/cockroachdb/cockroach/pkg/util/log.Errorf(...)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/log_channels_generated.go:640
github.com/cockroachdb/cockroach/pkg/sql/distsql.(*ServerImpl).setupFlow(0xc00232b880, {0x8621460, 0xc00382a4c0}, 0xc002d9a600, 0xc0036ccc00, 0xc00236c180, {0x85a63f8, 0xc002e1e380}, {0x0, 0x0}, ...)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/sql/distsql/server.go:413 +0x1688
github.com/cockroachdb/cockroach/pkg/sql/distsql.(*ServerImpl).SetupLocalSyncFlow(0xc00232b880, {0x8621460, 0xc00382a4c0}, 0xc0036ccc00, 0xc00236c180, {0x85a63f8, 0xc002e1e380}, {0x0, 0x0}, {0xc000cb62e8, ...})
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/sql/distsql/server.go:584 +0x107
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).setupFlows(0xc000b9b2c0, {0x8621460, 0xc00382a4c0}, 0xc000cb62e8, 0x0, 0xc0036be030, 0xc002e1e380, {0xc000cb62e8, 0xc000cb5ba8, 0x1, ...}, ...)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:373 +0xa6a
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run(0xc000b9b2c0, {0x8621460, 0xc00382a4c0}, 0xc001468000, 0xc00340a580, 0xc001800000, 0xc002e1e380, 0xc000cb62e8, 0x0)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:555 +0xabf
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).PlanAndRun(0xc000b9b2c0, {0x8621460, 0xc00382a4c0}, 0xc000cb62e8, 0xc001468000, 0xc00340a580, {{0x86236b8, 0xc0010a8280}, 0x0}, 0xc002e1e380)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:1445 +0x205
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execWithDistSQLEngine(0xc000cb5900, {0x8621460, 0xc00382a4c0}, 0xc000cb5ff0, 0x3, {0x86b6930, 0xc0038190b0}, 0x0, 0xc0036dca18)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1506 +0x6bb
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).dispatchToExecutionEngine(0xc000cb5900, {0x8621460, 0xc00382a4c0}, 0xc000cb5ff0, {0x86b6930, 0xc0038190b0})
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1163 +0xc10
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState(0xc000cb5900, {0x8621508, 0xc0036e2540}, {{0x864fbf8, 0xc0022844b0}, {0xc0033ec0e1, 0xdc}, 0x1, 0x1}, 0xc0036d07e0, ...)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:699 +0x24b7
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt(0xc000cb5900, {0x8621508, 0xc0036e2540}, {{0x864fbf8, 0xc0022844b0}, {0xc0033ec0e1, 0xdc}, 0x1, 0x1}, 0xc0036d07e0, ...)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:144 +0x645
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execPortal(0xc000cb5900, {0x8621508, 0xc0036e2540}, {0xc0036d07e0, {0xc0036c1060, 0x1, 0x1}, {0x0, 0x0, 0x0}, ...}, ...)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:232 +0x15a
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd.func2({{0x0, 0x0}, 0x0, {0xe591dad, 0xedb26fe17, 0x0}, 0x1}, 0xc000cb5900, 0xc0030fbb18, 0xc0030fbb08, ...)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1969 +0x62a
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd(0xc000cb5900)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1971 +0x526
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run(0xc000cb5900, {0x8621508, 0xc0036e2180}, 0xc002388900, {0x0, 0x0, 0x0, 0x0, 0x0}, 0x0)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1817 +0x277
github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).initConnEx.func1(0xc000cb5900, {0x8621508, 0xc0036e2180}, 0xc0014602a0, 0xc001dc0760, {0x85a6560, 0xc00366ae00}, 0x0, 0xc00368ab60)
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:211 +0x89
created by github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).initConnEx
    /ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:210 +0x5db

It's interesting that this one does not involve the Go runtime directly.

This is with:

$ cockroach version
Build Tag:        v22.1.9-dirty
Build Time:       2022/10/21 16:56:49
Distribution:     OSS
Platform:         illumos amd64 (x86_64-pc-solaris2.11)
Go Version:       go1.17.13
C Compiler:       gcc 10.3.0
Build Commit ID:  e438c2f89282e607e0e6ca1d38b2e0a622f94493
Build Type:       release

and the test suite run is from 1edc4ab94290e7c1d0596033e69010095be828b2, which is a tiny branch from "main" at commit 6fa27a335a58181262e85824922bdf0dc68cfc9b.

I will put some data into catacomb:/data/staff/core/issues/omicron-$this_ticket.

davepacheco commented 1 year ago

I have the whole test temporary directory, which includes that bunyan log file, CockroachDB stdout/stderr, and the CockroachDB data directory (which includes its logs). I don't have a core file, unfortunately.

dap@catacomb /data/staff/core/issues/omicron-2043 $ ls -l
total 1650
-rw-rw-r--+  1 dap      staff     824987 Dec 11 02:31 omicron-2043-tmpdir.tgz
-rw-rw-r--+  1 dap      staff      29565 Dec 11 02:30 run-20013.out
dap@catacomb /data/staff/core/issues/omicron-2043 $