scylladb / gemini

Test data integrity by comparing against an Oracle running in parallel
Apache License 2.0
31 stars 18 forks source link

fix(prettyCQL): prettyCQL crashes on ouf-of-bound access #433

Closed CodeLieutenant closed 2 weeks ago

CodeLieutenant commented 1 month ago

When gemini tries to pretty print CQL, in some cases can fail with out-of-bounds access on array.

out = append(out, queryChunks[qID])
CodeLieutenant commented 1 month ago

CI Failure is now related to #431 and not to the crashing on out of bounds error on prettyCQL

{"L":"INFO","T":"2024-10-23T10:29:46.940Z","N":"work cycle.mutation_job","M":"ending mutation loop"}
panic: map cql pretty, unknown type &{map timeuuid varint false}

goroutine 369282 [running]:
github.com/scylladb/gemini/pkg/typedef.(*MapType).CQLPretty(0xc00012ea80, {0xbbe060, 0xc00f77ce20})
    /home/runner/work/gemini/gemini/pkg/typedef/types.go:145 +0x3d4
github.com/scylladb/gemini/pkg/typedef.prettyCQL({0xc011b66120, 0x83}, {0xc0104ee500, 0x10, 0xccce11?}, {0xc0001802a0, 0xe, 0x0?})
    /home/runner/work/gemini/gemini/pkg/typedef/typedef.go:271 +0x36d
github.com/scylladb/gemini/pkg/typedef.(*Stmt).PrettyCQL(0xc0104adc00)
    /home/runner/work/gemini/gemini/pkg/typedef/typedef.go:99 +0x9e
github.com/scylladb/gemini/pkg/jobs.mutation({0xdf3cc0, 0xc016ee3db0}, 0x4400000015f61c80?, 0x1000000002000000?, 0x8f1ba312981d476b?, {0xdf3ba8, 0xc0002a2cc0}, 0x1000000060ca4704?, 0xb510549d53368ef8?, 0xc000184120, ...)
    /home/runner/work/gemini/gemini/pkg/jobs/jobs.go:391 +0x3f5
github.com/scylladb/gemini/pkg/jobs.mutationJob({0xdf3cc0, 0xc016ee3db0}, 0xc000400000, _, {0xc00006e088, 0xc00006e098, {0x0, 0x0, 0x0}, 0x1, ...}, ...)
    /home/runner/work/gemini/gemini/pkg/jobs/jobs.go:182 +0x2aa
github.com/scylladb/gemini/pkg/jobs.List.Run.func2()
    /home/runner/work/gemini/gemini/pkg/jobs/jobs.go:136 +0x10f
golang.org/x/sync/errgroup.(*Group).Go.func1()
    /home/runner/go/pkg/mod/golang.org/x/sync@v0.8.0/errgroup/errgroup.go:78 +0x50
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1
    /home/runner/go/pkg/mod/golang.org/x/sync@v0.8.0/errgroup/errgroup.go:75 +0x96
make: *** [Makefile:71: integration-test] Error 2
CodeLieutenant commented 3 weeks ago

Benchmark results

cpu: Intel(R) Core(TM) i9-14900HX
BenchmarkPrettyCQLOLD
BenchmarkPrettyCQLOLD/New
BenchmarkPrettyCQLOLD/New-32              754578          1851 ns/op        1048 B/op         13 allocs/op
BenchmarkPrettyCQLOLD/Old
BenchmarkPrettyCQLOLD/Old-32              298058          3926 ns/op        3648 B/op         38 allocs/op
PASS

Another benchmark with Go Iterators

BenchmarkPrettyCQLOLD
BenchmarkPrettyCQLOLD/New
BenchmarkPrettyCQLOLD/New-32             1000000          1348 ns/op         928 B/op          4 allocs/op
BenchmarkPrettyCQLOLD/Old
BenchmarkPrettyCQLOLD/Old-32              264987          5549 ns/op        3648 B/op         38 allocs/op
PASS

Really don't know why Iterators optimize the allocations and where they do, but looks like it does, allocations take around 60ns to preform, and this ~500ns gap can be explained by that

CodeLieutenant commented 2 weeks ago

I dont know why, no explanation, but these changes made gemini stable. Further investigation is needed as nothing much has been done to affect how gemini works

fruch commented 2 weeks ago

I dont know why, no explanation, but these changes made gemini stable. Further investigation is needed as nothing much has been done to affect how gemini works

makes sense to me, you switch a few cases from panicking to erroring, it can make a lot of difference i.e. the test you are running are not checking for existence of errors, do they ?