segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0
341 stars 104 forks source link

fix issue 299 #323

Closed achille-roussel closed 2 years ago

achille-roussel commented 2 years ago

Fixes #299 Fixes #309

This PR addresses the misuse of unsafe.Pointer resulting in referencing invalid memory locations which caused occasional crashes of the program during garbage collection.

Following suggestions in https://github.com/segmentio/parquet-go/issues/299, I made the following changes:

kevinburkesegment commented 2 years ago

unsafe.Pointer use

achille-roussel commented 2 years ago

For the record, the panic triggered by the nil check:

--- FAIL: TestGenericBuffer (0.02s)
    --- FAIL: TestGenericBuffer/addressBook (0.00s)
panic: offset of nil array [recovered]
    panic: offset of nil array

goroutine 710 [running]:
testing.tRunner.func1.2({0x10271cf80, 0x1027f70e0})
    /usr/local/go/src/testing/testing.go:1396 +0x1c8
testing.tRunner.func1()
    /usr/local/go/src/testing/testing.go:1399 +0x378
panic({0x10271cf80, 0x1027f70e0})
    /usr/local/go/src/runtime/panic.go:884 +0x204
github.com/segmentio/parquet-go/sparse.array.offset(...)
    /Users/achilleroussel/dev/src/github.com/segmentio/parquet-go/sparse/array.go:59
github.com/segmentio/parquet-go/sparse.Array.Offset(...)
    /Users/achilleroussel/dev/src/github.com/segmentio/parquet-go/sparse/array.go:14
github.com/segmentio/parquet-go.writeRowsFuncOfStruct.func2({0x14001300740, 0x4, 0x4}, {{0x0?, 0x2?, 0x140003b2000?}}, {0x0?, 0x60?, 0xb0?})
    /Users/achilleroussel/dev/src/github.com/segmentio/parquet-go/column_buffer_go18.go:326 +0x108
github.com/segmentio/parquet-go.writeRowsFuncOfSlice.func1({0x14001300740, 0x4, 0x4}, {{0x140017cd1a8?, 0x1400008f558?, 0x1025ba07c?}}, {0xc0?, 0x0, 0x0})
    /Users/achilleroussel/dev/src/github.com/segmentio/parquet-go/column_buffer_go18.go:264 +0x124
github.com/segmentio/parquet-go.writeRowsFuncOfStruct.func2({0x14001300740, 0x4, 0x4}, {{0x140017cd180?, 0x0?, 0x0?}}, {0x0?, 0x0?, 0x0?})
    /Users/achilleroussel/dev/src/github.com/segmentio/parquet-go/column_buffer_go18.go:326 +0xcc
github.com/segmentio/parquet-go.makeBufferFunc[...].func1({0x140017cd180?, 0x2?, 0x140001d6420})
    /Users/achilleroussel/dev/src/github.com/segmentio/parquet-go/buffer_go18.go:74 +0x74
github.com/segmentio/parquet-go.(*GenericBuffer[...]).Write(...)
    /Users/achilleroussel/dev/src/github.com/segmentio/parquet-go/buffer_go18.go:122
github.com/segmentio/parquet-go_test.testGenericBufferRows[...]({0x140017cd180, 0x2, 0x2})
    /Users/achilleroussel/dev/src/github.com/segmentio/parquet-go/buffer_go18_test.go:65 +0x88
github.com/segmentio/parquet-go_test.testGenericBuffer[...].func1.1()
    /Users/achilleroussel/dev/src/github.com/segmentio/parquet-go/buffer_go18_test.go:50 +0x44
reflect.Value.call({0x102727de0?, 0x140003331d0?, 0x14000a045a8?}, {0x102610454, 0x4}, {0x1400008fe98, 0x1, 0x2?})
    /usr/local/go/src/reflect/value.go:584 +0x688
reflect.Value.Call({0x102727de0?, 0x140003331d0?, 0x14000151500?}, {0x1400008fe98?, 0x51b?, 0x1022b42c0?})
    /usr/local/go/src/reflect/value.go:368 +0x90
github.com/segmentio/parquet-go/internal/quick.(*Config).Check(0x102bb9a20, {0x102727de0?, 0x140003331d0?})
    /Users/achilleroussel/dev/src/github.com/segmentio/parquet-go/internal/quick/quick.go:60 +0x2e8
github.com/segmentio/parquet-go_test.quickCheck(...)
    /Users/achilleroussel/dev/src/github.com/segmentio/parquet-go/parquet_test.go:474
github.com/segmentio/parquet-go_test.testGenericBuffer[...].func1()
    /Users/achilleroussel/dev/src/github.com/segmentio/parquet-go/buffer_go18_test.go:46 +0x80
testing.tRunner(0x140001fa1a0, 0x140002a9310)
    /usr/local/go/src/testing/testing.go:1446 +0x10c
created by testing.(*T).Run
    /usr/local/go/src/testing/testing.go:1493 +0x300
exit status 2
FAIL    github.com/segmentio/parquet-go 1.135s
achille-roussel commented 2 years ago

And special thanks to @vc42-2 and @joe-elliott for their contributions to help resolve this issue.

joe-elliott commented 2 years ago

We've been running this in our largest internal cluster and can confirm it fixes both of the linked issues.