marcboeker / go-duckdb

go-duckdb provides a database/sql driver for the DuckDB database engine.
MIT License

Add Data Chunks to the Appender #139

Closed maiadegraaf closed 8 months ago

maiadegraaf commented 8 months ago

This PR alters the appender to use duckdb::data_chunks instead of the DuckDB appender.

AppendRow() is called the same way as before. Instead of inserting row by row, it now writes each row into a data_chunk, which lets us append whole chunks at a time. When a chunk is full, a new one is allocated automatically. When Flush() is called, all the data_chunks are appended and destroyed in one go.
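
The buffering described above can be sketched in plain Go. This is only an illustration under stated assumptions, not the code from this PR: the names chunk, bufferedAppender and chunkCapacity are hypothetical, no DuckDB calls are made, and the capacity is set to 4 so the behaviour is easy to follow (DuckDB's default vector size is 2048 rows).

```go
package main

import "fmt"

// chunkCapacity is deliberately tiny for illustration.
// (Assumption: a real data chunk holds up to DuckDB's vector size, 2048 rows by default.)
const chunkCapacity = 4

// chunk is a hypothetical stand-in for a DuckDB data chunk.
type chunk struct {
	rows [][]any
}

// bufferedAppender is a hypothetical type that collects rows into chunks
// and hands all of them over at once when Flush is called.
type bufferedAppender struct {
	chunks []*chunk
}

// AppendRow stores one row in the current chunk and allocates a new
// chunk when there is none yet or the current one is full.
func (a *bufferedAppender) AppendRow(values ...any) {
	if len(a.chunks) == 0 || len(a.chunks[len(a.chunks)-1].rows) == chunkCapacity {
		a.chunks = append(a.chunks, &chunk{rows: make([][]any, 0, chunkCapacity)})
	}
	cur := a.chunks[len(a.chunks)-1]
	cur.rows = append(cur.rows, values)
}

// Flush "appends" every buffered chunk (here it only counts the rows),
// then drops all chunks. It returns the number of rows flushed.
func (a *bufferedAppender) Flush() int {
	n := 0
	for _, c := range a.chunks {
		n += len(c.rows)
	}
	a.chunks = nil
	return n
}

func main() {
	a := &bufferedAppender{}
	for i := 0; i < 10; i++ {
		a.AppendRow(i, fmt.Sprintf("name-%d", i))
	}
	fmt.Println("pending chunks:", len(a.chunks)) // 10 rows at capacity 4 -> 3 chunks
	fmt.Println("rows flushed:", a.Flush())
	fmt.Println("pending chunks after flush:", len(a.chunks))
}
```

In this sketch the caller still sees a row-wise AppendRow(), while the transfer to the database would happen once per chunk rather than once per row.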

Currently appender_test.go takes the same time as before (0.12s). However, I plan to tweak and optimize this function to increase its efficiency.

Finally, a big benefit of transitioning to the data_chunk framework is that it makes it easy to add nested types, which will be coming in a future PR.

Relevant issue: #135 CC: @taniabogatsch

marcboeker commented 8 months ago

@maiadegraaf Thanks for migrating the appender to data chunks.

killzoner commented 8 months ago

Hey @maiadegraaf, it turns out this PR probably introduces a breaking change for nested types (which were kind of already supported).

Until v1.5.5, I was able to insert TEXT[] data by passing something like [value1, value2] as a string, but now the insertion fails with "Type mismatch in Append DataChunk and the types required for appender".

I guess the implicit conversion is no longer done, but it would still work under the less strict checks of appendRowArray.

marcboeker commented 8 months ago

@killzoner Could you please post the code you are using to insert TEXT[]? Thanks!