First findings
It seems that DuckDB might be able to read multiple Parquet files concurrently -- but not a single file concurrently.
Thoughts
In theory, we could do this by running COPY FROM with exactly the same number of threads and having each thread use the location info of the corresponding sheetreader thread.
Would it be possible to partition the Excel sheet into chunks of 2048 / (number of threads) rows and make the buffers that size? Probably tricky, because we would have to know the number of columns beforehand (buffer size / number of columns is the number of rows that fit into one buffer) -- see the sketch below.
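To make that constraint concrete, a back-of-the-envelope sketch (plain C++; the helper names and the thread/column counts are made up, 2048 is DuckDB's STANDARD_VECTOR_SIZE):

```cpp
#include <cstddef>
#include <iostream>

// DuckDB processes data in vectors of STANDARD_VECTOR_SIZE (2048) rows.
constexpr std::size_t kVectorSize = 2048;

// Hypothetical: rows each thread should produce so that all threads
// together fill exactly one DuckDB vector.
std::size_t RowsPerThread(std::size_t num_threads) {
	return kVectorSize / num_threads;
}

// The catch from above: a buffer holds a fixed number of cells, so the
// rows that fit are cells / columns -- we need the column count before
// we can size the buffers in rows.
std::size_t RowsThatFit(std::size_t buffer_cells, std::size_t num_columns) {
	return buffer_cells / num_columns;
}

int main() {
	const std::size_t threads = 8;
	const std::size_t columns = 12; // only known after parsing the header row
	const std::size_t target_rows = RowsPerThread(threads); // 2048 / 8 = 256
	// To hold target_rows rows, each buffer needs target_rows * columns cells.
	std::cout << "cells per buffer: " << target_rows * columns << "\n"; // 3072
}
```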
TODO
A multi-threaded scan would be interesting, since our copy/scan function takes some time.
According to its README, the duckdb_delta extension supports a multi-threaded scan (see https://github.com/duckdb/duckdb_delta/blob/main/src/functions/delta_scan.cpp). I suspect that this doesn't need any new implementation, since they are reading Parquet files.
[ ] Find out whether this is due to the Parquet files
[ ] Find out whether DuckDB also supports a multi-threaded scan of the Apache Arrow format
[ ] Have a look at how the multi-threaded scan is implemented (see the sketch after this list)
[ ] Find out whether we could copy concurrently -- this might not be possible, because sheetreader-core saves the data in a special way (per thread, some rows are split across multiple threads' buffers, and the row order is only implicit)
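For orientation, a minimal sketch of how a table function opts into DuckDB's multi-threaded scan via the C++ extension API -- not sheetreader's actual code; the Sheet* names are placeholders. DuckDB asks the global state for MaxThreads() and runs the scan function on up to that many threads, each with its own local state:

```cpp
#include <atomic>

#include "duckdb.hpp"

using namespace duckdb;

// Placeholder global state shared by all scan threads.
struct SheetGlobalState : public GlobalTableFunctionState {
	explicit SheetGlobalState(idx_t threads) : max_threads(threads) {
	}
	idx_t max_threads;
	std::atomic<idx_t> next_chunk {0}; // simple shared work counter

	// Returning more than 1 here is what enables the parallel scan.
	idx_t MaxThreads() const override {
		return max_threads;
	}
};

// Placeholder per-thread state.
struct SheetLocalState : public LocalTableFunctionState {
	idx_t current_chunk = 0;
};

static unique_ptr<GlobalTableFunctionState> SheetInitGlobal(ClientContext &context, TableFunctionInitInput &input) {
	// e.g. one scan thread per sheetreader parser thread
	return make_uniq<SheetGlobalState>(8);
}

static unique_ptr<LocalTableFunctionState> SheetInitLocal(ExecutionContext &context, TableFunctionInitInput &input,
                                                          GlobalTableFunctionState *gstate) {
	return make_uniq<SheetLocalState>();
}

static void SheetScan(ClientContext &context, TableFunctionInput &data, DataChunk &output) {
	auto &gstate = data.global_state->Cast<SheetGlobalState>();
	auto &lstate = data.local_state->Cast<SheetLocalState>();
	// Each thread claims the next unit of work (e.g. one sheetreader buffer).
	lstate.current_chunk = gstate.next_chunk.fetch_add(1);
	// ... fill `output` from the claimed buffer; emitting 0 rows ends this thread.
	output.SetCardinality(0);
}
```

These callbacks are passed to the TableFunction constructor alongside bind and the scan function. The open question from the list above is how each worker would claim sheetreader-core's per-thread buffers while preserving the implicit row order.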