First findings
It seems that DuckDB might be able to read multiple Parquet files concurrently -- but not a single file concurrently.
Thoughts
In theory, we could do this by running COPY FROM with exactly the same number of threads and having each thread use the location info of the corresponding sheetreader thread.
Would it be possible to partition the Excel sheet into chunks of 2048 / (number of threads) rows and make the buffers that size? Probably tricky, because we would have to know the number of columns beforehand (buffer size / number of columns is the number of rows that fit into one buffer) -- see the sketch below.
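To make that constraint concrete, a back-of-the-envelope sketch (plain C++; the helper names and the thread/column counts are made up, 2048 is DuckDB's STANDARD_VECTOR_SIZE):

```cpp
#include <cstddef>
#include <iostream>

// DuckDB processes data in vectors of STANDARD_VECTOR_SIZE (2048) rows.
constexpr std::size_t kVectorSize = 2048;

// Hypothetical: rows each thread should produce so that all threads
// together fill exactly one DuckDB vector.
std::size_t RowsPerThread(std::size_t num_threads) {
	return kVectorSize / num_threads;
}

// The catch from above: a buffer holds a fixed number of cells, so the
// rows that fit are cells / columns -- we need the column count before
// we can size the buffers in rows.
std::size_t RowsThatFit(std::size_t buffer_cells, std::size_t num_columns) {
	return buffer_cells / num_columns;
}

int main() {
	const std::size_t threads = 8;
	const std::size_t columns = 12; // only known after parsing the header row
	const std::size_t target_rows = RowsPerThread(threads); // 2048 / 8 = 256
	// To hold target_rows rows, each buffer needs target_rows * columns cells.
	std::cout << "cells per buffer: " << target_rows * columns << "\n"; // 3072
}
```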
TODO
A multi-threaded scan would be interesting, since our copy/scan function takes some time.
According to its README, the duckdb_delta extension supports a multi-threaded scan (see https://github.com/duckdb/duckdb_delta/blob/main/src/functions/delta_scan.cpp). I suspect that this doesn't need any new implementation, since they are reading Parquet files.
[ ] Find out whether this is due to the Parquet files
[ ] Find out whether DuckDB also supports a multi-threaded scan of the Apache Arrow format
[ ] Have a look at how the multi-threaded scan is implemented (see the sketch after this list)
[ ] Find out whether we could copy concurrently -- this might not be possible, because sheetreader-core saves the data in a special way (per thread, some rows are split across multiple threads' buffers, and the row order is only implicit)
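For orientation, a minimal sketch of how a table function opts into DuckDB's multi-threaded scan via the C++ extension API -- not sheetreader's actual code; the Sheet* names are placeholders. DuckDB asks the global state for MaxThreads() and runs the scan function on up to that many threads, each with its own local state:

```cpp
#include <atomic>

#include "duckdb.hpp"

using namespace duckdb;

// Placeholder global state shared by all scan threads.
struct SheetGlobalState : public GlobalTableFunctionState {
	explicit SheetGlobalState(idx_t threads) : max_threads(threads) {
	}
	idx_t max_threads;
	std::atomic<idx_t> next_chunk {0}; // simple shared work counter

	// Returning more than 1 here is what enables the parallel scan.
	idx_t MaxThreads() const override {
		return max_threads;
	}
};

// Placeholder per-thread state.
struct SheetLocalState : public LocalTableFunctionState {
	idx_t current_chunk = 0;
};

static unique_ptr<GlobalTableFunctionState> SheetInitGlobal(ClientContext &context, TableFunctionInitInput &input) {
	// e.g. one scan thread per sheetreader parser thread
	return make_uniq<SheetGlobalState>(8);
}

static unique_ptr<LocalTableFunctionState> SheetInitLocal(ExecutionContext &context, TableFunctionInitInput &input,
                                                          GlobalTableFunctionState *gstate) {
	return make_uniq<SheetLocalState>();
}

static void SheetScan(ClientContext &context, TableFunctionInput &data, DataChunk &output) {
	auto &gstate = data.global_state->Cast<SheetGlobalState>();
	auto &lstate = data.local_state->Cast<SheetLocalState>();
	// Each thread claims the next unit of work (e.g. one sheetreader buffer).
	lstate.current_chunk = gstate.next_chunk.fetch_add(1);
	// ... fill `output` from the claimed buffer; emitting 0 rows ends this thread.
	output.SetCardinality(0);
}
```

These callbacks are passed to the TableFunction constructor alongside bind and the scan function. The open question from the list above is how each worker would claim sheetreader-core's per-thread buffers while preserving the implicit row order.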