varchar-io / nebula

A distributed block-based data storage and compute engine
https://nebula.bz
Apache License 2.0

Support multiple files per ingest spec #164

Closed: ritvik-statsig closed this issue 2 years ago

ritvik-statsig commented 2 years ago

We should allow an ingest spec to include multiple files, so that they are all read into a single block, and add logic that decides how to group files into ingest specs. That way we can keep block size near optimal, instead of producing very small blocks whenever the source files are very small.
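A minimal sketch of the grouping logic, assuming a greedy, order-preserving packing against a byte-size target. `FileInfo`, `group_files` and `target_bytes` are hypothetical names for illustration, not part of nebula's API, and nebula may well choose a different packing strategy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileInfo:
    # Hypothetical descriptor for one source file (not a nebula type).
    path: str
    size_bytes: int


def group_files(files: List[FileInfo], target_bytes: int) -> List[List[FileInfo]]:
    """Greedily pack files, in their given order, into groups whose total
    size stays at or below target_bytes. A single file larger than the
    target still gets a group of its own. Each group would map to one
    ingest spec, and therefore to one block."""
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    groups: List[List[FileInfo]] = []
    current: List[FileInfo] = []
    current_size = 0
    for f in files:
        # Close the current group if adding this file would exceed the target.
        if current and current_size + f.size_bytes > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_bytes
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    files = [FileInfo(f"part-{i}", s) for i, s in enumerate([10, 20, 30, 100, 5])]
    for g in group_files(files, target_bytes=50):
        print([f.path for f in g], sum(f.size_bytes for f in g))
```

Preserving input order keeps files that arrive together (for example, consecutive time partitions) in the same block. The same loop works with a row-count threshold by summing per-file row counts instead of bytes.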

ritvik-statsig commented 2 years ago

@shawncao - would it be better to specify the optimal block size in terms of the number of rows or the size in bytes?