matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Subtask]: blockio BlockRead API/Impl is broken #14507

Open fengttt opened 8 months ago

fengttt commented 8 months ago

Parent Issue

14469

Detail of Subtask

blockio.BlockRead should just load vectors from S3/disk into memory and then return a batch without copying any data. We may need to add a delete/valid mask to the batch. The purpose of BlockRead is to read data as efficiently as we can, not to be too smart.
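
To make the zero-copy idea concrete, here is a minimal sketch, not the actual blockio code. `Vector`, `Batch`, and `blockRead` are hypothetical names invented for illustration, and the columns are assumed to be `[]int64` buffers that were already loaded from S3/disk. The batch only aliases those buffers, and deleted rows are recorded in a mask instead of being compacted away.

```go
package main

import "fmt"

// Vector is a hypothetical column type; Data aliases the buffer handed in
// by the reader rather than owning a copy.
type Vector struct {
	Data []int64
}

// Batch is a hypothetical result type. Deleted[i] == true means row i is
// logically deleted; the row data itself is left in place.
type Batch struct {
	Vecs    []Vector
	Deleted []bool
}

// blockRead is a hypothetical stand-in for the read path: cols are buffers
// already loaded from storage, deletedRows are row offsets known to be deleted.
func blockRead(cols [][]int64, deletedRows []int) *Batch {
	rows := 0
	if len(cols) > 0 {
		rows = len(cols[0])
	}
	b := &Batch{Vecs: make([]Vector, len(cols)), Deleted: make([]bool, rows)}
	for i, c := range cols {
		b.Vecs[i] = Vector{Data: c} // copies the slice header only, not the elements
	}
	for _, r := range deletedRows {
		if r >= 0 && r < rows { // ignore offsets outside the block
			b.Deleted[r] = true
		}
	}
	return b
}

func main() {
	cols := [][]int64{{10, 20, 30}, {1, 2, 3}}
	b := blockRead(cols, []int{1})
	// The first element of the batch column and of the loaded buffer share
	// one address, so no element data was copied.
	fmt.Println(&b.Vecs[0].Data[0] == &cols[0][0], b.Deleted)
}
```

Consumers are expected to consult `Deleted` while scanning, so the read path never has to rewrite column data.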

Describe implementation you've considered

No response

Additional information

No response

fengttt commented 8 months ago

In theory we should lazily load columns, but if this is especially hard for now, let's punt.
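
For reference, lazy column loading could look roughly like the sketch below. `LazyColumn` and its loader callback are hypothetical and do not exist in blockio; the loader stands in for whatever actually fetches the column from S3/disk. The column is fetched on first access only, and columns that are never touched are never read.

```go
package main

import (
	"fmt"
	"sync"
)

// LazyColumn is a hypothetical wrapper that defers loading until first use.
type LazyColumn struct {
	once sync.Once
	load func() ([]int64, error) // hypothetical loader, e.g. an object-store read
	data []int64
	err  error
}

// NewLazyColumn records the loader without calling it.
func NewLazyColumn(load func() ([]int64, error)) *LazyColumn {
	return &LazyColumn{load: load}
}

// Get runs the loader at most once and returns the cached result afterwards.
func (c *LazyColumn) Get() ([]int64, error) {
	c.once.Do(func() { c.data, c.err = c.load() })
	return c.data, c.err
}

func main() {
	loads := 0
	col := NewLazyColumn(func() ([]int64, error) {
		loads++ // count how many times storage is actually hit
		return []int64{1, 2, 3}, nil
	})
	fmt.Println("loads before first access:", loads)
	v, _ := col.Get()
	col.Get() // second access reuses the cached data
	fmt.Println(v, "loads after two accesses:", loads)
}
```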

fengttt commented 8 months ago

Summary: there should be NO memory copy in CN scan operator. We may need to enhance batch with a bit mask of selected rows.
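
A rough sketch of what a selected-rows bit mask might look like on the scan side. `Bitmap` and `sumSelected` are hypothetical names used only for illustration, not an existing matrixone type or function. The operator reads the column in place and visits only the rows whose bit is set, so filtering never requires compacting the vector into a new buffer.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Bitmap is a hypothetical packed selection mask: bit i set means row i is selected.
type Bitmap struct {
	words []uint64
}

// NewBitmap allocates a mask large enough for n rows, with no rows selected.
func NewBitmap(n int) *Bitmap {
	return &Bitmap{words: make([]uint64, (n+63)/64)}
}

// Set marks row i as selected.
func (m *Bitmap) Set(i int) {
	m.words[i/64] |= uint64(1) << (uint(i) % 64)
}

// Count returns the number of selected rows.
func (m *Bitmap) Count() int {
	c := 0
	for _, w := range m.words {
		c += bits.OnesCount64(w)
	}
	return c
}

// ForEach visits the selected row offsets in ascending order.
func (m *Bitmap) ForEach(fn func(row int)) {
	for wi, w := range m.words {
		for w != 0 {
			fn(wi*64 + bits.TrailingZeros64(w))
			w &= w - 1 // clear the lowest set bit
		}
	}
}

// sumSelected reads the column in place; no rows are copied or moved.
func sumSelected(col []int64, sel *Bitmap) int64 {
	var s int64
	sel.ForEach(func(row int) { s += col[row] })
	return s
}

func main() {
	col := []int64{5, 7, 11, 13}
	sel := NewBitmap(len(col))
	sel.Set(0)
	sel.Set(3)
	fmt.Println(sel.Count(), sumSelected(col, sel))
}
```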