uccross / skyhookdm-ceph-cls

Skyhook Data Management: Storage and management of tabular data in Ceph.
https://www.skyhookdm.com
GNU Lesser General Public License v2.1

Implement pushdown cols only and list type ops #55

Closed. kingwind94 closed this 4 years ago.

kingwind94 commented 4 years ago
  1. Add a new flag pushdown-cols-only (default false) to run-query.cc. When the flag is set, query execution pushes down only the column projection to the storage level and leaves predicate processing to the more_processing step in query.cc.
  2. In processArrowCol(), append whole columns directly when there are no predicates and row_nums is empty.
  3. Extract the index lookup from exec_query_op.
  4. Add the JaggedArray list ops SOT_max_lt and SOT_min_gt.
  5. List type data reducer ops (TODO).
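Item 4 gives only the names of the new list ops. The C++ sketch below shows how ops of this kind could be evaluated over a list-typed (jagged) column. It assumes a hypothetical `JaggedArray` struct with an Arrow-style offsets/values layout, and it assumes SOT_max_lt means "the list's maximum is less than the bound" and SOT_min_gt means "the list's minimum is greater than the bound". The names `scan_rows`, `list_max_lt` and `list_min_gt` are illustrative, not the Skyhook implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical jagged (list-typed) column in an Arrow-like layout:
// row i holds values[offsets[i] .. offsets[i + 1]).
struct JaggedArray {
    std::vector<std::size_t> offsets;  // num_rows + 1 entries, starting at 0
    std::vector<int64_t> values;       // all list elements, flattened
};

// Reduce each non-empty row's list to a single value and collect the row
// numbers whose reduced value passes `keep`. Empty lists never match.
template <typename Reduce, typename Keep>
std::vector<std::size_t> scan_rows(const JaggedArray& arr, Reduce reduce, Keep keep) {
    assert(!arr.offsets.empty() && arr.offsets.back() == arr.values.size());
    std::vector<std::size_t> row_nums;
    for (std::size_t row = 0; row + 1 < arr.offsets.size(); ++row) {
        auto first = arr.values.begin() + static_cast<std::ptrdiff_t>(arr.offsets[row]);
        auto last = arr.values.begin() + static_cast<std::ptrdiff_t>(arr.offsets[row + 1]);
        if (first == last) continue;  // empty list: nothing to reduce
        if (keep(reduce(first, last))) row_nums.push_back(row);
    }
    return row_nums;
}

// Assumed meaning of SOT_max_lt: max(list) < bound.
std::vector<std::size_t> list_max_lt(const JaggedArray& arr, int64_t bound) {
    return scan_rows(
        arr, [](auto first, auto last) { return *std::max_element(first, last); },
        [bound](int64_t v) { return v < bound; });
}

// Assumed meaning of SOT_min_gt: min(list) > bound.
std::vector<std::size_t> list_min_gt(const JaggedArray& arr, int64_t bound) {
    return scan_rows(
        arr, [](auto first, auto last) { return *std::min_element(first, last); },
        [bound](int64_t v) { return v > bound; });
}
```

For offsets {0, 2, 2, 5} and values {1, 3, 7, 8, 9}, the rows are [1, 3], [] and [7, 8, 9]. Here list_max_lt(arr, 5) selects row 0 and list_min_gt(arr, 5) selects row 2. Treating empty lists as non-matching is one design choice among several.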
jlefevre commented 4 years ago

Ok, I think we do not need the select pushdown for now, only the project pushdown as you describe.

kingwind94 commented 4 years ago

Do we need to verify that the cols are the same length here before pushing them back? Will it throw an error otherwise?

Added the check, though I can't test this function yet.
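The length question matters for the fast path in item 2. The sketch below uses plain `std::vector<int64_t>` columns as a stand-in for Arrow arrays and a hypothetical `append_projected_cols` helper, which is not the actual processArrowCol() signature. It appends whole columns when there are no predicates and row_nums is empty, and checks up front that all columns have the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Arrow column.
using Column = std::vector<int64_t>;

// With no predicates and an empty row_nums, every row qualifies, so whole
// columns are appended as-is instead of being rebuilt row by row.
// Otherwise only the rows listed in row_nums are copied.
void append_projected_cols(const std::vector<Column>& src_cols,
                           const std::vector<std::size_t>& row_nums,
                           bool has_predicates,
                           std::vector<Column>& out_cols) {
    if (src_cols.empty()) return;
    // Refuse to build a table whose columns disagree on row count.
    const std::size_t num_rows = src_cols.front().size();
    for (const Column& col : src_cols) {
        if (col.size() != num_rows)
            throw std::invalid_argument("projected columns differ in length");
    }
    if (!has_predicates && row_nums.empty()) {
        out_cols.insert(out_cols.end(), src_cols.begin(), src_cols.end());
        return;
    }
    for (const Column& col : src_cols) {
        Column selected;
        selected.reserve(row_nums.size());
        for (std::size_t r : row_nums) selected.push_back(col.at(r));
        out_cols.push_back(std::move(selected));
    }
}
```

Throwing on mismatched lengths is only one option. Returning a status code would fit a storage-side code path equally well.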

kingwind94 commented 4 years ago

I think we also need to check that there are no selection predicates.

I don't think we should do the selection predicate check, because it would pick up the columns in the selection predicates and push them back into the vector for the output table.
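The exchange above depends on the difference between the columns that must be fetched from storage and the columns that belong in the output table. The sketch below shows how the pushed-down projection could cover predicate-only columns while the output schema stays limited to the query's projection. The names `Predicate` and `cols_to_push_down` are hypothetical and do not come from the Skyhook code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical predicate shape. Only the referenced column name matters here.
struct Predicate {
    std::string col_name;
};

// Columns to push down to storage: all projected columns first, then any
// column referenced only by a predicate. Order is kept and duplicates are
// dropped (a linear search is fine for the handful of columns in a query).
std::vector<std::string> cols_to_push_down(const std::vector<std::string>& projected,
                                           const std::vector<Predicate>& preds) {
    std::vector<std::string> cols;
    auto add_once = [&cols](const std::string& name) {
        if (std::find(cols.begin(), cols.end(), name) == cols.end())
            cols.push_back(name);
    };
    for (const std::string& name : projected) add_once(name);
    for (const Predicate& p : preds) add_once(p.col_name);
    return cols;
}
```

In this sketch the output table would still be built from `projected` alone. The extra names are fetched only so that predicates can be evaluated on the client side.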

kingwind94 commented 4 years ago

This looks good, thanks for the fixes and the refactoring as well. Can we remove the commented-out blocks? Then let's merge.

Just did it.

jlefevre commented 4 years ago

Adds an option for vertical partitioning (columnar layout) that pushes down a projection of all cols referenced by a query. It removes the select pushdown and instead applies selects on the client side after full tuple reconstruction from the projected cols. This PR implements a few example list ops for now, which may change when we move to the Arrow compute API. It also optimizes result table creation for project queries with no selects by projecting full cols instead of using the Arrow chunked array builder. Fixes a bug with chunk indexes.
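The summary describes selects being applied on the client side after full tuple reconstruction from the projected cols. The sketch below is built on assumed types: `Row`, `Select`, `Op` and `apply_selects_client_side` are all hypothetical, and only integer columns are handled. Multiple selects are treated as a conjunction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical reconstructed tuple: one int64 value per pushed-down column.
using Row = std::vector<int64_t>;

enum class Op { kLt, kGt, kEq };

// Hypothetical select: compare one fetched column against a constant.
struct Select {
    std::size_t col;  // index into the pushed-down (fetched) columns
    Op op;
    int64_t value;
};

bool passes(const Row& row, const Select& s) {
    const int64_t v = row.at(s.col);
    switch (s.op) {
        case Op::kLt: return v < s.value;
        case Op::kGt: return v > s.value;
        case Op::kEq: return v == s.value;
    }
    return false;  // unreachable for valid Op values
}

// Keep rows that satisfy every select, then project each kept row down to
// out_cols so that columns referenced only by predicates are not output.
std::vector<Row> apply_selects_client_side(const std::vector<Row>& rows,
                                           const std::vector<Select>& selects,
                                           const std::vector<std::size_t>& out_cols) {
    std::vector<Row> result;
    for (const Row& row : rows) {
        const bool keep = std::all_of(selects.begin(), selects.end(),
                                      [&row](const Select& s) { return passes(row, s); });
        if (!keep) continue;
        Row projected;
        projected.reserve(out_cols.size());
        for (std::size_t c : out_cols) projected.push_back(row.at(c));
        result.push_back(std::move(projected));
    }
    return result;
}
```

Take fetched rows {1, 10, 100}, {2, 20, 200} and {3, 30, 300}, a select of column 1 > 15, and output columns {0, 2}. The result is {2, 200}, {3, 300}. With no selects, every row passes and is simply projected.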