vmware-archive / quickstep

Quickstep Project
Apache License 2.0

Quickstep gen stats #225

rogersjeffreyl closed 8 years ago

rogersjeffreyl commented 8 years ago

This code adds an operator that computes the number of rows in a relation. A new field is added to CatalogRelation to store the tuple count. The operator is currently invoked after the TextScanOperator. The count will be used in the optimizer for cardinality estimates.
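For readers without the diff at hand, the idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `BlockSketch`, `RelationSketch`, and `generateNumRowsStats` are hypothetical stand-ins rather than Quickstep classes, and a block is reduced to just its tuple count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a storage block; only the tuple count is modeled.
struct BlockSketch {
  std::size_t num_tuples;
};

// Hypothetical stand-in for a catalog entry describing one relation.
class RelationSketch {
 public:
  void addBlock(const BlockSketch &block) { blocks_.push_back(block); }
  const std::vector<BlockSketch>& blocks() const { return blocks_; }

  // The "new field": a cached tuple count that an optimizer could read.
  void setNumTuples(std::size_t n) { num_tuples_ = n; }
  std::size_t numTuples() const { return num_tuples_; }

 private:
  std::vector<BlockSketch> blocks_;
  std::size_t num_tuples_ = 0;
};

// Sketch of the stats operator: one full pass over the relation's blocks
// after loading, summing per-block tuple counts into the catalog field.
void generateNumRowsStats(RelationSketch *relation) {
  assert(relation != nullptr);
  std::size_t total = 0;
  for (const BlockSketch &block : relation->blocks()) {
    total += block.num_tuples;
  }
  relation->setNumTuples(total);
}
```

Running `generateNumRowsStats` on a relation with blocks of 3 and 5 tuples would leave `numTuples()` at 8. Note that the sketch visits every block a second time after the load has finished.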

pateljm commented 8 years ago

Nice feature @rogersjeffreyl ! Thanks.

pateljm commented 8 years ago

Quick question: it looks like this runs the new operator as a separate step (which is nice to have for gathering stats). Would it be possible to avoid re-scanning the data (which this seems to be doing) and instead have the TextScanOperator return the number of tuples loaded, which can then be used to increment the current catalog stats?
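A rough sketch of that suggestion, under loud assumptions: `CatalogStatsSketch`, `scanTextAndCount`, and `loadText` are hypothetical names, and the "loader" treats every non-empty line as one tuple instead of doing real parsing.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical stand-in for the per-relation stats kept in the catalog.
struct CatalogStatsSketch {
  std::size_t num_tuples = 0;
  void incrementNumTuples(std::size_t delta) { num_tuples += delta; }
};

// Hypothetical loader: treats each non-empty line as one tuple and
// returns how many tuples it loaded, so no second scan is needed.
std::size_t scanTextAndCount(const std::string &text) {
  std::istringstream input(text);
  std::string line;
  std::size_t loaded = 0;
  while (std::getline(input, line)) {
    if (!line.empty()) {
      // A real loader would parse the line and insert a tuple here.
      ++loaded;
    }
  }
  return loaded;
}

// The count returned by the scan is added to the existing stats, so
// repeated loads into the same relation accumulate.
void loadText(const std::string &text, CatalogStatsSketch *stats) {
  assert(stats != nullptr);
  stats->incrementNumTuples(scanTextAndCount(text));
}
```

Incrementing rather than overwriting matters when data is loaded into a relation that already holds tuples: the catalog count then stays correct without revisiting the older blocks.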

We can simply make a note of this and do a separate PR later. This is a nice addition!

@jianqiao @hbdeshmukh Comments?

hbdeshmukh commented 8 years ago

@pateljm That's a good point. An alternative could be to introduce the new operator between TextScan and SaveBlocks (the operator chain before this PR), so that stats on each block can be computed as blocks are pipelined from TextScan to the GetStats operator, and the blocks are subsequently streamed to SaveBlocks and saved to disk.
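The pipelined arrangement could look something like the sketch below. It is only an illustration of the pass-through idea: `GetStatsStage`, `SaveBlocksStage`, and `BlockConsumer` are hypothetical, blocks are reduced to a tuple count, and "saving to disk" is modeled as appending to a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a storage block; only the tuple count is modeled.
struct BlockSketch {
  std::size_t num_tuples;
};

using BlockConsumer = std::function<void(const BlockSketch&)>;

// Pass-through stage: records stats for each block as it streams by,
// then forwards the block unchanged to the next stage.
class GetStatsStage {
 public:
  explicit GetStatsStage(BlockConsumer downstream)
      : downstream_(std::move(downstream)) {
    assert(static_cast<bool>(downstream_));
  }

  void consume(const BlockSketch &block) {
    num_tuples_ += block.num_tuples;
    downstream_(block);
  }

  std::size_t numTuples() const { return num_tuples_; }

 private:
  BlockConsumer downstream_;
  std::size_t num_tuples_ = 0;
};

// Terminal stage standing in for SaveBlocks; keeps blocks in memory
// instead of writing them to disk.
class SaveBlocksStage {
 public:
  void consume(const BlockSketch &block) { saved_.push_back(block); }
  std::size_t numSaved() const { return saved_.size(); }

 private:
  std::vector<BlockSketch> saved_;
};
```

A producer standing in for TextScan would call `consume` on the stats stage for each block it fills, with the stats stage constructed around a callback into the save stage. Each block is then touched once on its way to storage, and the stats stage is the natural place to add further per-block statistics later.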

jianqiao commented 8 years ago

LGTM. Merging.