n-young / trustdb

0 stars 1 forks source link

Error in query evaluation #6

Closed desmondcheongzx closed 3 years ago

desmondcheongzx commented 3 years ago

Our query evaluator assumes that ResultSets are sorted by timestamp. However, this invariant does not hold.

The problem arises here, because we simply append vectors of records. Each vector is sorted by timestamp, but the combined vector is not.

This can be fixed simply by performing a ResultSet union instead of appending a vector. However, this results in having to perform the union operation multiple times, incurring a quadratic cost. We could consider writing a multi-union operator, that merges n vectors of records simultaneously, bringing us back to a linear time cost.

desmondcheongzx commented 3 years ago

Correction, we cannot go to a linear time cost. To get a multi-union, you can do no better than O(ns log(s)), where n is the length of each series and s is the number of series we have. To do this, you would maintain a priority queue of the next data point from each of the s series to append to the vector of records.