Write balanced spatial partitioning query in MyriaL

We have a case (the cosmo8 dataset) in which a spatial dataset is extremely skewed: it always hashes to a single partition, which makes queries over it very difficult. If we had a MyriaL query that applied a balanced spatial partitioning technique such as kd-trees or octrees, we could pre-partition the data so that it preserves locality while avoiding skew. (If necessary, this logic could later be implemented as a standalone operator.)

We should be able to implement this query as a do...while loop whose termination condition is expressed as a point density or a fraction of the total points (for example, stop once no cell holds more than that fraction). Initially, we could implement the partitioning as repeated bisection of the space, cycling through the dimensions as in kd-trees but simply bisecting each cell at the midpoint of its coordinate range. Later we could turn this into a proper kd-tree that bisects at the median coordinate in each dimension (possibly using the median of a random sample of k points rather than running an O(n) distributed median algorithm).
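To make the loop logic concrete, here is a minimal single-machine Python sketch, not a MyriaL query and not a distributed implementation. It assumes points are in-memory tuples and uses the "fraction of total points" termination condition. The names `partition`, `split_value`, `max_fraction`, `k`, and `max_depth` are all hypothetical. `method="midpoint"` gives the first version described above (bisect each cell at its midpoint, cycling through dimensions); `method="sample_median"` gives the kd-tree variant that splits at the median of a random sample of `k` points. The `max_depth` guard is an added assumption so that the loop terminates even when many points share a coordinate.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
# A cell is an axis-aligned box (lo, hi), the points inside it, and its depth in the tree.
Cell = Tuple[List[float], List[float], List[Point], int]


def split_value(cell: Cell, dim: int, method: str, k: int, rng: random.Random) -> float:
    lo, hi, points, _ = cell
    if method == "midpoint":
        # First version: bisect the cell at the midpoint of its range in this dimension.
        return (lo[dim] + hi[dim]) / 2.0
    # kd-tree variant: approximate the median by the median of a random sample of k points.
    sample = rng.sample(points, min(k, len(points)))
    coords = sorted(p[dim] for p in sample)
    return coords[len(coords) // 2]


def partition(points: Sequence[Point], max_fraction: float, method: str = "midpoint",
              k: int = 100, max_depth: int = 32, seed: int = 0) -> List[Cell]:
    dims = len(points[0])
    rng = random.Random(seed)
    # Hypothetical termination condition: no cell may hold more than this many points.
    limit = max(1, int(max_fraction * len(points)))
    # Start with a single cell: the bounding box of all points.
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    cells: List[Cell] = [(lo, hi, list(points), 0)]
    while True:  # do ... while: keep splitting until an iteration splits no cell
        next_cells: List[Cell] = []
        split_any = False
        for cell in cells:
            c_lo, c_hi, pts, depth = cell
            if len(pts) <= limit or depth >= max_depth:
                next_cells.append(cell)  # cell is small enough (or too deep): keep it
                continue
            dim = depth % dims  # cycle through the dimensions as in a kd-tree
            v = split_value(cell, dim, method, k, rng)
            left = [p for p in pts if p[dim] < v]
            right = [p for p in pts if p[dim] >= v]
            l_hi = c_hi.copy()
            l_hi[dim] = v
            r_lo = c_lo.copy()
            r_lo[dim] = v
            next_cells.append((c_lo, l_hi, left, depth + 1))
            next_cells.append((r_lo, c_hi, right, depth + 1))
            split_any = True
        cells = next_cells
        if not split_any:
            return cells
```

On skewed data the midpoint version can spend many levels producing empty cells before it reaches the dense region, whereas the sampled-median version roughly halves the point count at every split. That difference is the motivation for moving to a proper kd-tree later.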

uwescience/myria #813