varchar-io / nebula

A distributed block-based data storage and compute engine
https://nebula.bz
Apache License 2.0

Histogram view enhancement by updating min/max from predicates/filters #79

Open shawncao opened 3 years ago

shawncao commented 3 years ago

This is an enhancement driven by user intent; it is not a priority.


When there is no filter, or the filter has nothing to do with the column the histogram runs on, we use min/max from metadata. But when the user adds a predicate like value>90 and runs hist("value"), we should update the range to [90, max] and then run 10 buckets over it. This scenario is called a zoom-in histogram: it lets the user keep zooming into smaller data ranges at a finer granularity.
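A minimal sketch of the range update described above, in Python for brevity. The function name `narrow_range` and the `(column, op, value)` predicate tuples are hypothetical and are not Nebula's actual API; as in the example above, a strict `value>90` is treated as a lower bound of 90.

```python
def narrow_range(column, meta_min, meta_max, predicates):
    """Tighten the metadata [min, max] of `column` using simple comparison filters.

    `predicates` is a list of hypothetical (column, op, value) tuples.
    Predicates on other columns are ignored, so the metadata range is kept.
    """
    lo, hi = meta_min, meta_max
    for col, op, value in predicates:
        if col != column:
            continue  # filter has nothing to do with the histogram column
        if op in (">", ">="):
            lo = max(lo, value)  # raise the lower bound
        elif op in ("<", "<="):
            hi = min(hi, value)  # lower the upper bound
        elif op == "==":
            lo, hi = max(lo, value), min(hi, value)
    return lo, hi


# value>90 on a column whose metadata range is [0, 100]
print(narrow_range("value", 0, 100, [("value", ">", 90)]))
# a filter on an unrelated column leaves the metadata range untouched
print(narrow_range("value", 0, 100, [("other", ">", 90)]))
```

The histogram would then be bucketed over the returned range instead of the metadata range.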

@shuoshang1990 feel free to take a look, but you don't have to take it, it's a new feature essentially on top of current hist.

shawncao commented 3 years ago

Also note (another related potential issue): having 10 buckets may not always be correct. For example, if the (updated) range of an integer column covers fewer than 10 distinct values, we can't have 10 buckets: for range [5, 8], we can only have 4 buckets. So it seems this change will require some changes in the Histogram function itself.
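One possible way to handle this, sketched in Python under the assumption that integer buckets are inclusive ranges and that the bucket count is capped at the number of distinct integers in the range. The names `bucket_count` and `integer_buckets` are hypothetical and do not come from the Histogram function.

```python
def bucket_count(lo, hi, requested=10):
    """Number of buckets for an inclusive integer range [lo, hi]."""
    if hi < lo:
        return 0  # empty range, e.g. contradictory predicates
    return min(requested, hi - lo + 1)


def integer_buckets(lo, hi, requested=10):
    """Split [lo, hi] into inclusive (start, end) buckets of near-equal width."""
    n = bucket_count(lo, hi, requested)
    if n == 0:
        return []
    # Spread the remainder over the first `extra` buckets.
    base, extra = divmod(hi - lo + 1, n)
    buckets = []
    start = lo
    for i in range(n):
        width = base + (1 if i < extra else 0)
        buckets.append((start, start + width - 1))
        start += width
    return buckets


print(bucket_count(5, 8))     # 4 buckets, matching the [5, 8] example
print(integer_buckets(5, 8))  # one bucket per integer value
```

Capping the count keeps every bucket at least one integer wide; whether uneven bucket widths are acceptable for wider ranges is a separate design decision.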