raptor-ml / raptor

Transform your pythonic research to an artifact that engineers can deploy easily.
https://raptor.ml
Apache License 2.0

[FEEDBACK] Core: approximate aggregations #96

Open AlmogBaku opened 2 years ago

AlmogBaku commented 2 years ago

Background

Some aggregations can only be computed exactly from the raw data. Since Raptor is designed to be "production first", storing the raw data in the state and then aggregating over it is expensive in both storage and computation.

These include:

  1. Distinct count
  2. Percentile

What do you propose to do?

Distinct count

Implement HyperLogLog. Redis already supports this out of the box (the `PFADD` / `PFCOUNT` / `PFMERGE` commands).
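
To make the idea concrete, here is a minimal pure-Python sketch of HyperLogLog. It is only an illustration of the technique, not Raptor's code and not how Redis implements it. The class name, the `p` parameter, and the use of SHA-1 as the hash are assumptions made for this example. The state is a fixed array of small registers, so no raw values are kept.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch (hypothetical helper, not a Raptor API)."""

    def __init__(self, p: int = 12):
        self.p = p                      # number of hash bits used as register index
        self.m = 1 << p                 # number of registers (4096 for p=12)
        self.registers = [0] * self.m
        # Bias-correction constant from the original algorithm, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value: str) -> None:
        # Take a 64-bit hash of the value.
        h = int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                    # first p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64-p bits
        # Rank = position of the leftmost 1-bit in the remaining bits (1-based).
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate

    def merge(self, other: "HyperLogLog") -> None:
        # Sketches built with the same p merge by taking the per-register max.
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]


hll = HyperLogLog()
for i in range(10_000):
    hll.add(f"user-{i % 5_000}")    # 5,000 distinct ids, each seen twice
print(round(hll.count()))           # an estimate near 5,000
```

Because merging is just a per-register max, sketches kept per time bucket could be combined into a window-level distinct count without revisiting the raw data.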

Percentile

We can use UDDSketch or t-digest:

  1. https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/percentile-approx/advanced-agg/#percentile-approximation-advanced-aggregation-methods
  2. https://github.com/influxdata/tdigest
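
As a rough illustration of how a sketch-based percentile works, below is a simplified log-bucket quantile sketch in Python, in the style of DDSketch. It is not UDDSketch itself (which, as I understand it, additionally collapses buckets uniformly to bound memory) and not t-digest. The class name and the `alpha` parameter are assumptions for this example, and it only handles positive values.

```python
import math
from collections import Counter


class LogBucketSketch:
    """Simplified DDSketch-style quantile sketch (hypothetical, positive values only)."""

    def __init__(self, alpha: float = 0.01):
        self.alpha = alpha                          # target relative error
        self.gamma = (1 + alpha) / (1 - alpha)      # bucket growth factor
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()                    # bucket key -> number of values
        self.count = 0

    def add(self, value: float) -> None:
        if value <= 0:
            raise ValueError("this simplified sketch only handles positive values")
        # Bucket k covers the interval (gamma^(k-1), gamma^k].
        key = math.ceil(math.log(value) / self.log_gamma)
        self.buckets[key] += 1
        self.count += 1

    def quantile(self, q: float) -> float:
        if self.count == 0:
            raise ValueError("sketch is empty")
        rank = q * (self.count - 1)
        seen = 0
        for key in sorted(self.buckets):
            seen += self.buckets[key]
            if seen > rank:
                # Representative value of the bucket; its relative error
                # against any value in the bucket is at most alpha.
                return 2 * self.gamma ** key / (self.gamma + 1)
        raise AssertionError("unreachable: rank is always below count")

    def merge(self, other: "LogBucketSketch") -> None:
        # Sketches with the same alpha merge by adding bucket counts.
        if self.alpha != other.alpha:
            raise ValueError("cannot merge sketches with different alpha")
        self.buckets.update(other.buckets)
        self.count += other.count


sketch = LogBucketSketch(alpha=0.01)
for latency_ms in range(1, 1001):
    sketch.add(latency_ms)
print(sketch.quantile(0.5), sketch.quantile(0.99))  # close to 500 and 990
```

The state grows with the number of distinct buckets rather than with the number of observations, and merging is additive, so per-bucket sketches could be combined across a window in the same way as the distinct-count case.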

What have you already tried?

No response

What else should we know?

No response

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.
