rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Story - Supporting Approximate Count Distinct #10792

Closed ttnghia closed 2 months ago

ttnghia commented 2 years ago

This issue tracks the dependencies for supporting approximate count distinct using the HyperLogLog algorithm.
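For readers unfamiliar with HyperLogLog, here is a minimal CPU-side sketch in Python of how the estimate is formed. It is illustrative only and is not cuDF's or cuCollections' implementation: the class name `HyperLogLog`, the precision parameter `p`, and the choice of BLAKE2b as the 64-bit hash are all assumptions made for this example. It uses the standard raw estimator with only the small-range (linear counting) correction.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch (hypothetical helper, not a cuDF API)."""

    def __init__(self, p=12):
        # p index bits give m = 2**p registers; the alpha formula below
        # is the usual approximation, valid for m >= 128 (p >= 7).
        if p < 7 or p > 16:
            raise ValueError("p must be in [7, 16] for this sketch")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        # Assumption: an 8-byte BLAKE2b digest of repr(value) serves as the 64-bit hash.
        digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        idx = h >> (64 - self.p)                 # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1 bit in `rest`
        # (64 - p + 1 when `rest` is zero).
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Two sketches with the same p combine by element-wise max,
        # which is what makes the structure suitable for partial aggregation.
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    sketch = HyperLogLog(p=12)
    for i in range(100_000):
        sketch.add(i)
        sketch.add(i)  # duplicates do not change the registers
    print(round(sketch.estimate()))
```

With `p=12` the sketch keeps 4096 small registers regardless of input size, and the expected relative standard error is roughly 1.04 / sqrt(4096), about 1.6%.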

jrhemstad commented 2 years ago

Given there was already sizable discussion in https://github.com/rapidsai/cudf/issues/10652, can we just use that issue? Or is there a different intent with this one?

ttnghia commented 2 years ago

That issue has been diluted. Here I'm going to add checklist items for the necessary PRs/features we plan to add so people can keep track, like this: https://github.com/rapidsai/cudf/issues/10186.

revans2 commented 2 years ago

This is still wanted

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

GregoryKimball commented 2 years ago

@etseidl FYI, our friends in Spark-RAPIDS are also interested in the HyperLogLog algorithm. 😄

GregoryKimball commented 10 months ago

Also see https://github.com/NVIDIA/cuCollections/pull/429

vyasr commented 6 months ago

Echoing Jake's question from two years ago: do we need this issue? Can we consolidate discussion in #10652? At this point it seems like we're just forced to post updates in two places. @ttnghia WDYT?

res-life commented 2 months ago

I'm working on this.

res-life commented 2 months ago

If no one has started on this yet, I'll pick it up.

vyasr commented 2 months ago

To avoid fragmentation, I'm going to close this as a dup of #10652 so that we can focus discussion there.