kelindar / column

High-performance, columnar, in-memory store with bitmap indexing in Go
MIT License
1.44k stars 59 forks source link

how do I efficiently query for unique values of a field #82

Open ami-m opened 1 year ago

ami-m commented 1 year ago

say I get a stream of data: {machineCode: "", lat: , lon: } And I want to display a count of such datums per machineCode.

Is there a way to efficiently get all the unique machine codes? or should I just keep track of them while inserting data?

kelindar commented 1 year ago

No built-in feature in column for this, but there's 2 ways I can think of to solve this problem:

  1. if you're okay with imprecise measurement, use HyperLogLog to store machine codes
  2. otherwise, a standard map/set is required

You can do both during insertion or a range query that iterates over all elements.

ami-m commented 1 year ago

thanks, I went with the second method, but that leaves me with having to do the range query when restoring state from a snapshot :-(