vmware-archive / database-stream-processor

Streaming and Incremental Computation Framework

Count operator #125

Open Kixiron opened 2 years ago

Kixiron commented 2 years ago

We need to add a .count() operator that counts the number of values for each key, e.g. (K, V).count() -> (K, isize).

ryzhyk commented 2 years ago

There are two forms of count we could support: counting the distinct values V associated with a key (count_distinct in DDlog), or summing the weights of all values associated with a key. The latter is linear, but the former seems to be what people more often want in practice. Both can be implemented using aggregate (a specialized implementation may be slightly more efficient, but I'm not sure it's worth it), but I agree that count should be packaged as a library method under src/operator.
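
To make the difference between the two forms concrete, here is a minimal standalone sketch in plain Rust. It does not use the project's stream or aggregate API; the `Weighted` alias and the function names `weighted_count` and `distinct_count` are hypothetical, chosen only for illustration, and the input is assumed to be a slice of ((key, value), weight) records where a negative weight retracts a record.

```rust
use std::collections::BTreeMap;

/// Hypothetical representation of a weighted record:
/// a positive weight inserts the (key, value) pair, a negative one retracts it.
type Weighted<K, V> = ((K, V), isize);

/// Linear form: for each key, sum the weights of all its records.
/// Keys whose total weight is zero are dropped from the result.
fn weighted_count<K: Ord + Clone, V>(input: &[Weighted<K, V>]) -> BTreeMap<K, isize> {
    let mut counts = BTreeMap::new();
    for ((k, _), w) in input {
        *counts.entry(k.clone()).or_insert(0) += *w;
    }
    counts.retain(|_, c| *c != 0);
    counts
}

/// Distinct form: for each key, count the values whose consolidated
/// weight is positive, ignoring how large that weight is.
fn distinct_count<K: Ord + Clone, V: Ord + Clone>(
    input: &[Weighted<K, V>],
) -> BTreeMap<K, isize> {
    // First consolidate the weights of identical (key, value) pairs.
    let mut consolidated: BTreeMap<(K, V), isize> = BTreeMap::new();
    for ((k, v), w) in input {
        *consolidated.entry((k.clone(), v.clone())).or_insert(0) += *w;
    }
    // Then count each surviving pair exactly once per key.
    let mut counts = BTreeMap::new();
    for ((k, _), w) in consolidated {
        if w > 0 {
            *counts.entry(k).or_insert(0) += 1;
        }
    }
    counts
}
```

The weighted form can be updated by applying the same function to a batch of changes and adding the results, which is what makes it linear. The distinct form has to know the consolidated weight of each (key, value) pair before it can decide whether the pair counts, so it cannot be computed from a change batch alone.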