google / zetasketch

A collection of libraries for single-pass, distributed, sublinear-space approximate aggregation and sketching algorithms. Currently: HyperLogLog++; more to come.
Apache License 2.0

HyperLogLogPlusPlus should be serializable #3

Open nielsbasjes opened 5 years ago

nielsbasjes commented 5 years ago

To allow inclusion of this library in systems like Apache Beam and Apache Flink, the HyperLogLogPlusPlus class should be serializable. I ran into this while trying to implement a UDF for Apache Flink, where serializability is mandatory.
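For illustration, here is a minimal sketch of the usual workaround in plain Java. It assumes the sketch object can dump and restore its state as a byte array. `StandInSketch`, `SketchHolder`, `toBytes` and `fromBytes` are hypothetical names invented for this example and are not the zetasketch API. The holder marks the sketch `transient` and writes only its bytes, so Java serialization accepts the holder even though the wrapped class is not `Serializable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SketchSerializationDemo {

    // Hypothetical stand-in for a sketch class that does NOT implement
    // Serializable but can dump/restore its state as bytes (an assumption).
    static final class StandInSketch {
        private final StringBuilder state = new StringBuilder();

        void add(String value) {
            state.append(value).append('\n');
        }

        byte[] toBytes() {
            return state.toString().getBytes(StandardCharsets.UTF_8);
        }

        static StandInSketch fromBytes(byte[] bytes) {
            StandInSketch restored = new StandInSketch();
            restored.state.append(new String(bytes, StandardCharsets.UTF_8));
            return restored;
        }
    }

    // Serializable holder: the sketch field is transient, so only the
    // byte dump written in writeObject travels through Java serialization.
    static final class SketchHolder implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient StandInSketch sketch;

        SketchHolder(StandInSketch sketch) {
            this.sketch = sketch;
        }

        StandInSketch get() {
            return sketch;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            byte[] bytes = sketch.toBytes();
            out.writeInt(bytes.length);
            out.write(bytes);
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            byte[] bytes = new byte[in.readInt()];
            in.readFully(bytes);
            sketch = StandInSketch.fromBytes(bytes);
        }
    }

    // Serializes any object with standard Java serialization.
    static byte[] javaSerialize(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    // Reads back a single object written by javaSerialize.
    static Object javaDeserialize(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        StandInSketch sketch = new StandInSketch();
        sketch.add("a");
        sketch.add("b");

        // Serializing the bare object fails, as in the reported exception.
        try {
            javaSerialize(sketch);
        } catch (NotSerializableException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // Serializing the holder succeeds and preserves the sketch state.
        SketchHolder copy =
                (SketchHolder) javaDeserialize(javaSerialize(new SketchHolder(sketch)));
        System.out.println(Arrays.equals(sketch.toBytes(), copy.get().toBytes()));
    }
}
```

A wrapper like this only hides the problem from the caller. Having the class itself implement `Serializable` would avoid the extra layer in every Flink or Beam job.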

Also, putting an instance of the HyperLogLogPlusPlus class through a simple Beam pipeline resulted in:

Caused by: java.io.NotSerializableException: com.google.zetasketch.HyperLogLogPlusPlus
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)

chaitanyapilaka commented 4 years ago

Hey! I am interested in this as well because I'm trying to use this class in Apache Beam. Has there been any progress on it? If not, is there any way I can help move this forward?

zfraa commented 4 years ago

@chaitanyapilaka Did you check out https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/sdk/extensions/zetasketch/HllCount.html? That provides an off-the-shelf Beam integration.