twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0

Are the numbers accurate for HLL size #1227

Open ittayd opened 9 years ago

ittayd commented 9 years ago

The comment at https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/ReduceOperations.scala#L94 says that 10% error takes 256 bytes. But m = (104/error)^2, with the error in percent, gives (104/10)^2 = 10.4^2 ≈ 108 bytes.

raoweijian commented 9 years ago

https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/ReduceOperations.scala#L85 Note the word "Approximate" on line 85.
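
For anyone who wants to check the arithmetic, here is a minimal sketch of the formula quoted in the question, m = (104/error)^2 with the error in percent. The function names are hypothetical, and the rounding step is an assumption: it supposes the register count is rounded up to a power of two at one byte per register, which this thread does not confirm for Scalding.

```python
import math


def raw_registers(error_percent: float) -> float:
    """m = (104 / error_percent)^2, the formula quoted in the question."""
    return (104.0 / error_percent) ** 2


def rounded_registers(error_percent: float) -> int:
    """Assumption (not confirmed by the thread): round m up to the
    next power of two, as if one byte were used per register."""
    bits = math.ceil(math.log2(raw_registers(error_percent)))
    return 2 ** bits


if __name__ == "__main__":
    # Print the raw and rounded sizes for a few error targets.
    for err in (10, 5, 2, 1):
        print(err, round(raw_registers(err), 2), rounded_registers(err))
```

Under these assumptions a 10% error gives about 108 before rounding and 128 after. Neither figure is exactly 256, but both are in the same range, which fits reading the documented sizes as rough figures rather than exact ones.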