twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0
2.14k stars 267 forks

summingbird does not preserve equality on Producers #746

Open johnynek opened 6 years ago

johnynek commented 6 years ago

Because Producer uses anonymous functions internally, (p1, f1) == (p2, f2) does not imply p1.map(f1) == p2.map(f2). This defeats memoization and caching, which in the worst case causes recomputation.
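A minimal sketch of the failure mode, for illustration only. Node, Source and FlatMapped are hypothetical stand-ins rather than the real Summingbird types, and the assumption that map is built on flatMap through an anonymous wrapper is mine. Each call to map allocates a fresh closure, and closures compare by reference, so two structurally identical nodes come out unequal even when they share the same function instance:

```scala
// Hypothetical stand-in for a producer graph; NOT the real Summingbird API.
sealed trait Node[T] {
  // Assumed pattern: map implemented with an anonymous wrapper around fn.
  // Every call allocates a new closure object.
  def map[U](fn: T => U): Node[U] = FlatMapped[T, U](this, t => Some(fn(t)))
}
case class Source[T](name: String) extends Node[T]
case class FlatMapped[T, U](parent: Node[T], fn: T => Option[U]) extends Node[U]

object EqualityProblem {
  val p1: Node[(String, Int)] = Source[(String, Int)]("events")
  val p2: Node[(String, Int)] = Source[(String, Int)]("events")

  // One shared function instance, so (p1, f) == (p2, f) holds.
  val f: ((String, Int)) => Int = kv => kv._2

  val inputsEqual: Boolean = (p1, f) == (p2, f)
  // The anonymous wrappers inside map are distinct objects, so this is false.
  val mappedEqual: Boolean = p1.map(f) == p2.map(f)
}
```

Here inputsEqual is true while mappedEqual is false, so any cache keyed on the node misses.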

We are running down a lot of issues like this, but the solution we have come up with is to never use anonymous functions inside the planners and instead always use things like:

// Named, field-less case class: every instance is equal to every other.
case class GetValue[K, V]() extends Function1[(K, V), V] {
  def apply(kv: (K, V)): V = kv._2
}
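To show why this helps, here is a rough sketch under stated assumptions. Node, Source, FlatMapped, MapToOption and the plan cache are all hypothetical names made up for the example, not Summingbird's actual planner. Case classes get structural equals and hashCode, so wrapping functions in them lets equal inputs produce equal nodes, and a memo table keyed on the node then hits:

```scala
import scala.collection.mutable

// Hypothetical stand-in for a producer graph; NOT the real Summingbird API.
sealed trait Node[T] {
  // The wrapper is a named case class instead of an anonymous function.
  def map[U](fn: T => U): Node[U] =
    FlatMapped[T, U](this, MapToOption[T, U](fn))
}
case class Source[T](name: String) extends Node[T]
case class FlatMapped[T, U](parent: Node[T], fn: T => Option[U]) extends Node[U]

// Equality is structural: two wrappers are equal when the wrapped fns are equal.
case class MapToOption[T, U](fn: T => U) extends Function1[T, Option[U]] {
  def apply(t: T): Option[U] = Some(fn(t))
}

case class GetValue[K, V]() extends Function1[(K, V), V] {
  def apply(kv: (K, V)): V = kv._2
}

object EqualityFix {
  val n1: Node[Int] = Source[(String, Int)]("events").map(GetValue[String, Int]())
  val n2: Node[Int] = Source[(String, Int)]("events").map(GetValue[String, Int]())
  val nodesEqual: Boolean = n1 == n2

  // Toy memo table keyed on the node; counts how often work is really done.
  private val cache = mutable.Map.empty[Node[_], String]
  var planned: Int = 0
  def plan(n: Node[_]): String =
    cache.getOrElseUpdate(n, { planned += 1; s"plan-$planned" })
}
```

Calling EqualityFix.plan(EqualityFix.n1) and then EqualityFix.plan(EqualityFix.n2) should do the work only once, because the second lookup finds an equal key. Note that a user-supplied anonymous function passed to map would still break equality; the sketch only removes the planner's own anonymous wrappers.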