Open johnynek opened 10 years ago
+1
this would be awesome !!!
Will something like this work? Add two vals in SummingCache to record # of keys and # of misses during put. Then in MapsideReduce adding two Stat counters to keep counting these two vals. We can create a subcalss of MSR to use in sumByLocalKeys. Or this can be used to all msr operations, not only sumByLocalKeys.
I think putting it in mapside reduce is the right place. We may have to reimplement the caching though. We want to record a miss when you put something in and there is not already a key. I don't see how to do that with SummingCache now. The easiest approach might just be to use a standard java map and LRU (which is what we had before anyway).
Standard JMap and LRU sounds good to me. But in current SummingCache's put method, what if we always check cache.get(key).isDefined and do a counting instead of simply .getOrElse. Will that work?
That should work.
Actually, we could subclass SummingCache and add a method like putWithHits(m: Map[K, V]): (Int, Option[Map[K, V]]) that would tell us the number of hits we had when we put. We can increment a counter with that, and then also increment a counter for the total number of items seen.
On Wed, Feb 4, 2015 at 4:18 PM, Kevin Lin notifications@github.com wrote:
Standard JMap and LRU sounds good to me. But in current SummingCache's put method, what if we always check cache.get(key).isDefined and do a counting instead of simply .getOrElse. Will that work?
— Reply to this email directly or view it on GitHub https://github.com/twitter/scalding/issues/1069#issuecomment-72982551.
Oscar Boykin :: @posco :: http://twitter.com/posco
Start with a PR in algebird: https://github.com/twitter/algebird/pull/410
The algebird PR is merged. Will prepare something to use it in the MSR but wait for incorporating the new release of algebird to scalding.
This would help us diagnose how much it helps us, and when it might be disabled (saving work).