twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0
3.5k stars 707 forks source link

Add counters to see the cache miss rate in sumByLocalKeys #1069

Open johnynek opened 10 years ago

johnynek commented 10 years ago

This would help us diagnose how much it helps us, and when it might be disabled (saving work).

vidma commented 9 years ago

+1

this would be awesome !!!

reconditesea commented 9 years ago

Will something like this work? Add two vals in SummingCache to record # of keys and # of misses during put. Then in MapsideReduce adding two Stat counters to keep counting these two vals. We can create a subcalss of MSR to use in sumByLocalKeys. Or this can be used to all msr operations, not only sumByLocalKeys.

johnynek commented 9 years ago

I think putting it in mapside reduce is the right place. We may have to reimplement the caching though. We want to record a miss when you put something in and there is not already a key. I don't see how to do that with SummingCache now. The easiest approach might just be to use a standard java map and LRU (which is what we had before anyway).

reconditesea commented 9 years ago

Standard JMap and LRU sounds good to me. But in current SummingCache's put method, what if we always check cache.get(key).isDefined and do a counting instead of simply .getOrElse. Will that work?

johnynek commented 9 years ago

That should work.

Actually, we could subclass SummingCache and add a method like putWithHits(m: Map[K, V]): (Int, Option[Map[K, V]]) that would tell us the number of hits we had when we put. We can increment a counter with that, and then also increment a counter for the total number of items seen.

On Wed, Feb 4, 2015 at 4:18 PM, Kevin Lin notifications@github.com wrote:

Standard JMap and LRU sounds good to me. But in current SummingCache's put method, what if we always check cache.get(key).isDefined and do a counting instead of simply .getOrElse. Will that work?

— Reply to this email directly or view it on GitHub https://github.com/twitter/scalding/issues/1069#issuecomment-72982551.

Oscar Boykin :: @posco :: http://twitter.com/posco

reconditesea commented 9 years ago

Start with a PR in algebird: https://github.com/twitter/algebird/pull/410

reconditesea commented 9 years ago

The algebird PR is merged. Will prepare something to use it in the MSR but wait for incorporating the new release of algebird to scalding.