twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0

Reduce GC pressure when dealing with Cascading Tuples. #253

Open johnynek opened 11 years ago

johnynek commented 11 years ago

Look here:

https://github.com/twitter/scalding/blob/develop/src/main/scala/com/twitter/scalding/TupleBase.scala#L55

We should add a method that sets values into an existing Tuple instead of always allocating a new one.

The reason is that Cascading does not keep a reference to your tuple after you put it into the collector, so it is safe to reuse a single Tuple inside the operations, which should reduce GC pressure.

Something like TupleSetter.set(t: T, ctup: CTuple) is probably what we want: it resets ctup to contain exactly t, so we can reuse a single instance of ctup.
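
A minimal sketch of the reuse pattern described above, not the Scalding implementation. MutableTuple and CopyingCollector are hypothetical stand-ins for the Cascading tuple and collector, and ReusingSetter is an assumed shape for a setter with the proposed set(t, ctup) signature. The sketch relies on the assumption stated in this issue, namely that the collector keeps no reference to the tuple once it has been added.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a Cascading tuple: a mutable, growable row.
final class MutableTuple {
  private val elems = ArrayBuffer.empty[Any]
  def clear(): Unit = elems.clear()
  def add(x: Any): Unit = elems += x
  def toList: List[Any] = elems.toList
}

// Assumed setter shape, following the set(t, ctup) signature proposed above.
trait ReusingSetter[T] {
  // Resets ctup so that it contains exactly the fields of t.
  def set(t: T, ctup: MutableTuple): Unit
}

object ReusingSetter {
  implicit val pairSetter: ReusingSetter[(Int, String)] =
    new ReusingSetter[(Int, String)] {
      def set(t: (Int, String), ctup: MutableTuple): Unit = {
        ctup.clear()
        ctup.add(t._1)
        ctup.add(t._2)
      }
    }
}

// Stand-in collector that copies the contents on add and keeps no reference
// to the tuple itself (the behaviour this issue assumes of Cascading).
final class CopyingCollector {
  private val rows = ArrayBuffer.empty[List[Any]]
  def add(ctup: MutableTuple): Unit = rows += ctup.toList
  def result: List[List[Any]] = rows.toList
}

object ReuseDemo {
  def emitAll[T](items: Seq[T])(implicit setter: ReusingSetter[T]): List[List[Any]] = {
    val out = new CopyingCollector
    val reused = new MutableTuple // allocated once per operation, not once per record
    items.foreach { item =>
      setter.set(item, reused) // overwrite the same instance
      out.add(reused)
    }
    out.result
  }
}
```

If the collector did hold on to the tuple, every emitted row would end up showing the last record, so the whole approach depends on that no-reference guarantee.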

azymnis commented 11 years ago

So this should be done for all TupleSetters and TupleGetters, right?

johnynek commented 11 years ago

Getters don't allocate. Converters have to, because Scala tuples are immutable. Setters, however, create Cascading tuples, and they don't need to allocate a new one each time. So I think it is just the setters we need to address.
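
To illustrate the distinction, here is a rough sketch of the three roles. Row is a hypothetical stand-in for a Cascading tuple, and the method names are made up for illustration; none of this is the actual TupleGetter, TupleConverter, or TupleSetter code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a Cascading tuple; not the real class.
final class Row {
  private val elems = ArrayBuffer.empty[Any]
  def clear(): Unit = elems.clear()
  def add(x: Any): Unit = elems += x
  def get(i: Int): Any = elems(i)
}

object AllocationRoles {
  // Getter: reads one field out of an existing row, so no new tuple is built.
  def getField(row: Row, i: Int): Any = row.get(i)

  // Converter: must build a fresh Scala tuple per record, because Scala
  // tuples are immutable and cannot be overwritten in place.
  def convert(row: Row): (Int, String) =
    (row.get(0).asInstanceOf[Int], row.get(1).asInstanceOf[String])

  // Setter, allocating style: a new Row for every record.
  def setAllocating(t: (Int, String)): Row = {
    val row = new Row
    row.add(t._1)
    row.add(t._2)
    row
  }

  // Setter, reusing style: overwrite a caller-supplied Row instead.
  def setInto(t: (Int, String), row: Row): Row = {
    row.clear()
    row.add(t._1)
    row.add(t._2)
    row
  }
}
```

Only the setter has a choice here: setInto hands back the same instance it was given, while setAllocating creates garbage on every call.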

johnynek commented 10 years ago

Relevant thread on the mailing list: https://groups.google.com/forum/#!topic/cascading-user/vbloF-5RjKo