Open johnynek opened 11 years ago
So this should be done for all TupleSetters and TupleGetters, right?
Getters don't allocate. Converters have to, because Scala tuples are immutable. Setters, however, create Cascading tuples, and they don't need to allocate a new one on every call. So I think only the setters need to be addressed.
Relevant thread on the mailing list: https://groups.google.com/forum/#!topic/cascading-user/vbloF-5RjKo
Look here:
https://github.com/twitter/scalding/blob/develop/src/main/scala/com/twitter/scalding/TupleBase.scala#L55
We should add a method that sets values into an existing Tuple rather than always allocating a new one.
The reason is that Cascading does not keep a reference to your tuple after you put it into the collector, so it is safe to reuse a single Tuple inside the operations, which should reduce GC pressure.
Calling TupleSetter.set(t: T, ctup: CTuple) is probably what we want: it would reset ctup to contain exactly t, so a single instance of ctup can be reused.
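
A rough sketch of what that could look like, not the actual scalding code. To keep it self-contained, CTuple here is a small stand-in for cascading.tuple.Tuple that models only clear and add; ReusableTupleSetter, tup2Setter, ReuseDemo and emitAll are hypothetical names made up for illustration, and the emit callback plays the role of the collector.

```scala
import scala.collection.mutable.ArrayBuffer

// Stand-in for cascading.tuple.Tuple (assumption: only clear/add are modeled).
final class CTuple {
  private val elements = ArrayBuffer.empty[Any]
  def clear(): Unit = elements.clear()
  def add(value: Any): Unit = { elements += value; () }
  def size: Int = elements.size
  override def toString: String = elements.mkString("[", ", ", "]")
}

// Hypothetical setter with the proposed in-place method.
trait ReusableTupleSetter[T] {
  // Reset ctup so that it contains exactly the fields of t.
  def set(t: T, ctup: CTuple): Unit

  // Allocating variant, kept for callers that need a fresh tuple.
  def apply(t: T): CTuple = {
    val fresh = new CTuple
    set(t, fresh)
    fresh
  }
}

object ReusableTupleSetter {
  // Example instance for Scala pairs; other arities would follow the same shape.
  implicit def tup2Setter[A, B]: ReusableTupleSetter[(A, B)] =
    new ReusableTupleSetter[(A, B)] {
      def set(t: (A, B), ctup: CTuple): Unit = {
        ctup.clear()
        ctup.add(t._1)
        ctup.add(t._2)
      }
    }
}

object ReuseDemo {
  // Pushes every item through one shared CTuple instead of allocating per item.
  // This is only safe if emit does not hold on to the tuple after it returns.
  def emitAll[T](items: Iterable[T])(emit: CTuple => Unit)(
      implicit setter: ReusableTupleSetter[T]): Unit = {
    val reused = new CTuple
    items.foreach { item =>
      setter.set(item, reused)
      emit(reused)
    }
  }
}
```

The reuse only works because the consumer is assumed to copy or serialize the tuple before returning, which matches the claim above that the collector keeps no reference. Anything that buffers the emitted tuples would still need the allocating apply.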