twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0

Remove a quadratic function in our cascading3 support #1779

Closed: johnynek closed this 6 years ago

johnynek commented 6 years ago

This function was appending to a list inside a loop. Each append is an O(N) operation, so the loop as a whole was quadratic. In the end, we only want a set, so the order does not matter. So, I changed it to prepend, which is O(1), and then call toSet at the end.
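To illustrate the change described above, here is a minimal sketch of the two patterns. It is not the actual code from RichPipe.scala: the object name `PrependThenToSet`, both method names, and the `Seq[String]` input are hypothetical stand-ins for whatever the real loop walks over.

```scala
object PrependThenToSet {
  // Hypothetical "before": appending to an immutable List inside a loop.
  // Each `:+` copies the whole list, so N appends cost O(N^2) overall.
  def collectByAppending(items: Seq[String]): Set[String] = {
    var acc = List.empty[String]
    for (item <- items) {
      acc = acc :+ item
    }
    acc.toSet
  }

  // Hypothetical "after": prepending is O(1) per step, so the loop is O(N).
  // The list ends up reversed, which is harmless because only a Set is returned.
  def collectByPrepending(items: Seq[String]): Set[String] = {
    var acc = List.empty[String]
    for (item <- items) {
      acc = item :: acc
    }
    acc.toSet
  }
}
```

Both versions return the same set; only the cost of building the intermediate list differs.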

@cchepelov I think you wrote some of this stuff originally. I'm trying to understand the rules we need to follow to make sure cascading3 does not throw at plan time.

I'm still seeing some exceptions in our generative testing in some cases. Where did you learn the rules? Trial and error?

cc @cwensel

cchepelov commented 6 years ago

Good catch!

I… clearly get some blame for this in https://github.com/twitter/scalding/blame/cascading3/scalding-core/src/main/scala/com/twitter/scalding/RichPipe.scala#L150 but I don't remember.

Trial & error for sure, and most certainly not tried at all with MAPREDUCE (is anyone still using that? ;) )

piyushnarang commented 6 years ago

👍