twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0

Introducing ExecutionOptimizationRules for optimizing executing graph #1879

Closed dieu closed 5 years ago

dieu commented 5 years ago

Hey,

We already have some optimizations on Executions in the code, such as the special implementation of zip on WriteExecution. To do this kind of optimization in a more generic way, we can use dagon and describe all of the optimizations the same way, in one place.
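To illustrate what "describing optimizations as rules in one place" can look like, here is a minimal sketch on a toy model. Everything in it (`Exec`, `Write`, `Zip`, `zipWrite`, `optimize`) is a hypothetical stand-in, not Scalding's Execution types and not dagon's API; it only shows a rewrite rule kept separate from the node classes and applied bottom-up.

```scala
object RuleSketch {
  // Hypothetical toy graph; NOT Scalding's real Execution hierarchy.
  sealed trait Exec
  final case class Write(sinks: Set[String]) extends Exec
  final case class Zip(left: Exec, right: Exec) extends Exec

  // A rule rewrites a node when it matches and is undefined otherwise.
  type Rule = PartialFunction[Exec, Exec]

  // Assumed rule: two zipped writes can be merged into one write over both sinks.
  val zipWrite: Rule = {
    case Zip(Write(a), Write(b)) => Write(a ++ b)
  }

  // Rewrite children first, then the node itself, and keep rewriting
  // the node for as long as the rule still applies.
  def optimize(rule: Rule)(e: Exec): Exec = {
    val withNewChildren = e match {
      case Zip(l, r) => Zip(optimize(rule)(l), optimize(rule)(r))
      case other     => other
    }
    rule.lift(withNewChildren) match {
      case Some(next) => optimize(rule)(next)
      case None       => withNewChildren
    }
  }
}
```

With this shape, the node classes need no special-cased `zip` override: the merge lives in the rule, and more rules could be combined with `orElse`.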

dieu commented 5 years ago

@johnynek I have one problem:

"can hashCode, compare, and run a long sequence" in {
  val execution = Execution.sequence((1 to 100000).toList.map(Execution.from(_)))
  assert(execution.hashCode == execution.hashCode)

  assert(execution == execution)

  assert(execution.shouldSucceed() == (1 to 100000).toList)
}

This test is failing with a StackOverflowError when evaluating ExecutionOptimizationRules.toLiteral. Do we have anything in mind to prevent a StackOverflowError on a huge graph of executions?

johnynek commented 5 years ago

@dieu we don't have that yet in dagon... :( One solution is to use something like a Trampoline (or cats Eval).
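As a rough illustration of the trampoline idea, the sketch below uses `scala.util.control.TailCalls` from the Scala standard library on a toy node type. `Node`, `Leaf`, `Mapped`, `evalNaive`, and `evalSafe` are hypothetical names for this example only; this is not how dagon or `toLiteral` is implemented, just the general technique of moving recursion from the call stack to the heap.

```scala
import scala.util.control.TailCalls.{TailRec, done, tailcall}

object TrampolineSketch {
  // Hypothetical toy graph standing in for a long chain of executions.
  sealed trait Node
  final case class Leaf(value: Int) extends Node
  final case class Mapped(prev: Node, f: Int => Int) extends Node

  // Naive recursion: one stack frame per node, so a very deep
  // chain can fail with StackOverflowError.
  def evalNaive(n: Node): Int = n match {
    case Leaf(v)         => v
    case Mapped(prev, f) => f(evalNaive(prev))
  }

  // Trampolined recursion: each step is suspended as a heap object
  // and later driven by a loop when `.result` is called.
  def evalSafe(n: Node): TailRec[Int] = n match {
    case Leaf(v)         => done(v)
    case Mapped(prev, f) => tailcall(evalSafe(prev)).map(f)
  }

  // Builds Leaf(0) wrapped in `depth` Mapped nodes, each adding 1.
  def deepChain(depth: Int): Node =
    (1 to depth).foldLeft(Leaf(0): Node)((acc, _) => Mapped(acc, (x: Int) => x + 1))
}
```

Calling `TrampolineSketch.evalSafe(TrampolineSketch.deepChain(100000)).result` walks the whole chain in constant stack space. cats `Eval` offers a similar pattern through `Eval.defer` and `flatMap`.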

dieu commented 5 years ago

@johnynek I wrote up some simple optimizations for Execution, along with test cases to check them.

I was also able to optimize the DAG of executions to fix the original problems with zip on writes, but unfortunately I had to keep the overridden zip method on WriteExecution. I was able to push zip into flatMap/map, but the ZipWrite optimization is not applicable after that: once zip is pushed down, it becomes part of the internal lambda (which makes sense, because it depends on a future computation).
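The limitation can be seen on a toy model. In the sketch below, all names (`Exec`, `Write`, `Zip`, `FlatMap`, `pushZip`, `zipWrite`) are hypothetical and only approximate the real rules; the `Int` argument of the flatMap function is a placeholder for the result of the previous step.

```scala
object ZipPushSketch {
  // Hypothetical toy graph; NOT Scalding's real Execution hierarchy.
  sealed trait Exec
  final case class Write(sinks: Set[String]) extends Exec
  final case class Zip(left: Exec, right: Exec) extends Exec
  final case class FlatMap(prev: Exec, f: Int => Exec) extends Exec

  // Assumed rule: push a zip into the function of a flatMap.
  val pushZip: PartialFunction[Exec, Exec] = {
    case Zip(FlatMap(prev, f), right) => FlatMap(prev, x => Zip(f(x), right))
  }

  // Assumed rule: merge two zipped writes into a single write.
  val zipWrite: PartialFunction[Exec, Exec] = {
    case Zip(Write(a), Write(b)) => Write(a ++ b)
  }
}
```

After `pushZip` fires, the new `Zip` exists only inside the closure `x => Zip(f(x), right)`. A structural rule such as `zipWrite` cannot see into a function value, so it has nothing to match until the function is actually called at run time.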

johnynek commented 5 years ago

This is good, and I merged it, but I wonder if we should apply optimizations again after FlatMap and UniqueIdExecution if the config is set to true?

Seems like we would want to apply the optimizations on the results in those cases.
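A small sketch of what that could mean, on a toy model: when the runner reaches a flatMap, it calls the function and, if a flag is set, optimizes the resulting graph before running it. The types, the `reoptimize` flag, and the "count of write steps" returned by `run` are all hypothetical and exist only for this illustration; Scalding's actual config key and evaluation code are not shown here.

```scala
object ReoptimizeSketch {
  // Hypothetical toy graph; NOT Scalding's real Execution hierarchy.
  sealed trait Exec
  final case class Write(sinks: Set[String]) extends Exec
  final case class Zip(left: Exec, right: Exec) extends Exec
  final case class FlatMap(prev: Exec, f: Int => Exec) extends Exec

  // Static optimization: merges zipped writes, but cannot look inside
  // the function held by a FlatMap.
  def optimize(e: Exec): Exec = e match {
    case Zip(l, r) =>
      (optimize(l), optimize(r)) match {
        case (Write(a), Write(b)) => Write(a ++ b)
        case (l2, r2)             => Zip(l2, r2)
      }
    case other => other
  }

  // "Runs" the graph and returns how many separate write steps it launched.
  // The value 0 passed to `f` is a placeholder for the previous result.
  def run(e: Exec, reoptimize: Boolean): Int = e match {
    case Write(_)  => 1
    case Zip(l, r) => run(l, reoptimize) + run(r, reoptimize)
    case FlatMap(prev, f) =>
      val next = f(0)
      val toRun = if (reoptimize) optimize(next) else next
      run(prev, reoptimize) + run(toRun, reoptimize)
  }
}
```

For `FlatMap(Write(Set("a")), _ => Zip(Write(Set("b")), Write(Set("c"))))`, running without the flag launches three write steps, while running with it launches two, because the zip produced by the function is merged before it runs.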

dieu commented 5 years ago

@johnynek I was thinking about this, but then we would need to bring the optimizer into the implementation of Execution. If you don't see any problem with that, I can do it.

johnynek commented 5 years ago

@dieu let's try a PR and see what it looks like if you have time.