twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0
3.5k stars, 706 forks

Test to demonstrate planning complexity. #1765

erik-stripe closed this 6 years ago

erik-stripe commented 6 years ago

In current versions of Scalding, we have observed that increasing graph size massively increases planning time. We believe this is due to Cascading code that is cubic (or worse) in the number of vertices.

This test currently passes for size=64 (with size=128 commented out), but planning still takes 18s at size=64, versus <1s for size=32. We ran the same test on the cascading3 branch and observed roughly linear behavior (e.g. size=128 ran in <3s).
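As a rough check on the "cubic (or worse)" estimate, the two timings above can be turned into an empirical growth exponent. The sketch below is plain Scala with no Scalding dependency. `PlanningGrowth` and `growthExponent` are hypothetical names, not part of the test in this PR, and the calculation assumes planning time follows a power law in graph size.

```scala
object PlanningGrowth {
  /** Exponent k such that time scales like size^k between two measurements.
    * Assumes a power-law relationship; hypothetical helper, not Scalding API.
    */
  def growthExponent(size1: Int, secs1: Double, size2: Int, secs2: Double): Double =
    math.log(secs2 / secs1) / math.log(size2.toDouble / size1.toDouble)

  def main(args: Array[String]): Unit = {
    // Figures from the report: <1s at size=32, 18s at size=64.
    // Using 1.0s as an upper bound for size=32 yields a lower bound on k.
    val k = growthExponent(32, 1.0, 64, 18.0)
    println(f"estimated exponent is at least $k%.2f")
  }
}
```

Taking 1s as an upper bound for size=32, the exponent comes out at roughly 4.17 or higher, which is consistent with worse-than-cubic growth. Two data points give only a loose estimate, so this is a sanity check rather than a measurement of the planner's complexity.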

CLAassistant commented 6 years ago

All committers have signed the CLA.

johnynek commented 6 years ago

cc @pankajroark @fwbrasil this is why we want to merge cascading 3. It looks like this will solve our planning issues.

cc @cwensel

johnynek commented 6 years ago

This is great!

Glad to have concrete evidence of a fix for the long-standing slow-planning issue.