twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0

Implement ForceToDisk for Beam runner #1967

Closed. nownikhil closed this 2 years ago

nownikhil commented 2 years ago

This PR defines a TempSource that captures the output path and coder, then adds a resolver from TempSource to the corresponding BeamSource. It also adds a unit test for forceToDiskExecution.
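To make the idea concrete, here is a minimal sketch of the pattern described above. It is not the code from this PR: every type here (Coder, Source, TempSource, BeamSource, Disk) is a hypothetical stand-in written for illustration, and an in-memory map plays the role of the file system so the example runs without Scalding or Beam on the classpath.

```scala
// All names below are hypothetical stand-ins, not the real Scalding or Beam types.
object ForceToDiskSketch {
  // Minimal coder: turns a record into a line of text and back.
  trait Coder[T] {
    def encode(value: T): String
    def decode(line: String): T
  }

  val intCoder: Coder[Int] = new Coder[Int] {
    def encode(value: Int): String = value.toString
    def decode(line: String): Int = line.toInt
  }

  sealed trait Source[T]
  // Captures where the intermediate output lives and how to decode it.
  final case class TempSource[T](path: String, coder: Coder[T]) extends Source[T]
  // A source kind the resolver below does not know about.
  final case class OtherSource[T](name: String) extends Source[T]

  // In-memory stand-in for a file system: path -> encoded records.
  type Disk = Map[String, Vector[String]]

  // Backend-side reader built from the path and coder held by a TempSource.
  final case class BeamSource[T](path: String, coder: Coder[T]) {
    def read(disk: Disk): Vector[T] =
      disk.getOrElse(path, Vector.empty).map(coder.decode)
  }

  // Resolver: maps a TempSource to its backend source, None for anything else.
  def resolve[T](source: Source[T]): Option[BeamSource[T]] = source match {
    case t: TempSource[T] => Some(BeamSource(t.path, t.coder))
    case _                => None
  }

  // "Force to disk": write the encoded records, then hand back a TempSource
  // that later steps can resolve and read from.
  def forceToDisk[T](
      values: Vector[T],
      path: String,
      coder: Coder[T],
      disk: Disk
  ): (Disk, TempSource[T]) =
    (disk.updated(path, values.map(coder.encode)), TempSource(path, coder))
}
```

In this sketch the TempSource is just a handle: the resolver is the only place that knows how to turn it into something the backend can read, which keeps backend-specific types out of the planning code.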

nownikhil commented 2 years ago

@johnynek Any suggestions on how to implement forceToDisk?

nownikhil commented 2 years ago

Yeah, good idea. We have some Twitter internal code for that, though this is something I wanna revisit. For example, in your patch you introduced CascadingBackend, which adds implicit methods on mode and handles args parsing. Now we can either do the same for every backend, or have a common entry point that picks the right mode based on the args provided.

Currently we expect users to extend a custom trait, e.g. MyJob extends ExecutionApp with TwitterBeamExecution, though ideally users should just be able to provide the mode as an arg at runtime (which will of course require multiple packages to be available). Not super important at this point, though.
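For illustration, a common entry point that picks the backend from the args could look roughly like the sketch below. This is only an assumption about the shape of such a design: the Runner trait, the CascadingRunner and BeamRunner objects, and the --backend flag are all hypothetical names invented for this example, not Scalding APIs.

```scala
// Hypothetical sketch: none of these names exist in Scalding.
object ModeSelectionSketch {
  sealed trait Runner { def name: String }
  case object CascadingRunner extends Runner { val name = "cascading" }
  case object BeamRunner extends Runner { val name = "beam" }

  // Returns the value following `flag` in the args, if there is one.
  def flagValue(args: Seq[String], flag: String): Option[String] = {
    val i = args.indexOf(flag)
    if (i >= 0 && i + 1 < args.length) Some(args(i + 1)) else None
  }

  // Picks a runner from a hypothetical --backend flag.
  // Defaults to the Cascading runner when the flag is absent.
  def select(args: Seq[String]): Either[String, Runner] =
    flagValue(args, "--backend") match {
      case None | Some("cascading") => Right(CascadingRunner)
      case Some("beam")             => Right(BeamRunner)
      case Some(other)              => Left(s"unknown backend: $other")
    }
}
```

With something like this, a job would not need to mix in a backend-specific trait; the trade-off, as noted above, is that every selectable backend has to be available on the classpath at runtime.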