Closed navinvishy closed 2 years ago
Looks like we had test coverage for this, so you need to fix a test.
The Beam runner for Scalding does not work with Hadoop counters (Stat). When Scalding jobs that use the Stat API are run with the Beam runner, they fail with the following error:
"Error in job deployment, the FlowProcess for unique id %s isn't available".format(uniqueId)
Currently, it does not look possible for a runner to provide its own implementation of a stat, because the implementation depends on a Cascading FlowProcess. Here we return a NullFlowProcess when a flow process cannot be found in the flow mapping store, instead of erroring out. This turns the stat call into a no-op, since the NullFlowProcess does nothing on a call to increment counters.

Ideally, we would be able to plug in a Beam counter for Stat. The change I have here may not be ideal, but the goal is to discuss what could be done here, and to understand whether returning a NullFlowProcess could have other unintended consequences.
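To make the trade-off concrete, here is a minimal, self-contained sketch of the two lookup behaviors discussed above. It does not use Scalding or Cascading types: `CounterSink`, `NoopSink`, `RecordingSink`, and `FlowStore` are hypothetical stand-ins invented for illustration (`NoopSink` plays the role of NullFlowProcess, `FlowStore` the role of the flow mapping store), so treat it as an illustration of the idea rather than the actual change.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a Cascading FlowProcess; only counter increments are modeled.
trait CounterSink {
  def increment(group: String, counter: String, amount: Long): Unit
}

// Hypothetical no-op sink, playing the role of NullFlowProcess: increments are dropped.
object NoopSink extends CounterSink {
  def increment(group: String, counter: String, amount: Long): Unit = ()
}

// Hypothetical sink that records increments, standing in for a real counter backend.
final class RecordingSink extends CounterSink {
  val counts: mutable.Map[(String, String), Long] =
    mutable.Map.empty[(String, String), Long].withDefaultValue(0L)

  def increment(group: String, counter: String, amount: Long): Unit =
    counts((group, counter)) += amount
}

// Hypothetical stand-in for the flow mapping store, keyed by unique id.
object FlowStore {
  private val sinks = mutable.Map.empty[String, CounterSink]

  def register(uniqueId: String, sink: CounterSink): Unit =
    sinks.update(uniqueId, sink)

  // Behavior before the change: a missing entry is a hard error.
  def lookupStrict(uniqueId: String): CounterSink =
    sinks.getOrElse(
      uniqueId,
      sys.error(
        "Error in job deployment, the FlowProcess for unique id %s isn't available"
          .format(uniqueId)))

  // Behavior proposed here: a missing entry falls back to the no-op sink.
  def lookupLenient(uniqueId: String): CounterSink =
    sinks.getOrElse(uniqueId, NoopSink)
}
```

With the lenient lookup, a stat increment for an unregistered id is silently discarded. That is the no-op behavior described above, and it is also the main risk: a job that is genuinely misconfigured would lose its counters without any error.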