twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0

planner exception with merge #1837

Closed: johnynek closed this issue 6 years ago

johnynek commented 6 years ago

This looks like an error in the new implementation.

Caused by: cascading.flow.planner.PlannerException: could not build flow from assembly: [[_pipe_0-464d65c76b6f+_...][planTypedWrites() @ com.twitter.scalding.Job.validate(Job.scala:310)] merged streams must declare the same field names, in the same order, expected: [{1}:0] found: [{?}:UNKNOWN]]
        at cascading.flow.planner.FlowPlanner.handleExceptionDuringPlanning(FlowPlanner.java:578)
        at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:286)
        at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:80)
        at cascading.flow.FlowConnector.connect(FlowConnector.java:459)
        at com.twitter.scalding.ExecutionContext$class.buildFlow(ExecutionContext.scala:86)
        at com.twitter.scalding.ExecutionContext$$anon$1.buildFlow(ExecutionContext.scala:179)
        at com.twitter.scalding.Job$$anonfun$buildFlow$1.apply(Job.scala:287)
        at com.twitter.scalding.Job$$anonfun$buildFlow$1.apply(Job.scala:287)
        at scala.util.Success.flatMap(Try.scala:231)
        at com.twitter.scalding.Job.buildFlow(Job.scala:287)
        at com.stripe.zoolander.parquet.ParquetJob$class.run(ParquetJob.scala:143)
        at com.stripe.zoolander.treasury.LedgerClearingTimeseries.run(LedgerClearingTimeseries.scala:16)
        at com.twitter.scalding.Tool.start$1(Tool.scala:124)
        at com.twitter.scalding.Tool.run(Tool.scala:141)
        at com.twitter.scalding.Tool.run(Tool.scala:68)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at com.twitter.scalding.Tool$.main(Tool.scala:149)
        ... 1 more
Caused by: cascading.pipe.OperatorException: [_pipe_0-464d65c76b6f+_...][planTypedWrites() @ com.twitter.scalding.Job.validate(Job.scala:310)] merged streams must declare the same field names, in the same order, expected: [{1}:0] found: [{?}:UNKNOWN]
        at cascading.pipe.Splice.resolveDeclared(Splice.java:1276)
        at cascading.pipe.Splice.outgoingScopeFor(Splice.java:992)
        at cascading.flow.planner.ElementGraph.resolveFields(ElementGraph.java:628)
        at cascading.flow.planner.ElementGraph.resolveFields(ElementGraph.java:610)
        at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:270)
        ... 16 more

It seems we are somehow merging streams without ensuring that they declare the same fields.
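To make the failing rule concrete, here is a minimal Scala sketch that models the check reported from `cascading.pipe.Splice.resolveDeclared`, inferred only from the error text: every stream entering a merge must declare the same field names in the same order, and a stream whose fields are UNKNOWN cannot match. The types `Declared`, `Known`, `Unknown` and the functions `render` and `resolveMerge` are hypothetical illustrations, not Cascading or Scalding APIs, and the string rendering only mimics the message shown in this issue.

```scala
// Hypothetical model of the merge check behind the planner error.
// Not Cascading's implementation; all names are invented for illustration.

// What a stream declares: a known, ordered list of field names,
// or UNKNOWN (shown as {?}:UNKNOWN in the error message).
sealed trait Declared
final case class Known(names: List[String]) extends Declared
case object Unknown extends Declared

// Mimics the "[{1}:0]" / "[{?}:UNKNOWN]" notation from the trace above.
def render(d: Declared): String = d match {
  case Known(ns) => ns.mkString(s"[{${ns.size}}:", ", ", "]")
  case Unknown   => "[{?}:UNKNOWN]"
}

// Every incoming branch must declare exactly what the first branch declares:
// same names, same order. An UNKNOWN declaration never matches a known one.
def resolveMerge(branches: List[Declared]): Either[String, Known] =
  branches match {
    case Nil => Left("merge needs at least one incoming stream")
    case (first: Known) :: rest =>
      rest.find(_ != first) match {
        case None => Right(first)
        case Some(bad) =>
          Left(
            "merged streams must declare the same field names, in the same order, " +
              s"expected: ${render(first)} found: ${render(bad)}"
          )
      }
    case Unknown :: _ => Left("first stream declares UNKNOWN fields")
  }
```

Under this model, the failing flow is a merge whose first branch declares one field, `0`, while another branch reaches the splice with UNKNOWN fields; presumably the fix is to make every branch declare the same fields before it is merged.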

We need a repro, and then it should be easy to fix.