Open sekruse opened 7 years ago
It seems that the `ExecutionTaskFlow` is already not assembled correctly, as the following log message suggests:
[WARN] org.qcri.rheem.core.optimizer.enumeration.ExecutionTaskFlow - T[JavaMap[1+1->1, id=415b0b49]] has missing input channels among [null, CollectionChannel[T[JavaCollectionSource[0->1, id=4f49f6af]]->[T[JavaMap[1+1->1, id=415b0b49]]]]].
In this instance, only the broadcast is registered correctly, but the regular map input is missing.
I started working on this issue in branch rheem-44.
I pinpointed that both the optimizer and the executor assume that each input channel is fed only once into each operator. The code above breaks this assumption. I added changes to make the optimizer aware that an input channel may be accessed twice. However, the above test still fails during execution (in the maintenance of the execution lineage). I am stopping work on this for now for two reasons:
So, please feel free to pick up this issue if you are feeling like it. :wink:
The following code
produces this error
Apparently, this problem appears because `inputDataQuanta` and the `map` call are connected twice: via the regular data flow and via the broadcast. If one inserts a `map(x->x)` before the map call or before broadcasting, the example works fine.

The above test can be used to reproduce the bug, which should be fixed.
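
To make the double connection easier to reason about, here is a minimal toy sketch in plain Java. It is not Rheem code: the class `ChannelSlotSketch`, its methods, and the channel names are all hypothetical, and channels are modelled as plain strings. It assumes the bookkeeping keeps one input slot per channel, which is one way the "fed only once" assumption could show up, and it contrasts that with bookkeeping that allows several slots per channel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; none of these names exist in Rheem.
public class ChannelSlotSketch {

    // Assumes each channel feeds at most one input slot of the operator.
    // A second registration of the same channel overwrites the first one.
    static String[] assembleSingleSlot(String[] channelPerSlot) {
        Map<String, Integer> slotByChannel = new HashMap<>();
        for (int slot = 0; slot < channelPerSlot.length; slot++) {
            slotByChannel.put(channelPerSlot[slot], slot);
        }
        String[] inputs = new String[channelPerSlot.length];
        for (Map.Entry<String, Integer> entry : slotByChannel.entrySet()) {
            inputs[entry.getValue()] = entry.getKey();
        }
        return inputs;
    }

    // Allows a channel to feed several input slots of the same operator.
    static String[] assembleMultiSlot(String[] channelPerSlot) {
        Map<String, List<Integer>> slotsByChannel = new HashMap<>();
        for (int slot = 0; slot < channelPerSlot.length; slot++) {
            slotsByChannel
                .computeIfAbsent(channelPerSlot[slot], key -> new ArrayList<>())
                .add(slot);
        }
        String[] inputs = new String[channelPerSlot.length];
        for (Map.Entry<String, List<Integer>> entry : slotsByChannel.entrySet()) {
            for (int slot : entry.getValue()) {
                inputs[slot] = entry.getKey();
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        // Slot 0 is the regular input, slot 1 the broadcast input.
        String[] sameChannelTwice = {"sourceChannel", "sourceChannel"};
        // Workaround from the issue: an extra map(x->x) yields a distinct channel.
        String[] withIdentityMap = {"identityMapChannel", "sourceChannel"};

        // prints [null, sourceChannel]
        System.out.println(Arrays.toString(assembleSingleSlot(sameChannelTwice)));
        // prints [sourceChannel, sourceChannel]
        System.out.println(Arrays.toString(assembleMultiSlot(sameChannelTwice)));
        // prints [identityMapChannel, sourceChannel]
        System.out.println(Arrays.toString(assembleSingleSlot(withIdentityMap)));
    }
}
```

In this toy model the single-slot variant leaves the regular input empty and keeps only the broadcast slot, which has the same shape as the `[null, CollectionChannel[...]]` warning. The identity-map case avoids the collision because the two slots are fed by different channels. Whether the actual optimizer and executor code fails for this exact reason is an assumption.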