stratosphere / incubator-systemml

Mirror of Apache SystemML (Incubating)
Apache License 2.0

ExecutionEnvironment.execute() should not be called when no sinks are defined. #15

Open fschueler opened 8 years ago

fschueler commented 8 years ago

Currently we call env.execute() after the execution of all program blocks if the ExecutionContext is of type FlinkExecutionContext. In hybrid_flink mode this leads to the problem that execute() can be called even though no sinks are defined.

We should somehow check if an adequate plan exists before calling execute().

fschueler commented 8 years ago

This can happen not only in hybrid_flink mode but whenever collect() is called somewhere in between. Does anyone have an idea how we can keep track of the defined sources/sinks?

aalexandrov commented 8 years ago

You can use createProgramPlan and inspect the result for sinks.

aalexandrov commented 8 years ago

Alternatively, use an ExecutionEnvironment wrapper class built with the delegate pattern and override the void registerDataSink(DataSink<?> sink) method in order to keep track of this kind of meta-information.
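The delegate idea could be sketched roughly as follows. This is a minimal, self-contained illustration and not code against the real Flink API: Env, PlainEnv, TrackingEnv and executeIfNeeded are hypothetical names, sinks are represented as plain strings, and the stand-in environment is assumed to fail on execute() without sinks and to consume its sinks on a successful run, as described in this thread.

```java
import java.util.ArrayList;
import java.util.List;

final class SinkTracking {
    /** Hypothetical stand-in for the part of ExecutionEnvironment that matters here. */
    interface Env {
        void registerDataSink(String sink);
        void execute();
    }

    /** Stand-in "real" environment: execute() fails when no sink is registered. */
    static class PlainEnv implements Env {
        private final List<String> sinks = new ArrayList<>();
        int executions = 0;

        public void registerDataSink(String sink) { sinks.add(sink); }

        public void execute() {
            if (sinks.isEmpty()) {
                throw new IllegalStateException("no data sinks defined");
            }
            executions++;
            sinks.clear(); // assumption: a run consumes the registered sinks
        }
    }

    /** Delegating wrapper that remembers whether sinks are still pending. */
    static class TrackingEnv implements Env {
        private final Env delegate;
        private int pendingSinks = 0;

        TrackingEnv(Env delegate) { this.delegate = delegate; }

        public void registerDataSink(String sink) {
            pendingSinks++;                  // record the meta-information
            delegate.registerDataSink(sink); // then forward to the wrapped environment
        }

        public void execute() {
            delegate.execute();
            pendingSinks = 0; // sinks were consumed by this run
        }

        boolean hasPendingSinks() { return pendingSinks > 0; }

        /** Runs execute() only if there is something to execute. */
        boolean executeIfNeeded() {
            if (!hasPendingSinks()) {
                return false;
            }
            execute();
            return true;
        }
    }

    public static void main(String[] args) {
        TrackingEnv env = new TrackingEnv(new PlainEnv());
        System.out.println(env.executeIfNeeded()); // no sinks registered yet
        env.registerDataSink("write result to file");
        System.out.println(env.executeIfNeeded()); // one pending sink
    }
}
```

In a real wrapper, any call that triggers execution internally (such as collect()) would also need to reset the counter. The sketch leaves that out.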

fschueler commented 8 years ago

Unfortunately, createProgramPlan itself already fails if no sinks are defined, and the sinks variable of the ExecutionEnvironment is private. I guess we will have to go with the alternative.

fschueler commented 8 years ago

Maybe the "final" execute (plus check) could happen in the finally block of the DMLScript in line 685, where the SparkContext is stopped.
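A rough sketch of what such a guarded "final" execute might look like. It does not reproduce the actual DMLScript code: SinkAwareEnv, CountingEnv and runScript are hypothetical names, and the environment is assumed to expose some way of asking whether sinks are still pending.

```java
final class FinalExecuteSketch {
    /** Hypothetical view of an environment that can report pending sinks. */
    interface SinkAwareEnv {
        boolean hasPendingSinks();
        void execute();
    }

    /** Tiny stand-in environment used only to exercise the sketch. */
    static class CountingEnv implements SinkAwareEnv {
        int pending = 0;    // number of sinks registered since the last run
        int executions = 0; // how often execute() was actually called

        public boolean hasPendingSinks() { return pending > 0; }

        public void execute() {
            executions++;
            pending = 0;
        }
    }

    /**
     * Runs the program blocks, then performs the final execute in the
     * finally block, but only if sinks are still pending.
     * Returns true if the final execute was triggered.
     */
    static boolean runScript(Runnable programBlocks, SinkAwareEnv env) {
        boolean executed = false;
        try {
            programBlocks.run();
        } finally {
            if (env.hasPendingSinks()) { // the check before the final execute
                env.execute();
                executed = true;
            }
            // other cleanup (e.g. stopping contexts) would follow here
        }
        return executed;
    }
}
```

One point to consider with this placement: the finally block also runs when a program block throws, so the final execute would be attempted on a partially built plan as well.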