Closed akunft closed 8 years ago
I am looking into the flink and hybrid_flink execution modes; @carabolic is working on the necessary instructions.
I added a test that allows us to run the Linear Regression DML script (direct solver). For hybrid_flink mode, the LinearRegDS.dml script already works (only uses Flink reblock instructions).
I think we should also get the GLM-predict.dml script to run for our end-to-end example.
This script uses (in hybrid_spark mode) a couple more instructions that are not yet implemented in Flink:
The other instructions are MatrixScalarArithmeticInstructions ("*" and "/") that we should already have. We might need to add the ArithmeticInstruction abstraction similar to Spark.
I think getting this to work in hybrid_flink mode should be the first step. We can then add the instructions for pure flink mode.
It turns out that the number of Flink instructions increases significantly during recompilation for the GLM-predict.dml script (matrix-indexing, relationalbinary, ...).
Should we add these or make the PR only for the LinearReg*.dml scripts?
It could actually be a bug that I introduced... I am investigating! :eyeglasses:
Unfortunately, I think it's not a bug; the same happens for Spark. So we can either implement all missing instructions for the GLM-predict.dml script or have a PR for only the other scripts...
I think this is done and we should focus on testing and cleanup now. One thing that we should resolve for the PR is #15 - when running in hybrid_flink mode on a cluster this will probably be needed.
In order to create the first PR, we should identify the required implementation work for an end-to-end execution of LinregCG / LinregCG and discuss how to split the work here.
It would be best if both of you @fschueler & @carabolic have a look at this (I can only assign one :).