Closed Shafaq-Siddiqi closed 4 years ago
The Spark test is commented out because the frame append (cbind) operation still needs some fixes in the Spark context. The error can be reproduced by running the commented-out Spark test.
I looked at the bug :bug: you mentioned. I didn't find where to fix it, but I'll leave some more information for somebody to fix it:
I reduced the DML that produces the bug to:
F = read($X, data_type="frame", format="csv")
A = cbind(F, as.frame(matrix(1, nrow(F), 1)))
print(toString(A))
To reproduce the error, uncomment the Spark test invocation in BuiltinMiceTest.java:49 and replace the content of src/test/scripts/functions/builtin/mice.dml
with the three lines of DML above.
Upon running the test, the check for frame block dimensions in FrameBlock.java:1002 will now fail with org.tugraz.sysds.runtime.DMLRuntimeException: Incompatible number of rows for cbind: 98 (expected: 49)
So the frame block is split, but the column to append is not, which results in the dimension mismatch. This is as far as I got; I didn't find where the split happens. I tried specifying the dimensions explicitly, both in the read() function (that gave other errors, which I'll investigate another time) and in an MTD file. Neither helped though :-/ Furthermore, the problem seems to occur only with "real" frame data, not with matrices converted to frames with as.frame().
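To make the suspected mismatch concrete, here is a minimal Python sketch of what the error message suggests. It is not SystemDS code: the helper names split_rows, cbind_block and try_cbind are hypothetical, and the block size of 49 rows is an assumption taken only from the numbers in the exception (98 vs. 49). It models a 98-row frame that has been split into row blocks while the column to append is still whole.

```python
def split_rows(rows, block_size):
    """Split a list of rows into consecutive row blocks (hypothetical helper)."""
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


def cbind_block(left, right):
    """Column-append two row blocks; both must have the same number of rows."""
    if len(left) != len(right):
        raise ValueError(
            f"Incompatible number of rows for cbind: {len(right)} "
            f"(expected: {len(left)})"
        )
    return [l + r for l, r in zip(left, right)]


def try_cbind(left, right):
    """Return 'ok' or the error message, so the failure can be inspected."""
    try:
        cbind_block(left, right)
        return "ok"
    except ValueError as err:
        return str(err)


# A 98-row, single-column "frame" and the column of ones to append.
frame = [[f"v{i}"] for i in range(98)]
ones = [[1.0] for _ in range(98)]

# Assumption: the frame ends up split into blocks of 49 rows.
frame_blocks = split_rows(frame, 49)

# Appending the unsplit column to a single frame block fails.
mismatch = try_cbind(frame_blocks[0], ones)
print(mismatch)

# Splitting the column the same way makes every block pair line up.
ones_blocks = split_rows(ones, 49)
aligned = [cbind_block(f, o) for f, o in zip(frame_blocks, ones_blocks)]
print(len(aligned), [len(block) for block in aligned])
```

If this model is right, the fix would have to make both inputs of the cbind use the same row partitioning before the per-block append; where that should happen in the Spark code path is still open.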
MICE Nominal for imputing categorical and numerical data. The Spark test is not included because of an error in the cbind operation on frames (debugging in progress).