tugraz-isds / systemds

An open source ML system for the end-to-end data science lifecycle
Apache License 2.0
37 stars 20 forks source link

Add Rewrites RemoveEmpty & CTable #118

Closed sebwrede closed 4 years ago

sebwrede commented 4 years ago

Tasks

1) Remove Unnecessary RemoveEmpty

The removeEmpty operation removes empty rows or columns. Please add a generic rewrite for SUM and SUM_SQ and all directions, but only apply the rewrite if the optional 'select' vector is not provided.

sum(removeEmpty(target=X)) -> sum(X)

rowSums(removeEmpty(target=X,margin="cols")) -> rowSums(X)

colSums(removeEmpty(target=X,margin="rows")) -> colSums(X)

2) Remove Unnecessary CTable

The table function refers to contingency tables and please only apply this rewrite for unary or binary ctable operations (without weights).

sum(table(X, 1)) -> nrow(X)

sum(table(1, Y)) -> nrow(Y)

sum(table(X, Y)) -> nrow(X)
mboehm7 commented 4 years ago

LGTM - thanks @sebwrede for the patch, this is great. I only made few modifications regarding formatting (tabs over spaces, indentation) and fixes one rewrite issues. The rewrite removeUnnecessaryRemoveEmpty did not yet check for margin/agg-direction combinations that in fact require a removeEmpty (this was likely due to missing negative tests where the rewrites should not be applied).