The operation 'Row & column calculations -> Basic -> Substract by... -> Column Median' produces unexpected results on the Mac release of v0.4.9.
Reproduction:
Using the "fixed acidity" column from "winequality-white.csv", perform two sets of operations (all within 'Row & column calculations -> Basic'):
A (divide-then-log):
'Divide by... -> Column Median'
'Logarithmic -> log2'
B (log-then-subtract):
'Logarithmic -> log2'
'Substract by... -> Column Median'
Expected outcome: the results of A and B should be identical. The medians of A and B should be 0.
Observed outcome: B = A - ~4.03
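To show why A and B are expected to agree, here is a minimal sketch using only the Python standard library. The values are made up for illustration and are not taken from winequality-white.csv. It relies on log2(x / m) = log2(x) - log2(m) and on log2 being monotonic, so the median of the logged column equals log2 of the raw median when there is a single middle value.

```python
import math
import statistics

# Made-up values for illustration; NOT taken from winequality-white.csv.
values = [6.0, 6.4, 6.8, 7.2, 8.0]

# A: divide by the column median, then take log2.
median_raw = statistics.median(values)
a = [math.log2(v / median_raw) for v in values]

# B: take log2, then subtract the median of the logged column.
logged = [math.log2(v) for v in values]
median_logged = statistics.median(logged)
b = [x - median_logged for x in logged]

# Largest element-wise difference between the two results
# (only floating-point rounding is expected here).
max_diff = max(abs(x - y) for x, y in zip(a, b))
print(max_diff)
print(statistics.median(a), statistics.median(b))
```

With an even number of rows the median is the mean of the two middle values, so A and B match exactly only when those two values coincide; otherwise they differ slightly.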
Scatter plot:
When I perform the log-then-subtract sequence but manually input the median value (obtained from 'Summary Statistics -> 50%') using 'Substract by... -> Value' rather than 'Substract by... -> Column Median', the result is identical to A.
It looks like you might have a bug in analyze_data.py:1164 - you do not subset the list of median values before zipping with selectedColumns, so it will always pair the first value from selectedColumns with the median of the first column in the entire data frame.
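The suspected mismatch can be sketched as follows. This is not the code from analyze_data.py: the function names, the dict-of-lists standing in for the data frame, and the sample values are all hypothetical, chosen only to illustrate zipping an unsubsetted list of medians against the selected columns.

```python
import math
import statistics

# Made-up values; a plain dict of lists stands in for the data frame.
raw = [6.0, 6.4, 6.8, 7.2, 8.0]
frame = {
    "fixed acidity": raw,
    "log2 fixed acidity": [math.log2(v) for v in raw],
}
selected_columns = ["log2 fixed acidity"]  # hypothetical user selection


def subtract_median_buggy(frame, selected_columns):
    # Medians of ALL columns, in frame order (not subset to the selection).
    all_medians = [statistics.median(col) for col in frame.values()]
    out = dict(frame)
    # zip pairs the first selected column with the first column's median.
    for name, med in zip(selected_columns, all_medians):
        out[name] = [v - med for v in frame[name]]
    return out


def subtract_median_fixed(frame, selected_columns):
    out = dict(frame)
    for name in selected_columns:
        # Compute the median from the selected column itself.
        med = statistics.median(frame[name])
        out[name] = [v - med for v in frame[name]]
    return out


buggy = subtract_median_buggy(frame, selected_columns)
fixed = subtract_median_fixed(frame, selected_columns)
offset = fixed["log2 fixed acidity"][0] - buggy["log2 fixed acidity"][0]
print(round(offset, 2))  # 4.03
```

In this sketch the buggy version subtracts the raw median (6.8) instead of the median of the logged column (log2(6.8), about 2.77). The resulting offset of about 4.03 would be consistent with the observed B = A - ~4.03 if the raw median of "fixed acidity" is 6.8, which is an assumption here rather than something verified against the data.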
P.S. 'Substract' is (typically considered) a typo, with the correct spelling being 'Subtract'.