Closed: kmdalton closed this 2 years ago
On second thought, DataFrame.merge(how='outer') is supposed to sort keys lexicographically. So, that is hopefully not the culprit.
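A quick way to check how an outer merge orders things on a given pandas install is a toy example like the one below. This is only a minimal sketch: the frames and the column names key, x, and y are made up for illustration and have nothing to do with the rs internals.

```python
import pandas as pd

# Two toy frames whose key values are deliberately out of order.
left = pd.DataFrame({"key": ["b", "a"], "x": [1, 2]})
right = pd.DataFrame({"key": ["c", "a"], "y": [3, 4]})

merged = left.merge(right, on="key", how="outer")

# Column order follows the inputs: the key, then left's columns, then right's.
merged_columns = list(merged.columns)
print(merged_columns)

# Check whether the key values come back in sorted order on this install.
merged_keys = list(merged["key"])
print(merged_keys, merged_keys == sorted(merged_keys))
```

If the column order is the same on every run here, the merge itself is unlikely to be where the columns get scrambled.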
What version of pandas were you using here? I can reproduce this bug, but I do not have any stochasticity on my system (using the latest pandas v1.3.3).
In my mind, this is a bug in unstack_anomalous(), not stack_anomalous(). Your snippet can be made to work for me with the following change to the last line:
plus_labels = ["F(+)", "loc(+)", "scale(+)"]
minus_labels = ["F(-)", "loc(-)", "scale(-)"]
unstacked.stack_anomalous(plus_labels=plus_labels, minus_labels=minus_labels)
to explicitly provide the columns in the expected order. As per the docstring, stack_anomalous() expects the corresponding column labels to be given in the same order. However, unstack_anomalous() should be made to always output the column labels in a way that would be compatible with stack_anomalous().
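Until that is fixed, the matching label lists can be derived from whatever column order comes back, rather than typed out by hand. The following is a minimal sketch in plain Python; paired_anomalous_labels is a hypothetical helper, not part of rs, and it assumes the "(+)" / "(-)" suffix convention used in this thread.

```python
def paired_anomalous_labels(columns, plus_suffix="(+)", minus_suffix="(-)"):
    """Return (plus_labels, minus_labels) with corresponding entries aligned.

    Hypothetical helper, not part of reciprocalspaceship. It assumes every
    "(+)" column has a "(-)" partner that shares the same base name.
    """
    columns = list(columns)
    # Keep the "(+)" labels in the order they appear in the input.
    plus_labels = [c for c in columns if c.endswith(plus_suffix)]
    minus_labels = []
    for plus in plus_labels:
        # Build the partner name by swapping the suffix, e.g. "F(+)" -> "F(-)".
        minus = plus[: -len(plus_suffix)] + minus_suffix
        if minus not in columns:
            raise KeyError(f"no partner column for {plus!r}: expected {minus!r}")
        minus_labels.append(minus)
    return plus_labels, minus_labels


# The scrambled column order reported in this issue.
scrambled = ["F(+)", "loc(+)", "scale(+)", "scale(-)", "loc(-)", "F(-)"]
plus_labels, minus_labels = paired_anomalous_labels(scrambled)
print(plus_labels)   # ['F(+)', 'loc(+)', 'scale(+)']
print(minus_labels)  # ['F(-)', 'loc(-)', 'scale(-)']
```

The two lists could then be passed as plus_labels and minus_labels to stack_anomalous(), as in the snippet above, regardless of the order in which the "(-)" columns happen to appear.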
My pandas version:
[ins] In [1]: import pandas as pd
[ins] In [2]: pd.__version__
Out[2]: '1.3.2'
I agree that the bug is in unstack_anomalous. Sorry if that wasn't clear.
Calling DataSet.unstack_anomalous followed by DataSet.stack_anomalous does not always work in rs version 0.9.15. The following code verifies that this fails sometimes and succeeds others.
I find that the order of columns in unstacked is not always the same, despite consistent column ordering in ds. When the column order is Index(['F(+)', 'loc(+)', 'scale(+)', 'F(-)', 'loc(-)', 'scale(-)'], dtype='object'), the script succeeds. When the column order is Index(['F(+)', 'loc(+)', 'scale(+)', 'scale(-)', 'loc(-)', 'F(-)'], dtype='object'), it fails with the following traceback:
I don't know where this stochasticity is coming from, but it is probably somewhere in DataSet.stack_anomalous. My guess would be it has something to do with the (non?)determinism of pd.DataFrame.merge. I don't see any obvious place where the column order could be getting scrambled.