Open lvphj opened 7 years ago
Smaller example (without the case-study context, which is not needed to reproduce):
In [45]: df = pd.DataFrame(np.arange(9).reshape(3, 3))

In [46]: df
Out[46]:
   0  1  2
0  0  1  2
1  3  4  5
2  6  7  8

In [47]: df.loc[[0, 1], [1, 0]]
Out[47]:
   1  0
0  1  0
1  4  3

In [48]: df.columns = [1, 0, 'str']

In [49]: df
Out[49]:
   1  0  str
0  0  1    2
1  3  4    5
2  6  7    8

In [50]: df.loc[[0, 1], [1, 0]]  ## <----- this is 'wrong'
Out[50]:
   0  1
0  1  0
1  4  3

In [51]: df.loc[:, [1, 0]]  ## <----- but it works correctly when the index is sliced
Out[51]:
   1  0
0  0  1
1  3  4
2  6  7

In [52]: df.columns = [0, 1, 'str']  ## and it also works when the integers
                                     ## in the mixed index are in sorted order

In [53]: df
Out[53]:
   0  1  str
0  0  1    2
1  3  4    5
2  6  7    8

In [54]: df.loc[[0, 1], [1, 0]]
Out[54]:
   1  0
0  1  0
1  4  3
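For anyone who wants to check this on their own install, here is a minimal sketch that compares the list/list selection against a two-step selection (columns first with a full row slice, then rows), which the session above shows returning the requested order. The names `requested`, `two_step` and `direct` are only for illustration, and whether the last line prints True depends on the pandas version in use.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3))
df.columns = [1, 0, "str"]  # mixed integer/string labels, integers not sorted

requested = [1, 0]

# Two-step selection: pick the columns with a full row slice, then pick the rows.
two_step = df.loc[:, requested].loc[[0, 1]]

# Direct selection with a list on both axes, the form reported as misbehaving.
direct = df.loc[[0, 1], requested]

print("two-step column order:", list(two_step.columns))
print("direct column order:  ", list(direct.columns))
print("direct matches request:", list(direct.columns) == requested)
```

The two-step form can also serve as a stopgap when the column order matters and the direct form cannot be trusted.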
cc @toobaz if you are interested
This is a copy of a question I asked at http://stackoverflow.com/questions/43058734/odd-behaviour-when-slicing-pandas-dataframe-with-numeric-column-headings as suggested by a comment.
Consider a Pandas dataframe containing case-control data that can be represented by the following structure:
The caseA and caseN variables represent cases and controls as strings and integers, respectively.
I can calculate a 2x2 table to facilitate the calculation of odds and odds ratios using the pandas crosstab method. The default order of the columns is control-case but I change this to case-control which, to my way of thinking, is a bit more intuitive. (This stage may not be relevant to the issue but illustrates the need for changing the order of the columns.)
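A rough sketch of that step, using invented data: `caseN` follows the description above, while the `exposure` column and all the values are hypothetical stand-ins and not the data from the original question.

```python
import pandas as pd

# Hypothetical case-control data; the values are made up for illustration.
df = pd.DataFrame({
    "exposure": ["low", "low", "high", "high", "high", "low"],
    "caseN": [1, 0, 1, 1, 0, 0],  # 1 = case, 0 = control
})

# crosstab sorts the column labels, so the default order is 0 (control), 1 (case).
table = pd.crosstab(df["exposure"], df["caseN"])
default_order = list(table.columns)

# Reorder to case-control by selecting the columns in the desired order.
table = table[[1, 0]]
print(table)
```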
I then slice the dataframe to print just a selected number of rows, with the columns in case-control order. This works exactly as expected.
However, if I add a new column to the dataframe (e.g. a column containing the odds values) and then slice the dataframe in exactly the same way, the cases and controls are printed in the wrong order.
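One thing that changes when the extra column is added is the type of the column index: with only the integer labels it is an integer index, and once a string label such as `odds` is added it becomes a mixed (object) index, which is the situation in the smaller example. A small sketch with hypothetical counts (the numbers and the row labels are invented; only the column labels follow the description above):

```python
import pandas as pd

# Hypothetical 2x2 counts; columns are 1 = case, 0 = control, in that order.
table = pd.DataFrame({1: [2, 1], 0: [1, 2]}, index=["high", "low"])
dtype_before = table.columns.dtype  # integer labels only

# Adding a string-labelled column makes the column index mixed.
table["odds"] = table[1] / table[0]
dtype_after = table.columns.dtype  # object

print(dtype_before, "->", dtype_after)

# Selecting the columns with a full row slice first, then the rows,
# is the form that the smaller example shows keeping the requested order.
subset = table.loc[:, [1, 0]].loc[["high"]]
print(subset)
```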
The following code snippet illustrates this point:
On the first run through (with no additional columns added) the sliced table produced is just as expected:
But if you uncomment the code that calculates the odds column and then re-run the exact same code, the sliced table produced has the column order reversed:
However, repeating the process using the case-control data described as strings (as found in variable caseA) produces the correct results, just as expected.
Output of
pd.show_versions()