pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Asymmetric behavior between index and columns when getting incomplete label #17029

Open toobaz opened 7 years ago

toobaz commented 7 years ago

Code Sample, a copy-pastable example if possible

In [2]: df = pd.DataFrame([[1,2], [3,4]], index=pd.MultiIndex.from_tuples([['a', 'b'], ['c', '']]))

In [3]: df.loc['c'].shape
Out[3]: (1, 2)

In [4]: df.transpose().loc[:, 'c'].shape
Out[4]: (2,)

Problem description

Maybe the "fill an incomplete key with empty string(s)" rule is not implemented at all for rows (see also #17024)? If this is the case, then I think it should be.
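Until the rule is applied to rows, one way to get the symmetric result is to fill in the empty-string level yourself. This is a minimal sketch, not a fix: it assumes the index is unique, so a complete key names exactly one row (or column) and `.loc` returns a Series on either axis. It uses the frame from the code sample, written with tuples instead of lists.

```python
import pandas as pd

# The frame from the code sample above
df = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=pd.MultiIndex.from_tuples([("a", "b"), ("c", "")]),
)

# Complete row key: the empty second level is written out explicitly
row = df.loc[("c", ""), :]

# The same complete key, used on the columns of the transposed frame
col = df.transpose().loc[:, ("c", "")]

print(row.shape, col.shape)  # (2,) (2,)
```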

Expected Output

The same as Out[4], with rows and columns swapped: df.loc['c'] should collapse to a Series of shape (2,).

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 9e7666dae3b3b10d987ce154a51c78bcee6e0728
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.21.0.dev+265.g9e7666dae
pytest: 3.0.6
pip: 9.0.1
setuptools: None
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.1.0.dev
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.2.1

gfyoung commented 7 years ago

@toobaz : I think this makes sense to me. Why would you expect the shape to be the same if you transposed?

chris-b1 commented 7 years ago

@toobaz - agree with your diagnosis; this is most likely due to the empty-string level-dropping magic, xref #11424. It could probably be made consistent.

gfyoung commented 7 years ago

@chris-b1 : Judging from your response, I'm labeling this as an API issue. I'm not sure I follow the expected output description by @toobaz . Could you explain?

chris-b1 commented 7 years ago

Sure, our basic behavior is that indexing operations that are "slice-like" (e.g. selecting an entire level) on a MultiIndex return back a DataFrame. Couple examples:

In [4]: idx = pd.MultiIndex.from_tuples([('a', ''), ('b', '1'), ('c', '1'), ('c', '2')])

In [5]: df = pd.DataFrame(np.arange(16).reshape(4,4), index=idx, columns=idx)

In [6]: df
Out[6]: 
      a   b   c    
          1   1   2
a     0   1   2   3
b 1   4   5   6   7
c 1   8   9  10  11
  2  12  13  14  15

In [7]: type(df.loc['b', :])
Out[7]: pandas.core.frame.DataFrame

In [8]: type(df.loc['c', :])
Out[8]: pandas.core.frame.DataFrame

In [9]: type(df.loc[:, 'b'])
Out[9]: pandas.core.frame.DataFrame

In [10]: type(df.loc[:, 'c'])
Out[10]: pandas.core.frame.DataFrame

But, as an undocumented "convenience" feature (the linked issue), if the selection is on the columns and all deeper levels are labeled with empty strings, the selection collapses into a Series. This collapsing doesn't happen with a row selection (this issue):

In [12]: df.loc[:, 'b']
Out[12]: 
      1
a     1
b 1   5
c 1   9
  2  13

In [13]: df.loc[:, 'a']
Out[13]: 
a        0
b  1     4
c  1     8
   2    12
Name: a, dtype: int32

In [16]: type(df.loc[:, 'a'])
Out[16]: pandas.core.series.Series

In [17]: df.loc['a', :]
Out[17]: 
  a  b  c   
     1  1  2
  0  1  2  3

In [18]: type(df.loc['a', :])
Out[18]: pandas.core.frame.DataFrame
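To make the row-side version of this rule concrete, here is a sketch of what it might look like. `select_rows_collapsing` is a hypothetical helper, not a pandas API, and it deliberately simplifies the rule: it collapses only when the selection leaves exactly one row whose remaining labels are all empty strings. It makes no claim about how pandas implements the column-side behavior.

```python
import pandas as pd


def select_rows_collapsing(df, key):
    """Hypothetical helper (not part of pandas): select `key` on the outer
    level of a row MultiIndex, then collapse to a Series when the only
    remaining row label consists entirely of empty strings."""
    sub = df.xs(key)  # like df.loc[key]: selects on the outer row level
    remaining = sub.index
    if len(remaining) == 1:
        label = remaining[0]
        # one leftover level gives a scalar label, several give a tuple
        parts = label if isinstance(label, tuple) else (label,)
        if all(part == "" for part in parts):
            # drop the length-1 row axis and name the Series after the key,
            # as the column selection in Out[13] is named
            return sub.iloc[0].rename(key)
    return sub


df = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=pd.MultiIndex.from_tuples([("a", "b"), ("c", "")]),
)
print(select_rows_collapsing(df, "c").shape)  # (2,)
print(select_rows_collapsing(df, "a").shape)  # (1, 2)
```

With key "a" the leftover label is "b", so the helper returns the one-row DataFrame unchanged.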

gfyoung commented 7 years ago

@chris-b1 : Awesome! That definitely explained it and then some. I think I got confused by the description of the expected output. The expected shape is just the dimensions reversed (it's a transposition).

toobaz commented 7 years ago

> The expected shape is just the dimensions reversed (it's a transposition).

My example was maybe a bit cryptic, sorry. The thing is that a shape (1,2) when transposed gives (2,1), not (2,).
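The distinction can be checked on any one-row frame, independently of this bug: transposing swaps the two axes but keeps the result two-dimensional, and only dropping the length-1 axis gives the one-dimensional (2,) shape. A minimal sketch using DataFrame.T and DataFrame.squeeze; the values are made up for illustration.

```python
import pandas as pd

# A one-row frame with the (1, 2) shape that Out[3] reports
one_row = pd.DataFrame([[3, 4]], columns=[0, 1])

# Transposing swaps the axes: (1, 2) becomes (2, 1), still a DataFrame
transposed = one_row.T

# Dropping the length-1 row axis is what yields a Series of shape (2,)
collapsed = one_row.squeeze(axis=0)

print(one_row.shape, transposed.shape, collapsed.shape)  # (1, 2) (2, 1) (2,)
```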