pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

appending with multi-level series converts to single-level series #13465

Closed abremod closed 2 weeks ago

abremod commented 8 years ago

This issue was fixed in 0.14.0 but has reappeared in 0.18.1. The series s should have a multi-level index, but in 0.18.1 it has a single-level index with a tuple for each value in the index.

Code Sample, a copy-pastable example if possible

import pandas as pd
from numpy.random import randn
a = [['bar', 'bar', 'foo', 'foo'],['one', 'two', 'one', 'two']]
t = list(zip(*a))
ind = pd.MultiIndex.from_tuples(t, names=['first', 'second'])
dat = pd.Series(randn(4), index=ind) 
s = pd.Series()
s = s.append(dat)
s.index.levels

Expected Output

FrozenList([['bar', 'foo'], ['one', 'two']])

output of pd.show_versions()

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 22.0.5
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.1
xlsxwriter: 0.8.9
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

jreback commented 8 years ago

So this is correct: the index of an empty Series is a plain Index, which does not align with a MultiIndex, so the MultiIndex is coerced to an Index (hence the tuples).

In [16]: Series(dat).append(dat)
Out[16]: 
first  second
bar    one       0.924763
       two       1.307207
foo    one      -0.811859
       two       0.381771
bar    one       0.924763
       two       1.307207
foo    one      -0.811859
       two       0.381771
dtype: float64

In [17]: Series().append(dat)
Out[17]: 
(bar, one)    0.924763
(bar, two)    1.307207
(foo, one)   -0.811859
(foo, two)    0.381771
dtype: float64

In [18]: Series().index
Out[18]: Index([], dtype='object')

I suppose this is a bit unexpected though and probably not the intent.

If you want to submit a pull request to make this case preserve the MultiIndex, and it doesn't break anything else, that's fine.
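For anyone who hits the coerced result described above, a minimal workaround sketch follows. It assumes the flat index holds tuples of equal length, and it builds that flat index directly with `tupleize_cols=False` rather than through `append`, so it runs on any pandas version. The helper name `restore_multiindex` is hypothetical, not part of pandas.

```python
import pandas as pd

tuples = [("bar", "one"), ("bar", "two"), ("foo", "one"), ("foo", "two")]

# Build a flat object Index of tuples to mimic the coerced result.
# (Assumption: this stands in for what Series().append(dat) returned in 0.18.1.)
flat = pd.Index(tuples, tupleize_cols=False)
s = pd.Series([0.1, 0.2, 0.3, 0.4], index=flat)


def restore_multiindex(series, names):
    """Hypothetical helper: rebuild a MultiIndex from a flat Index of tuples."""
    if isinstance(series.index, pd.MultiIndex):
        return series  # nothing to do
    out = series.copy()
    out.index = pd.MultiIndex.from_tuples(series.index.tolist(), names=names)
    return out


restored = restore_multiindex(s, names=["first", "second"])
print(restored.index.levels)
```

The helper leaves a series alone if it already has a MultiIndex, so it can be applied unconditionally after a concatenation.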

jorisvandenbossche commented 8 years ago

But it is correct that this is a regression from 0.18.0 to 0.18.1.

jreback commented 8 years ago

OK, possibly #12195, though more likely the various cleanups in how Index handles empties are the culprit.

mroeschke commented 2 weeks ago

.append has been deprecated and removed, so I'm not sure this is still actionable. Closing.
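On current pandas, where `Series.append` no longer exists, the usual replacement is `pd.concat`. The sketch below is one way to sidestep the empty-seed pattern from the original report: collect the pieces in a list and concatenate once. The sample values and the loop are illustrative assumptions, not taken from the report.

```python
import pandas as pd

ind = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("foo", "one"), ("foo", "two")],
    names=["first", "second"],
)
dat = pd.Series([0.1, 0.2, 0.3, 0.4], index=ind)

# Instead of starting from an empty Series and appending to it,
# gather the pieces first and concatenate them in a single call.
pieces = []
for _ in range(2):  # illustrative: the same piece is added twice
    pieces.append(dat)

combined = pd.concat(pieces)
print(combined.index.nlevels)
```

Because every piece passed to `pd.concat` here carries the same two-level MultiIndex, no empty flat Index takes part in the alignment, and the result keeps both levels.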