Closed. ojdo closed this issue 10 years ago.
this has to do with some pretty low-level code that finds whether the label you supply is an indexer. it then returns a scalar or a slice accordingly.
For some reason in the first example it is returning a slice, but in the second a scalar. Not really sure why. I'd like to reproduce this; do you have the code that generates the frame before the pickle? I know it's weird, but I'm not sure if this is an impl issue, a bug, or just not guaranteed.
you can guarantee the result by doing this:
df.loc[[(1,199)],'Elec']
which will always return a Series, never a scalar (pass the column as a list too, ['Elec'], to always get a Frame)
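A minimal sketch of that suggestion, assuming small stand-in frames (the names df_dup and df_uniq are made up here and are not the pickled data from the report). Wrapping the row label in a list gives the same container type for both frames; with a scalar column label that container is a Series, and passing the column as a list as well gives a DataFrame.

```python
import pandas as pd

# Stand-in frames: one with a duplicated (1, 2) row label, one fully unique.
df_dup = pd.DataFrame(
    {"value": [0, 1, 2]},
    index=pd.MultiIndex.from_tuples([(1, 1), (1, 2), (1, 2)]),
)
df_uniq = pd.DataFrame(
    {"value": [0, 1, 2]},
    index=pd.MultiIndex.from_tuples([(1, 1), (1, 2), (1, 3)]),
)

# A list of row labels never collapses to a scalar, unique index or not.
from_dup = df_dup.loc[[(1, 1)], "value"]
from_uniq = df_uniq.loc[[(1, 1)], "value"]
print(type(from_dup).__name__, type(from_uniq).__name__)

# A list for the column as well keeps the result two-dimensional.
frame_result = df_dup.loc[[(1, 1)], ["value"]]
print(type(frame_result).__name__)
```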
actually this is a 'user' issue: df.index.is_unique is False, while df2.index.is_unique is True.
that said, it still might be a bug
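To check this on your own data, something along these lines may help. It is only a sketch on a small stand-in frame (not the pickled one), and it assumes a pandas version that provides Index.duplicated, which is newer than the 0.14.1 discussed in this thread.

```python
import pandas as pd

# Stand-in frame with one duplicated row label, (1, 2).
df = pd.DataFrame(
    {"value": [0, 1, 2]},
    index=pd.MultiIndex.from_tuples([(1, 1), (1, 2), (1, 2)]),
)

print(df.index.is_unique)

# keep=False marks every occurrence of a repeated label, not only the later ones.
duplicate_mask = df.index.duplicated(keep=False)
duplicate_labels = list(df.index[duplicate_mask])
print(duplicate_labels)
```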
So this reproduces.
The question is: if you select from a multi-index with a label that matches ONLY a single row, even though the index is non-unique, should it be treated like selecting from a unique multi-index? (Currently this is NOT true in general; if you have a DataFrame with non-unique columns, selecting a single column gets you back a DataFrame, not a Series.)
In [3]: df = DataFrame(dict(value = [0,1,2]),index=MultiIndex.from_tuples([(1,1),(1,2),(1,2)]))
In [4]: df2 = DataFrame(dict(value = [0,1,2]),index=MultiIndex.from_tuples([(1,1),(1,2),(1,3)]))
In [5]: df
Out[5]:
value
1 1 0
2 1
2 2
In [6]: df2
Out[6]:
value
1 1 0
2 1
3 2
In [7]: df.loc[(1,1),'value']
Out[7]:
1 1 0
Name: value, dtype: int64
In [8]: df2.loc[(1,1),'value']
Out[8]: 0
In [9]: df.loc[(1,2),'value']
Out[9]:
1 2 1
2 2
Name: value, dtype: int64
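If calling code needs a scalar regardless of whether the index is unique, one option is a small wrapper like the one below. This is a sketch only: loc_single is a hypothetical helper name, not part of pandas, and raising when several rows match is a design choice made for this example.

```python
import pandas as pd


def loc_single(frame, row_key, column):
    """Return the single value at (row_key, column) as a scalar.

    Hypothetical helper: selects with a list of row labels so the
    intermediate result is always a Series, then unwraps it.
    """
    matches = frame.loc[[row_key], column]
    if len(matches) != 1:
        raise ValueError(
            "expected exactly one row for %r, found %d" % (row_key, len(matches))
        )
    return matches.iloc[0]


df = pd.DataFrame(
    {"value": [0, 1, 2]},
    index=pd.MultiIndex.from_tuples([(1, 1), (1, 2), (1, 2)]),
)
df2 = pd.DataFrame(
    {"value": [0, 1, 2]},
    index=pd.MultiIndex.from_tuples([(1, 1), (1, 2), (1, 3)]),
)

# Same scalar from both frames, although only df2 has a unique index.
print(loc_single(df, (1, 1), "value"), loc_single(df2, (1, 1), "value"))
```

A label that really is duplicated, such as (1, 2) in df, makes the helper raise instead of silently returning a Series.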
Ouch, thank you for spotting this. I think in that case the .loc function is not to blame, but is actually helpful by ensuring consistent return types throughout a DataFrame.
What I would consider a bug, though, is that for your reproducing example, df.at fails to access row (1,1), even though this row is unique in both DataFrames:
In [10]: df.at[(1,1),'value']
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-15c883613e3d> in <module>()
----> 1 df.at[(1,1),'value']
C:\Python27\lib\site-packages\pandas\core\indexing.pyc in __getitem__(self, key)
1264
1265 key = self._convert_key(key)
-> 1266 return self.obj.get_value(*key)
1267
1268 def __setitem__(self, key, value):
C:\Python27\lib\site-packages\pandas\core\frame.pyc in get_value(self, index, col)
1526 series = self._get_item_cache(col)
1527 engine = self.index._engine
-> 1528 return engine.get_value(series.values, index)
1529
1530 def set_value(self, index, col, value):
C:\Python27\lib\site-packages\pandas\index.pyd in pandas.index.IndexEngine.get_value (pandas\index.c:2957)()
C:\Python27\lib\site-packages\pandas\index.pyd in pandas.index.IndexEngine.get_value (pandas\index.c:2772)()
C:\Python27\lib\site-packages\pandas\index.pyd in pandas.index.IndexEngine.get_loc (pandas\index.c:3451)()
C:\Python27\lib\site-packages\pandas\index.pyd in pandas.index.IndexEngine._get_loc_duplicates (pandas\index.c:3747)()
TypeError: only integer arrays with one element can be converted to an index
Maybe the error message could hint that df.index.is_unique == True is required for successful single-element access?
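Until the library reports this more clearly, a caller can add the hint itself. The wrapper below is a sketch under that assumption: safe_at is a hypothetical name, not a pandas function, and it deliberately refuses every lookup on a non-unique index rather than trying to guess which rows are safe.

```python
import pandas as pd


def safe_at(frame, row_key, column):
    """Scalar lookup via .at that fails early with a readable message.

    Hypothetical helper: it checks index uniqueness before delegating.
    """
    if not frame.index.is_unique:
        raise ValueError(
            "single-element access requires a unique index "
            "(frame.index.is_unique is False)"
        )
    return frame.at[row_key, column]


df = pd.DataFrame(
    {"value": [0, 1, 2]},
    index=pd.MultiIndex.from_tuples([(1, 1), (1, 2), (1, 2)]),
)
df2 = pd.DataFrame(
    {"value": [0, 1, 2]},
    index=pd.MultiIndex.from_tuples([(1, 1), (1, 2), (1, 3)]),
)

print(safe_at(df2, (1, 1), "value"))

try:
    safe_at(df, (1, 1), "value")
except ValueError as err:
    print(err)
```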
yes that last is prob a bug
Just to be sure: the bug seems to be "hidden" in a low-level part that I cannot reach with the Python debugger, right? When I trigger post-mortem %debug in IPython, I cannot step into engine.get_value(series.values, index) to find out what's going on down there. (Bonus question: would 'debugging c extensions' be the right keywords to search for how to debug these parts as well?)
Other than that, would it help if I prepare the reproducing example as a new test case pull request?
it's tricky to debug the cython; I generally just insert print statements as needed.
But that's not really the issue: it was calling a different routine depending on whether the index is unique or not. This is correct (I mean you can argue that a lookup returning only a single value needs special treatment when the index is non-unique), but that's a different (API) issue.
you can certainly do a pull-request to fix the .at issue (which is what this issue now represents). Put in the test cases, see where it fails and fix.
would be great. the indexing code has a lot of paths, but debugging is actually straightforward once you do it a few times.
I think I may have a similar issue... not sure if it's the same bug or a different one; let me know if it belongs in a different issue.
df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]])
df.columns = pd.MultiIndex.from_tuples([(0,1),(1,1),(2,1)])
df.groupby(axis=1, level=[0,1]).first()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-5240a9c3bdf4> in <module>()
1 df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]])
2 df.index = pd.MultiIndex.from_tuples([(0,1),(1,1),(2,1)])
----> 3 df.T.groupby(axis=1, level=[0,1]).first()
/cellar/users/agross/anaconda2/lib/python2.7/site-packages/pandas-0.14.1.dev-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in f(self)
109 raise SpecificationError(str(e))
110 except Exception:
--> 111 result = self.aggregate(lambda x: npfunc(x, axis=self.axis))
112 if _convert:
113 result = result.convert_objects()
/cellar/users/agross/anaconda2/lib/python2.7/site-packages/pandas-0.14.1.dev-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs)
2528
2529 if self.grouper.nkeys > 1:
-> 2530 return self._python_agg_general(arg, *args, **kwargs)
2531 else:
2532
/cellar/users/agross/anaconda2/lib/python2.7/site-packages/pandas-0.14.1.dev-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _python_agg_general(self, func, *args, **kwargs)
1081 output[name] = self._try_cast(values[mask], result)
1082
-> 1083 return self._wrap_aggregated_output(output)
1084
1085 def _wrap_applied_output(self, *args, **kwargs):
/cellar/users/agross/anaconda2/lib/python2.7/site-packages/pandas-0.14.1.dev-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _wrap_aggregated_output(self, output, names)
3087 result = result.T
3088
-> 3089 return self._reindex_output(result).convert_objects()
3090
3091 def _wrap_agged_blocks(self, items, blocks):
/cellar/users/agross/anaconda2/lib/python2.7/site-packages/pandas-0.14.1.dev-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _reindex_output(self, result)
3129 levels_list = [ ping._group_index for ping in groupings ]
3130 index = MultiIndex.from_product(levels_list, names=self.grouper.names)
-> 3131 return result.reindex(**{ self.obj._get_axis_name(self.axis) : index, 'copy' : False }).sortlevel()
3132
3133 def _iterate_column_groupbys(self):
/cellar/users/agross/anaconda2/lib/python2.7/site-packages/pandas-0.14.1.dev-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in sortlevel(self, level, axis, ascending, inplace, sort_remaining)
2811 the_axis = self._get_axis(axis)
2812 if not isinstance(the_axis, MultiIndex):
-> 2813 raise TypeError('can only sort by level with a hierarchical index')
2814
2815 new_axis, indexer = the_axis.sortlevel(level, ascending=ascending,
TypeError: can only sort by level with a hierarchical index
works fine in 0.14.1
In [38]: df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]])
In [39]: df.columns = pd.MultiIndex.from_tuples([(0,1),(1,1),(2,1)])
In [40]: df.groupby(axis=1, level=[0,1]).first()
Out[40]:
0 1 2
1 1 1
0 1 2 3
1 4 5 6
2 7 8 9
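For readers on pandas versions where groupby(axis=1) is deprecated or no longer available, the same column grouping can be written by transposing first. This is a sketch of an equivalent spelling, not something proposed in the thread, and it says nothing about the failure seen on master above.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df.columns = pd.MultiIndex.from_tuples([(0, 1), (1, 1), (2, 1)])

# Transpose so the column MultiIndex becomes the row index, group on both
# of its levels, then transpose back to the original orientation.
result = df.T.groupby(level=[0, 1]).first().T
print(result)
```

Each (level 0, level 1) pair is unique here, so every group holds exactly one column and the result has the same values as df.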
I was on master... switched to 0.14.1 and it works for me as well. Must be a recent thing.
hmm, that IS broken in master, weird. can you open a separate issue for that. thanks!
bug posted here: https://github.com/pydata/pandas/issues/7997
ok the primary is a usage question, bug reported in #7997
I fear that I somehow created a DataFrame with a numeric MultiIndex that triggers (un?)intended behaviour in the .loc function. I failed to create a reproducible example with anything but a pickled dump of a DataFrame.
Question
Is this a bug, or a user error? I'm confused...
Steps to reproduce
http://ojdo.de/tmp/df.pickle (17 kB)
Resulting output
Expected output
The value on its own:
7.602552
It gets weirder
There must be something between row 80 and 90 in this DataFrame, because the following snippet yields a single value, while returning a Series if executed with .head(90).
Installed versions
commit: None
python: 2.7.0.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.14.1
nose: 1.3.3
Cython: None
numpy: 1.8.1
scipy: 0.12.0
statsmodels: None
IPython: 0.13.2
sphinx: None
patsy: None
scikits.timeseries: None
dateutil: 1.5-mpl
pytz: 2012d
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.2.1
openpyxl: 2.0.2
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.2
html5lib: 1.0b3
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None