Open micky-gee opened 4 days ago
After some more digging, I've found the call within the multilevel index that is failing:
>>> index._engine.get_loc(('A', None))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "index.pyx", line 776, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 2152, in pandas._libs.hashtable.UInt64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 2176, in pandas._libs.hashtable.UInt64HashTable.get_item
KeyError: 17
I think this has to do with how None is hashed and how that hash is mapped to a position in the underlying data structure?
When I give a valid tuple to the multilevel index, I get an integer corresponding to an entry in an underlying data structure:
>>> index._engine.get_loc(('A', 'a2'))
1
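For context on the hashing question above, it may help to look at how a MultiIndex stores its labels: each level holds the distinct non-missing labels, and each row holds integer codes into those levels, with missing labels (None/NaN) encoded as -1. The following is a minimal sketch, assuming an index shaped roughly like the one in this report; the labels are illustrative, and it does not show where the engine lookup itself goes wrong.

```python
import pandas as pd

# Illustrative index only; the labels are assumptions, not the report's data.
mi = pd.MultiIndex.from_tuples([("A", "a1"), ("A", "a2"), ("A", None)])

# Each level keeps only its distinct, non-missing labels ...
second_level = list(mi.levels[1])

# ... and every row stores an integer code pointing into that level.
# Missing labels (None/NaN) get the sentinel code -1 instead of a slot in the level.
second_codes = [int(c) for c in mi.codes[1]]

print(second_level)
print(second_codes)
```

So None never appears as a label in the level itself, which is at least consistent with the lookup ending up at an integer key that the hash table does not contain.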
To understand this problem more broadly, I've been investigating hashable types (None and NaN are both hashable) and how usable they are as index labels in pandas.
With a single-level index (as opposed to a multilevel index), here is an MWE that demonstrates the inconsistency:
>>> import pandas as pd
>>> import numpy as np
>>> index2 = pd.Index([1, 2, 3, None])
>>> df2 = pd.DataFrame([4, 5, 6, 9], index=index2)
>>> df2
     0
1.0  4
2.0  5
3.0  6
NaN  9
Addressing that last index entry with None raises a KeyError:
>>> df2.loc[None]
Traceback (most recent call last):
File "/Users/michaelgrant/.pyenv/versions/s7s_strategy_private/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
return self._engine.get_loc(casted_key)
File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
File "index.pyx", line 175, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index_class_helper.pxi", line 19, in pandas._libs.index.Float64Engine._check_type
KeyError: None
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/michaelgrant/.pyenv/versions/s7s_strategy_private/lib/python3.10/site-packages/pandas/core/indexing.py", line 1191, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "/Users/michaelgrant/.pyenv/versions/s7s_strategy_private/lib/python3.10/site-packages/pandas/core/indexing.py", line 1431, in _getitem_axis
return self._get_label(key, axis=axis)
File "/Users/michaelgrant/.pyenv/versions/s7s_strategy_private/lib/python3.10/site-packages/pandas/core/indexing.py", line 1381, in _get_label
return self.obj.xs(label, axis=axis)
File "/Users/michaelgrant/.pyenv/versions/s7s_strategy_private/lib/python3.10/site-packages/pandas/core/generic.py", line 4301, in xs
loc = index.get_loc(key)
File "/Users/michaelgrant/.pyenv/versions/s7s_strategy_private/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3812, in get_loc
raise KeyError(key) from err
KeyError: None
However, replacing None with np.nan works just fine:
>>> df2.loc[np.nan]
0    9
Name: nan, dtype: int64
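The difference between the two lookups suggests a possible stop-gap on the caller's side: since the index stores the missing entry as NaN in a float64 index, mapping None to np.nan before the lookup avoids the KeyError. This is only a sketch of a workaround, not a fix in pandas; `normalize_key` is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd

# Same frame as in the MWE above.
index2 = pd.Index([1, 2, 3, None])
df2 = pd.DataFrame([4, 5, 6, 9], index=index2)


def normalize_key(key):
    """Hypothetical helper: map None to np.nan before a label lookup."""
    return np.nan if key is None else key


# The None passed to the constructor is stored as NaN in a float64 index.
print(index2.dtype)

# Looking up the normalized key goes through the np.nan path that works.
row = df2.loc[normalize_key(None)]
print(row)
```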
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
It is possible to enlarge a DataFrame with a multilevel index by passing the new key to df.loc[...].
It is also possible to create entries in multilevel indices that have None as a key, i.e. df.loc[('A', None), ...].
It is not possible to enlarge a DataFrame with a multilevel index if one or more of the keys is None.
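The three statements above can be sketched as follows. The frame is hypothetical: the index labels, column names and values are illustrative assumptions rather than the data from this report, and the outcome of the None case may differ between pandas versions, so the sketch catches any exception and prints whatever happens instead of assuming a result.

```python
import pandas as pd

# Hypothetical frame: labels, column names and values are illustrative only.
index = pd.MultiIndex.from_tuples([("A", "a1"), ("A", "a2")])
df = pd.DataFrame([[1, 2], [3, 4]], index=index, columns=["x", "y"])

# Enlarging with a fully specified new key adds a row.
df.loc[("A", "a3"), :] = [10, 11]

# The same assignment with None in the key is the case reported as failing.
try:
    df.loc[("A", None), :] = [12, 13]
    outcome = "row added"
except Exception as err:  # the report shows a KeyError; other versions may differ
    outcome = f"{type(err).__name__}: {err}"

print(outcome)
print(df)
```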
Expected Behavior
Building on the example above,
df.loc[('A', None),:] = [12, 13]
should result in the following:
Installed Versions