pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: Shift on a group column when column name is a tuple-of-tuples results in NumPy VisibleDeprecationWarning #35434

Open misantroop opened 4 years ago

misantroop commented 4 years ago

Code Sample, a copy-pastable example

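import warnings

import numpy as np
import pandas as pd

# Turn NumPy's VisibleDeprecationWarning into an error so it cannot go unnoticed
# (np.warnings was only an alias of the standard library warnings module)
warnings.filterwarnings('error', category=np.VisibleDeprecationWarning)

# Column label: a tuple whose second element is itself a tuple
tuple_column = ('A', ('B', 2))
df = pd.DataFrame({tuple_column: [1]}, index=['q'])
grp = df.groupby(level=0)
df[tuple_column] = grp[[tuple_column]].shift()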

Problem description

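Calling shift on a groupby selection whose column name is a tuple containing another tuple (a tuple-of-tuples) triggers a NumPy VisibleDeprecationWarning. With the warnings filter above, the warning is raised as an error: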

  File "C:\Python\lib\site-packages\pandas\core\groupby\groupby.py", line 2562, in shift
    return self._get_cythonized_result(
  File "C:\Python\lib\site-packages\pandas\core\groupby\groupby.py", line 2457, in _get_cythonized_result
    for idx, obj in enumerate(self._iterate_slices()):
  File "C:\Python\lib\site-packages\pandas\core\groupby\generic.py", line 998, in _iterate_slices
    obj = self._selected_obj
  File "pandas\_libs\properties.pyx", line 33, in pandas._libs.properties.CachedProperty.__get__
  File "C:\Python\lib\site-packages\pandas\core\groupby\groupby.py", line 641, in _selected_obj
    return self.obj[self._selection]
  File "C:\Python\lib\site-packages\pandas\core\frame.py", line 2889, in __getitem__
    if com.is_bool_indexer(key):
  File "C:\Python\lib\site-packages\pandas\core\common.py", line 142, in is_bool_indexer
    arr = np.asarray(key)
  File "C:\Python\lib\site-packages\numpy\core\_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)

C:\Python\lib\site-packages\numpy\core\_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
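The traceback ends in `is_bool_indexer`, which passes the list key straight to `np.asarray`. The sketch below reproduces that single step with NumPy alone. It is an illustration, not pandas code, and `asarray_rejects_key` is a hypothetical helper. A list holding one tuple-of-tuples label is a ragged nested sequence: NumPy 1.19 to 1.23 warn about it, and NumPy 1.24 and later reject it with a ValueError. A list holding a flat tuple converts cleanly.

```python
import warnings

import numpy as np


def asarray_rejects_key(key):
    """Return True if np.asarray(key) warns or fails because key is ragged.

    Hypothetical helper. On NumPy 1.19-1.23 a ragged nested sequence emits
    VisibleDeprecationWarning, which the "error" filter turns into an
    exception; on NumPy >= 1.24 the same input raises ValueError instead.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            np.asarray(key)
        except (Warning, ValueError):
            return True
    return False


tup = ("A", ("B", 2))
print(asarray_rejects_key([tup]))         # True on NumPy >= 1.19
print(asarray_rejects_key([("A", "B")]))  # False: flat tuple, shape (1, 2)
```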

Expected Output

No VisibleDeprecationWarning should be triggered.

Output of pd.show_versions()

commit : 6302f7b98ad24adda2d5a98fef3956f04f28039d
python : 3.8.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United Kingdom.1252
pandas : 1.1.0rc0+8.g6302f7b98
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.0rc1+439.g7e9530338
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.50.1

simonjayhawkins commented 4 years ago

Thanks @misantroop for the report. This is for NumPy 1.19 onwards?

Could you provide an MRE that would be suitable as a test? https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

simonjayhawkins commented 4 years ago

xref #31201

misantroop commented 4 years ago

> Thanks @misantroop for the report. This is for NumPy 1.19 onwards?
>
> could you provide a MRE that would be suitable as a test https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

Correct, the issue does not appear with NumPy 1.18.5. I have updated the example to follow the MRE guidelines more closely.

simonjayhawkins commented 4 years ago

The issue can be reproduced with indexing alone, so it is not specific to groupby or shift.

>>> import numpy as np
>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0rc0+7.g04e9e0afd'
>>>
>>> tup = "A", ("B", 2)
>>>
>>> ser = pd.Series([42], index=[tup])
>>> ser
(A, (B, 2))    42
dtype: int64
>>>
>>> ser[[tup]]
C:\Users\simon\Anaconda3\envs\pandas-dev\lib\site-packages\numpy\core\_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order)
(A, (B, 2))    42
dtype: int64
>>>
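Because the warning comes from NumPy inferring a shape from a list of nested tuple labels, one way to hold such labels in an array is to allocate a 1-D object array first and assign each label to a single slot, so NumPy never tries to unpack the tuples. This is a minimal sketch of that general NumPy pattern. `labels_to_object_array` is a hypothetical helper, and nothing in this thread says this is how pandas resolves the issue.

```python
import warnings

import numpy as np


def labels_to_object_array(labels):
    """Build a 1-D object array holding each label as one element.

    Hypothetical helper: the array is allocated up front with a fixed
    length, so NumPy does not infer a shape from the nested tuples.
    """
    labels = list(labels)
    result = np.empty(len(labels), dtype=object)
    for i, label in enumerate(labels):
        result[i] = label  # stored as a single Python object, not unpacked
    return result


with warnings.catch_warnings():
    warnings.simplefilter("error")  # a ragged-sequence warning would raise here
    arr = labels_to_object_array([("A", ("B", 2))])

print(arr.shape)  # (1,)
print(arr[0])     # ('A', ('B', 2))
```

The element-wise loop is deliberate: assigning each label to one integer position stores it as a single Python object, whereas handing the whole list to `np.asarray` lets NumPy descend into the nested tuples.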
simonjayhawkins commented 4 years ago

xref #24688