pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: Error while debugging #46890

Closed edvos-sw closed 2 years ago

edvos-sw commented 2 years ago

Pandas version checks

Reproducible Example

import pandas as pd
import numpy as np

df = pd.DataFrame()
df['col1'] = np.random.rand(100)
df['col2'] = np.random.rand(100)

df.to_parquet('test.parquet')

df = pd.read_parquet('test.parquet')

print(df)

Issue Description

IMPORTANT: This only happens when debugging, on the line pd.read_parquet('test.parquet'). I am using Spyder on Anaconda. I can provide dependencies if necessary.

Traceback (most recent call last):
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\spyder_kernels\customize\spyderpdb.py", line 776, in run
    super(SpyderPdb, self).run(cmd, globals, locals)
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\bdb.py", line 597, in run
    exec(cmd, globals, locals)
  File "c:\users\edudv\downloads\test.py", line 16, in <module>
    df = pd.read_parquet('test.parquet')
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\pandas\io\parquet.py", line 493, in read_parquet
    return impl.read(
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\pandas\io\parquet.py", line 240, in read
    result = self.api.parquet.read_table(
  File "pyarrow\array.pxi", line 767, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow\table.pxi", line 1996, in pyarrow.lib.Table._to_pandas
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\pyarrow\pandas_compat.py", line 788, in table_to_blockmanager
    columns = _deserialize_column_index(table, all_columns, column_indexes)
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\pyarrow\pandas_compat.py", line 903, in _deserialize_column_index
    columns = _flatten_single_level_multiindex(columns)
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\pyarrow\pandas_compat.py", line 1150, in _flatten_single_level_multiindex
    if not index.is_unique:
  File "pandas\_libs\properties.pyx", line 37, in pandas._libs.properties.CachedProperty.__get__
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\pandas\core\indexes\base.py", line 2237, in is_unique
    return self._engine.is_unique
  File "pandas\_libs\properties.pyx", line 37, in pandas._libs.properties.CachedProperty.__get__
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\pandas\core\indexes\multi.py", line 1097, in _engine
    return MultiIndexUIntEngine(self.levels, self.codes, offsets)
  File "pandas\_libs\index.pyx", line 635, in pandas._libs.index.BaseMultiIndexCodesEngine.__init__
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\pandas\core\indexes\multi.py", line 136, in _codes_to_ints
    codes <<= self.offsets
AttributeError: 'MultiIndex' object has no attribute 'offsets'

Expected Behavior

Read parquet file

Installed Versions

INSTALLED VERSIONS
------------------
commit : 4bfe3d07b4858144c219b9346329027024102ab6
python : 3.10.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : es_ES.cp1252
pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 22.0.4
setuptools : 62.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.1
IPython : 7.32.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli :
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : 2.1.1
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 7.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\_distutils_hack\__init__.py:30: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

RyuuOujiXS commented 2 years ago

Is there a reason you need to read from a file you just overwrote? Why not just use the df that's already in memory since it should be identical? Asking because code is built around real-use cases.

edvos-sw commented 2 years ago

The code was just meant to reproduce the problem. It happens when I read any parquet file.

ghost commented 2 years ago

I have the same issue: appending a column to the index works fine while running, but fails when in debug mode. I'm using Spyder 5.3.0 on Windows with pandas 1.4.2.

I've created some dummy code that shows the problem:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [100, 200, 300], "c": ["a", "b", "c"]})

df.set_index("a", inplace=True)
df.set_index("b", append=True, inplace=True)

print(df)
print(df.index)

Running this without debugging returns ✔️ :

       c
a b     
1 100  a
2 200  b
3 300  c
MultiIndex([(1, 100),
            (2, 200),
            (3, 300)],
           names=['a', 'b'])

Running this with debugging in Spyder returns ❌ :

Traceback (most recent call last):
  File "C:\Users\username\Miniconda3\envs\some-env\lib\site-packages\spyder_kernels\customize\spyderpdb.py", line 776, in run
    super(SpyderPdb, self).run(cmd, globals, locals)
  File "C:\Users\username\Miniconda3\envs\some-env\lib\bdb.py", line 597, in run
    exec(cmd, globals, locals)
  File "c:\users\username\path\temp.py", line 6, in <module>
    df.set_index("b", append=True, inplace=True)
  File "C:\Users\username\Miniconda3\envs\some-env\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\username\Miniconda3\envs\some-env\lib\site-packages\pandas\core\frame.py", line 5560, in set_index
    index._cleanup()
  File "C:\Users\username\Miniconda3\envs\some-env\lib\site-packages\pandas\core\indexes\base.py", line 843, in _cleanup
    self._engine.clear_mapping()
  File "pandas\_libs\properties.pyx", line 37, in pandas._libs.properties.CachedProperty.__get__
  File "C:\Users\username\Miniconda3\envs\some-env\lib\site-packages\pandas\core\indexes\multi.py", line 1097, in _engine
    return MultiIndexUIntEngine(self.levels, self.codes, offsets)
  File "pandas\_libs\index.pyx", line 635, in pandas._libs.index.BaseMultiIndexCodesEngine.__init__
  File "C:\Users\username\Miniconda3\envs\some-env\lib\site-packages\pandas\core\indexes\multi.py", line 136, in _codes_to_ints
    codes <<= self.offsets
AttributeError: 'MultiIndex' object has no attribute 'offsets'

I thought I would work around it with:

# df.set_index("b", append=True, inplace=True)
df = df.reset_index().set_index(["a", "b"])

But the same issue persists.
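
For comparison outside any debugger, a minimal sketch like the one below exercises the public operations that both tracebacks pass through: set_index with append=True, index.is_unique, and a label lookup on the resulting MultiIndex. The function name build_two_level_index is hypothetical and not part of pandas. Run as a normal script it is expected to pass; nothing here establishes how it behaves under a particular debugger.

```python
import pandas as pd


def build_two_level_index() -> pd.DataFrame:
    """Build the frame from the example above without using inplace=True."""
    df = pd.DataFrame({"a": [1, 2, 3], "b": [100, 200, 300], "c": ["a", "b", "c"]})
    df = df.set_index("a")
    # Appending a second level turns the index into a MultiIndex.
    df = df.set_index("b", append=True)
    return df


if __name__ == "__main__":
    frame = build_two_level_index()
    # These checks touch the index through public API only.
    assert isinstance(frame.index, pd.MultiIndex)
    assert list(frame.index.names) == ["a", "b"]
    assert frame.index.is_unique
    assert frame.loc[(2, 200), "c"] == "b"
    print("MultiIndex checks passed")
```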

edvos-sw commented 2 years ago

yeah, it seems to be an issue with debugging in the latest version of Spyder. Maybe it is Spyder and not pandas.

MarcoGorelli commented 2 years ago

have you reported to spyder? debugging that code with pdb works fine for me

ghost commented 2 years ago

As it only appears to happen with spyder I agree it's probably their issue. However, the error does appear to come from the pandas codebase, so perhaps it's good to have it here as well?

edvos-sw commented 2 years ago

yep, seems like a problem with the combination of pandas, Spyder and Python 3.10. @MarcoGorelli, what version of Python did you use?

MarcoGorelli commented 2 years ago

3.8

edvos-sw commented 2 years ago

can you try with python 3.10?

ghost commented 2 years ago

The Spyder issue was closed as:

... was able to reproduce it in terminal IPython, I think this is not a Spyder problem but a Pandas one.

The Spyder issue has an environment specification that reproduces this issue. Is there anything else I can provide to help resolve this issue?

FTL-Citepa commented 2 years ago

Same problem here with read_feather from pandas

MarcoGorelli commented 2 years ago

can you try with python 3.10?

Thanks - yup, can reproduce with Python3.10!

To reproduce:

  1. make a file t.py with:

    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3], "b": [100, 200, 300], "c": ["a", "b", "c"]})

    df.set_index("a", inplace=True)
    import ipdb; ipdb.set_trace()
    df.set_index("b", append=True, inplace=True)

  2. make sure you have ipdb installed
  3. run python t.py, and at the breakpoint, press n

we get

    (venv310) marcogorelli@OVMG025 tmp % python t.py
    > /Users/marcogorelli/tmp/t.py(7)<module>()
          6 import ipdb; ipdb.set_trace()
    ----> 7 df.set_index("b", append=True, inplace=True)
          8

    ipdb> n
    AttributeError: 'MultiIndex' object has no attribute 'offsets'

Note: this only happens with ipdb, not with pdb - so perhaps the issue is there?
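
The plain pdb comparison can also be scripted so that it does not need an interactive session. The following is a minimal sketch that is not taken from this thread: the helper name step_through_with_pdb and its steps parameter are hypothetical, and it simply feeds next commands to the standard-library pdb.Pdb through an in-memory stream. It only exercises pdb, so it says nothing about ipdb or Spyder's debugger.

```python
import io
import pdb

import pandas as pd

SNIPPET = '''\
df = pd.DataFrame({"a": [1, 2, 3], "b": [100, 200, 300], "c": ["a", "b", "c"]})
df.set_index("a", inplace=True)
df.set_index("b", append=True, inplace=True)
'''


def step_through_with_pdb(source: str, steps: int = 10) -> dict:
    """Run `source` under pdb, answering every prompt with `next`.

    Returns the namespace the code ran in. If `steps` is too small, pdb
    quits at end of input and the namespace may be incomplete.
    """
    commands = io.StringIO("next\n" * steps)
    output = io.StringIO()  # collects the debugger's prompts and listings
    debugger = pdb.Pdb(stdin=commands, stdout=output, nosigint=True, readrc=False)
    debugger.use_rawinput = False  # read commands from `commands`, not the terminal
    namespace = {"pd": pd}
    # An exception raised by the snippet itself propagates out of run().
    debugger.run(source, namespace)
    return namespace


if __name__ == "__main__":
    result = step_through_with_pdb(SNIPPET)
    print(result["df"].index)
```

If the snippet raised the AttributeError while being stepped, the call to step_through_with_pdb would raise it as well, which makes the sketch usable as a pass/fail check in a given environment.
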
FTL-Citepa commented 2 years ago

So I cannot debug in Spyder when working with pandas in an environment? This is a major problem, considering that I'm working on a big project and have to check some functions independently.

edvos-sw commented 2 years ago

yep, that's the problem

ghost commented 2 years ago

So I cannot debug on spyder if working with pandas on an environment ?

Well, that's a bit of a broad statement... As stated in this comment:

As a workaround please use Python <=3.9

So, if you simply specify python=3.9 in your conda environment then this issue should not occur.

mzeitlin11 commented 2 years ago

This looks related to #41935. Is this still an issue in 1.4.3?

ghost commented 2 years ago

I have recreated the environment linked to earlier, updated pandas to 1.4.3 and updated spyder-kernels to 2.3.2. Then I tested the code snippet I posted earlier. This now works as expected. I also installed ipdb and tested MarcoGorelli's example. This now also works as expected.

So it seems this issue is resolved, thanks!
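
A rough way to flag the setup reported in this thread is sketched below. It is only a heuristic built from the reports here (failure with pandas 1.4.2 on Python 3.10, no failure on Python 3.9 or lower, and no failure after updating to pandas 1.4.3 together with spyder-kernels 2.3.2). The function names are hypothetical, and treating every pandas release below 1.4.3 and every Python release from 3.10 onward as matching is an assumption, not something the thread confirms.

```python
import sys
from typing import Sequence, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.4.3' into (1, 4, 3), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def matches_reported_setup(python_version: Sequence[int], pandas_version: str) -> bool:
    """Heuristic: Python 3.10 or newer combined with pandas older than 1.4.3."""
    on_new_python = tuple(python_version[:2]) >= (3, 10)  # assumption: 3.10 and later
    on_old_pandas = parse_version(pandas_version) < (1, 4, 3)  # assumption: all earlier releases
    return on_new_python and on_old_pandas


if __name__ == "__main__":
    import pandas as pd

    if matches_reported_setup(sys.version_info, pd.__version__):
        print("This environment matches the setup reported in this thread.")
    else:
        print("This environment does not match the reported setup.")
```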

Fold out for the full environment.yml file

```
name: some-env
channels:
  - conda-forge
dependencies:
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=py_2
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - black=22.3.0=pyhd8ed1ab_0
  - brotli=1.0.9=h8ffe710_7
  - brotli-bin=1.0.9=h8ffe710_7
  - bzip2=1.0.8=h8ffe710_4
  - ca-certificates=2022.6.15=h5b45459_0
  - certifi=2022.6.15=py310h5588dad_0
  - click=8.1.3=py310h5588dad_0
  - cloudpickle=2.0.0=pyhd8ed1ab_0
  - colorama=0.4.4=pyh9f0ad1d_0
  - cycler=0.11.0=pyhd8ed1ab_0
  - dataclasses=0.8=pyhc8e2a94_3
  - debugpy=1.6.0=py310h8a704f9_0
  - decorator=5.1.1=pyhd8ed1ab_0
  - entrypoints=0.4=pyhd8ed1ab_0
  - fonttools=4.33.3=py310he2412df_0
  - freetype=2.10.4=h546665d_1
  - ftputil=5.0.3=pyhd8ed1ab_0
  - greenlet=1.1.2=py310h8a704f9_2
  - icu=69.1=h0e60522_0
  - importlib-metadata=4.11.3=py310h5588dad_1
  - importlib_metadata=4.11.3=hd8ed1ab_1
  - intel-openmp=2022.0.0=h57928b3_3663
  - ipdb=0.13.9=pyhd8ed1ab_0
  - ipykernel=6.13.0=py310hbbfc1a7_0
  - ipython=7.33.0=py310h5588dad_0
  - jbig=2.1=h8d14728_2003
  - jedi=0.18.1=py310h5588dad_1
  - jpeg=9e=h8ffe710_1
  - jupyter_client=7.3.0=pyhd8ed1ab_0
  - jupyter_core=4.9.2=py310h5588dad_0
  - keyring=23.4.0=py310h5588dad_2
  - kiwisolver=1.4.2=py310h476a331_1
  - lcms2=2.12=h2a16943_0
  - lerc=3.0=h0e60522_0
  - libblas=3.9.0=14_win64_mkl
  - libbrotlicommon=1.0.9=h8ffe710_7
  - libbrotlidec=1.0.9=h8ffe710_7
  - libbrotlienc=1.0.9=h8ffe710_7
  - libcblas=3.9.0=14_win64_mkl
  - libclang=13.0.1=default_h81446c8_0
  - libdeflate=1.10=h8ffe710_0
  - libffi=3.4.2=h8ffe710_5
  - liblapack=3.9.0=14_win64_mkl
  - libpng=1.6.37=h1d00b33_2
  - libsodium=1.0.18=h8d14728_1
  - libtiff=4.3.0=hc4061b1_3
  - libwebp=1.2.2=h57928b3_0
  - libwebp-base=1.2.2=h8ffe710_1
  - libxcb=1.13=hcd874cb_1004
  - libzlib=1.2.11=h8ffe710_1014
  - lz4-c=1.9.3=h8ffe710_1
  - m2w64-gcc-libgfortran=5.3.0=6
  - m2w64-gcc-libs=5.3.0=7
  - m2w64-gcc-libs-core=5.3.0=7
  - m2w64-gmp=6.1.0=2
  - m2w64-libwinpthread-git=5.0.0.4634.697f757=2
  - matplotlib=3.5.1=py310h5588dad_0
  - matplotlib-base=3.5.1=py310h79a7439_0
  - matplotlib-inline=0.1.3=pyhd8ed1ab_0
  - mkl=2022.0.0=h0e2418a_796
  - msys2-conda-epoch=20160418=1
  - munkres=1.1.4=pyh9f0ad1d_0
  - mypy_extensions=0.4.3=py310h5588dad_5
  - nest-asyncio=1.5.5=pyhd8ed1ab_0
  - numpy=1.22.3=py310hed7ac4c_2
  - openjpeg=2.4.0=hb211442_1
  - openssl=1.1.1q=h8ffe710_0
  - packaging=21.3=pyhd8ed1ab_0
  - pandas=1.4.3=py310hf5e1058_0
  - parso=0.8.3=pyhd8ed1ab_0
  - pathspec=0.9.0=pyhd8ed1ab_0
  - pickleshare=0.7.5=py_1003
  - pillow=9.1.0=py310h767b3fd_2
  - pip=22.0.4=pyhd8ed1ab_0
  - platformdirs=2.5.1=pyhd8ed1ab_0
  - prompt-toolkit=3.0.29=pyha770c72_0
  - psutil=5.9.0=py310he2412df_1
  - pthread-stubs=0.4=hcd874cb_1001
  - pygments=2.12.0=pyhd8ed1ab_0
  - pymysql=1.0.2=pyhd8ed1ab_0
  - pyparsing=3.0.8=pyhd8ed1ab_0
  - pyqt=5.12.3=py310h5588dad_8
  - pyqt-impl=5.12.3=py310h8a704f9_8
  - pyqt5-sip=4.19.18=py310h8a704f9_8
  - pyqtchart=5.12=py310h8a704f9_8
  - pyqtwebengine=5.12.1=py310h8a704f9_8
  - python=3.10.4=h9a09f29_0_cpython
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python_abi=3.10=2_cp310
  - pytz=2022.1=pyhd8ed1ab_0
  - pywin32=303=py310he2412df_0
  - pywin32-ctypes=0.2.0=py310h5588dad_1005
  - pyzmq=22.3.0=py310h73ada01_2
  - qt=5.12.9=h556501e_6
  - setuptools=62.1.0=py310h5588dad_0
  - six=1.16.0=pyh6c4a22f_0
  - spyder-kernels=2.3.2=py310h5588dad_0
  - sqlalchemy=1.4.36=py310he2412df_0
  - sqlite=3.38.3=h8ffe710_0
  - tbb=2021.5.0=h2d74725_1
  - tk=8.6.12=h8ffe710_0
  - tomli=2.0.1=pyhd8ed1ab_0
  - tornado=6.1=py310he2412df_3
  - traitlets=5.1.1=pyhd8ed1ab_0
  - typed-ast=1.5.3=py310he2412df_0
  - typing_extensions=4.2.0=pyha770c72_1
  - tzdata=2022a=h191b570_0
  - ucrt=10.0.20348.0=h57928b3_0
  - unicodedata2=14.0.0=py310he2412df_1
  - vc=14.2=hb210afc_6
  - vs2015_runtime=14.29.30037=h902a5da_6
  - wcwidth=0.2.5=pyh9f0ad1d_2
  - wheel=0.37.1=pyhd8ed1ab_0
  - xlsxwriter=3.0.3=pyhd8ed1ab_0
  - xorg-libxau=1.0.9=hcd874cb_0
  - xorg-libxdmcp=1.1.3=hcd874cb_0
  - xz=5.2.5=h62dcd97_1
  - zeromq=4.3.4=h0e60522_1
  - zipp=3.8.0=pyhd8ed1ab_0
  - zlib=1.2.11=h8ffe710_1014
  - zstd=1.5.2=h6255e5f_0
prefix: C:\Users\username\Miniconda3\envs\some-env
```

mzeitlin11 commented 2 years ago

Thanks for checking @ba-tno, closing then!