jupyter-incubator / sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

[BUG] Pandas 1.5.0 has breaking changes. #776

Closed GaspardBT closed 1 year ago

GaspardBT commented 2 years ago

Describe the bug
Importing Sparkmagic fails with `ImportError: cannot import name 'DataError' from 'pandas.core.groupby'`

To Reproduce
Run the Contributing steps with Python 3.9.7, then run the tests:
nosetests hdijupyterutils autovizwidget sparkmagic
The tests fail with:
ImportError: cannot import name 'DataError' from 'pandas.core.groupby'

Expected behavior
The tests should succeed.

Additional context
Pandas 1.5.0 was released with what can be seen as a breaking change: the import path of the error classes used in sparkmagic changed. This can be solved either by changing requirements.txt to limit pandas to versions below 1.5.0, or by changing the import path to the new one. I created a PR that implements the second solution.

lucasdurand commented 2 years ago

This change would require the user to be on pandas>=1.5.0, but that isn’t specified in the package requirements, so this change should also do one of the following:

  1. Pin pandas>=1.5.0 in the setup.py (which I’m thinking we don’t want to do)
  2. Instead, extend the existing try/except block to handle the new DataError location, while still supporting the existing fallbacks
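The second option can be sketched roughly as follows. This is only an illustration, not the code that sparkmagic actually ships: the helper `import_first` and the constant `DATA_ERROR_PATHS` are hypothetical names, and the two module paths are the ones mentioned in this thread.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that has it."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, name)
        except (ImportError, AttributeError):
            # Module is missing or does not expose the name: try the next one.
            continue
    raise ImportError(f"cannot import name {name!r} from any of {module_paths}")


# Hypothetical constant: public location first, then the older internal one.
DATA_ERROR_PATHS = ["pandas.errors", "pandas.core.groupby"]

try:
    DataError = import_first("DataError", DATA_ERROR_PATHS)
except ImportError:
    DataError = None  # pandas is not installed in this environment
```

Trying the public path first means newer pandas versions never touch the internal module, while older versions still fall back to the location they used before.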

sergiimk commented 1 year ago

Also encountered this.

The cause of this isn't really a "breaking change in Pandas" but rather Sparkmagic reaching into pandas' internal paths to import the DataError type.

If you look at the pandas API reference, the correct public import path is:

from pandas.errors import DataError

In our Docker image we added the following temporary workaround, which solved the issue:

sed -i 's/from pandas.core.groupby import DataError/from pandas.errors import DataError/g' /opt/conda/lib/python3.10/site-packages/autovizwidget/plotlygraphs/graphbase.py
sed -i 's/from pandas.core.groupby import DataError/from pandas.errors import DataError/g' /opt/conda/lib/python3.10/site-packages/autovizwidget/plotlygraphs/piegraph.py
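For environments where sed is not available, the same substitution can be sketched in Python. This is an assumption-laden equivalent rather than something taken from the thread: `patch_import`, `OLD_IMPORT` and `NEW_IMPORT` are hypothetical names, and the caller has to pass the paths of the installed files.

```python
from pathlib import Path

# The private import used by the affected files and its public replacement.
OLD_IMPORT = "from pandas.core.groupby import DataError"
NEW_IMPORT = "from pandas.errors import DataError"


def patch_import(path):
    """Rewrite the private DataError import in one file; return True if it changed."""
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    patched = text.replace(OLD_IMPORT, NEW_IMPORT)
    if patched == text:
        return False  # nothing to patch, leave the file untouched
    file.write_text(patched, encoding="utf-8")
    return True
```

Calling `patch_import` once for graphbase.py and once for piegraph.py mirrors the two sed commands. Running it again is harmless because an already patched file is left unchanged.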

iirekm-test commented 1 year ago

+1. Whether I run it locally or use docker build followed by docker run, I get the error. Sparkmagic is currently (2022) unusable until a fix is released.

devstein commented 1 year ago

This has been fixed in the latest release.

Thank you all for your patience and thank you @GaspardBT for the patch!