Closed topekekere closed 3 years ago
Try below options
option1
pip uninstall pandas
pip install pandas --upgrade
option2
pip install pandas==1.1.5
@topekekere - does attempting to reinstall pandas as @sangilki mentioned above fix the issue for you?
no, still have the errors
am actually having the errors on a remote machine
This import error in the line from pandas.io.excel._base import ExcelFile, ExcelWriter, read_excel in /pandas/io/api.py is a real issue, affecting not only Linux, but also Windows users (as reported here on Stack Overflow).
The issue apparently emerged in pandas==1.3.1, so a workaround (effective at least for me in the situation reproduced below) was to downgrade to pandas==1.3.0.
So here's a fully reproducible example using one of our Python containers, with which I've just reproduced this bug on two different Unix machines: our CentOS 8 build server and my Ubuntu 18.04 workstation:
$ docker run mirekphd/ml-gpu-py38-cuda112-cust:20210806 python -c "import pandas as pd"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/pandas/__init__.py", line 144, in <module>
    from pandas.io.api import (
  File "/opt/conda/lib/python3.8/site-packages/pandas/io/api.py", line 8, in <module>
    from pandas.io.excel import ExcelFile, ExcelWriter, read_excel
  File "/opt/conda/lib/python3.8/site-packages/pandas/io/excel/__init__.py", line 1, in <module>
    from pandas.io.excel._base import ExcelFile, ExcelWriter, read_excel
  File "/opt/conda/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 33, in <module>
    from pandas.io.parsers import TextParser
  File "/opt/conda/lib/python3.8/site-packages/pandas/io/parsers/__init__.py", line 1, in <module>
    from pandas.io.parsers.readers import (
  File "/opt/conda/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 17, in <module>
    from pandas._typing import (
ImportError: cannot import name 'DtypeArg' from 'pandas._typing' (/opt/conda/lib/python3.8/site-packages/pandas/_typing.py)
More info
Surprisingly, this is the only one affected out of the 4 Python containers we actively maintain, maybe because it is the only one that contains CuPy and thus has a lot of packages installed with conda, which we avoid elsewhere for performance reasons.
Please be patient: the image is a few gigabytes, so Docker takes a few minutes to pull and extract the image layers, at least on the magnetic hard drives in my workstation (I don't have a smaller container with this issue).
Hi @mirekphd - is there a way to reproduce this without downloading GB-worth of Docker layers? Can you try making a new virtual environment with nothing else in it and installing pandas in there?
Hi, I'd say it's very unlikely, given that these other containers, which differ mainly with respect to CuPy, do not have this issue even when they contain the affected pandas==1.3.1:
$ docker run mirekphd/ml-gpu-py38-cuda101-cust:latest python -c "import pandas as pd; print(pd.__version__)"
1.3.1
$ docker run mirekphd/ml-cpu-py38-jup-cust:latest python -c "import pandas as pd; print(pd.__version__)"
1.1.5
$ docker run mirekphd/ml-cpu-py37-jup-cust:latest python -c "import pandas as pd; print(pd.__version__)"
1.3.1
Just to make sure I've understood - you have different containers, and in one of them, you can't import pandas, but in the others, you can?
Might the issue not be with something else which you have installed in that container?
Just to make sure I've understood - you have different containers, and in one of them, you can't import pandas, but in the others, you can?
Yes, I've clarified this above for these unaffected containers.
Might the issue not be with something else which you have installed in that container?
It is probably an interaction with other packages, most likely CuPy or its wide array of dependencies. If you are arguing that pandas should not be held responsible for such interactions with other packages that apparently can break it, then I admit I'm a bit surprised. What if the other party is uncooperative or fails to maintain their package for a few years (as happens on GitHub)?
pandas does test downstream dependencies:
Could you please make a new virtual environment, install pandas, verify that it imports fine, and then, one-by-one, install the other extra dependencies you have until you find which one breaks pandas?
pandas does test downstream dependencies:
Great, I'm glad to hear that! Could you possibly add at least cudf (and ideally also deeptables) to your list of tested reverse dependencies? They can be installed like this:
$ pip install cython cupy-cuda112 deeptables
$ conda install -c rapidsai -c nvidia -c conda-forge blazingsql cudf python=$PYTHON_VERSION cudatoolkit
Once you add it to your unit tests (I hope you can do it, despite its enormous size and hardware requirements), you won't release any new pandas version that is incompatible with either cudf or deeptables, like pandas==1.3.0 or pandas==1.3.1 (1.1.5 is the latest version of pandas compatible with either of those dependencies):
# listing reverse dependencies of pandas that pin it to a previous version - when both deeptables and cudf are installed:
$ docker run mirekphd/ml-gpu-py38-cuda112-cust:20210807 pipdeptree -r -p pandas | grep "<"
------------------------------------------------------------------------
- cudf==21.6.1+2.g101fc0fda4 [requires: pandas>=1.0,<1.3.0dev0]
- dask-cudf==21.6.1+2.g101fc0fda4 [requires: pandas>=1.0,<1.3.0dev0]
- deeptables==0.1.14 [requires: pandas>=0.25.3,<=1.1.5]
or
# listing reverse dependencies of pandas that pin it to a previous version - when only cudf is installed:
$ docker run mirekphd/ml-gpu-py38-cuda112-cust:20210807 pipdeptree -r -p pandas | grep "<"
------------------------------------------------------------------------
- cudf==21.6.1+2.g101fc0fda4 [requires: pandas>=1.0,<1.3.0dev0]
- dask-cudf==21.6.1+2.g101fc0fda4 [requires: pandas>=1.0,<1.3.0dev0]
As for the suggested venv (instead of Docker) and one-by-one stepwise installation (instead of pipdeptree): thanks, but no, thanks :)
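For readers without pipdeptree at hand, the upper-bound pins listed above can be approximated with the standard library alone. This is only a rough sketch, not how pipdeptree works internally: the helper name upper_pins is made up for illustration, it only sees distributions whose metadata is readable by importlib.metadata in the current environment, and it matches requirement strings textually rather than parsing them properly.

```python
import re
from importlib import metadata


def upper_pins(package):
    """Return sorted (distribution, requirement) pairs that cap `package` with '<' or '<='.

    Hypothetical helper: a textual approximation of
    `pipdeptree -r -p <package> | grep "<"`.
    """
    # Match the package name at the start of a requirement string, but not
    # longer names that merely begin with it (e.g. "pandas-stubs").
    name_re = re.compile(
        r"^\s*" + re.escape(package) + r"(?![A-Za-z0-9._-])", re.IGNORECASE
    )
    pins = set()
    for dist in metadata.distributions():
        dist_name = dist.metadata.get("Name", "unknown")
        for req in dist.requires or []:
            # Drop environment markers (after ';') so that a '<' inside a
            # marker such as python_version < "3.9" is not counted as a pin.
            spec = req.split(";")[0]
            if name_re.match(spec) and "<" in spec:
                pins.add((dist_name, spec.strip()))
    return sorted(pins)


if __name__ == "__main__":
    for dist_name, spec in upper_pins("pandas"):
        print(f"{dist_name} requires {spec}")
```

An empty result only means that nothing visible to this interpreter caps the package; conda-only metadata may not show up here.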
Once you add it to your unit tests (I hope you can do it, despite its enormous size and hardware requirements), you won't release any new pandas version that is incompatible with either cudf or deeptables, like pandas==1.3.0 or pandas==1.3.1 (1.1.5 is the latest version of pandas compatible with either of those dependencies):
Are cudf and deeptables the packages which cause issues then? Do you not have them in the other containers where pandas imports fine?
Once you add it to your unit tests (I hope you can do it, despite its enormous size and hardware requirements)
Yeah they're too large to be included here really. I'd suggest reporting the issue to them
Uninstalling deeptables did not help, so that leaves cudf as the culprit (more precisely, its latest binary version available in Anaconda: cudf==21.06.01): it is the dependency that clashes with pandas==1.3.1, causing its import failure.
As per your suggestion, I'm moving the issue over to rapidsai/cudf. For now, a workaround is to downgrade to pandas==1.2.5:
$ docker run mirekphd/ml-gpu-py38-cuda112-cust:20210807 python -c "import pandas as pd; print(pd.__version__)"
1.2.5
cool, thanks - closing for now then
Actually, reopening for now as a few people have independently run into this
@mirekphd how do you install cudf / pandas in the docker? If you install cudf with conda, it should normally prevent you from getting pandas 1.3.1.
I did a quick test locally creating an environment with cudf, and then installing latest pandas with pip, but that didn't reproduce the import issue.
We moved away from conda for as many packages as possible, as pip is much more performant (I even got some upvotes on Stack Overflow for this tip). So pandas is installed with pip, but we have to install cudf with conda, which tries to install its own older version of pandas, as per cudf requirements. I suppose this may break things...
So pandas is installed with pip, but we have to install cudf with conda, which tries to install its own older version of pandas, as per cudf requirements. I suppose this may break things...
Yeah, mixing pip and conda can potentially break things. I should've asked how you installed pandas straight away
IMO this could be closed now
I'm not the only one reporting it though... we should probably ask others reporting this issue here and on Stack Overflow if they mixed installers too.
Would be great to follow up with others. If there are similar issues, we can reopen, or open new ones if they are of a different nature. But currently it seems to me the cause of this issue is understood and there is no action that we should take within pandas. I think that means it is okay to close.
Thanks, agreed - if anyone else experiences this they can open a new issue
@mirekphd I started downloading the docker image this morning to reproduce your case (docker run mirekphd/ml-gpu-py38-cuda112-cust:20210806 python -c "import pandas as pd"), and am now looking into it. I think the issue stems from a mixture of conda and pip that went wrong (which in principle still shouldn't happen, but is not pandas' fault).
Looking into your conda environment site-packages directory, I see:
jovyan@73e963215558:/opt/conda/lib/python3.8/site-packages$ ls -l
...
drwxr-sr-x 1 jovyan users 4096 Aug 6 18:47 pandas
drwxr-sr-x 2 root users 4096 Aug 6 18:47 pandas-1.2.5-py3.8.egg-info
drwxr-sr-x 2 jovyan users 4096 Aug 6 18:37 pandas-1.3.1.dist-info
So conda originally installed pandas 1.2.5, and later pip installed pandas 1.3.1. However, something went wrong here, because if you look at the pandas files:
jovyan@73e963215558:/opt/conda/lib/python3.8/site-packages$ cat pandas/_version.py
# This file was generated by 'versioneer.py' (0.19) from
# revision-control system data, or from the parent directory name of an
# unpacked source archive. Distribution tarballs contain a pre-generated copy
# of this file.
import json
version_json = '''
{
"date": "2021-06-22T10:53:30+0100",
"dirty": false,
"error": null,
"full-revisionid": "7c48ff4409c622c582c56a5702373f726de08e96",
"version": "1.2.5"
}
''' # END VERSION_JSON
def get_versions():
    return json.loads(version_json)
This still indicates 1.2.5, and not 1.3.1. Also, the file pandas/_typing.py (the one named in the error) has the content of pandas 1.2.5 (which didn't yet contain DtypeArg).
But the file the error comes from is pandas/io/parsers/readers.py (that's the file that does the from pandas._typing import DtypeArg), and that file didn't yet exist in 1.2.5.
So it "seems" that the pip install added some files, but didn't overwrite (all) existing files. Not sure how this could happen, but in any case this mixture of files from 1.2.5 and 1.3.1 is the cause of the import error.
Usage of --ignore-installed could cause this (https://github.com/pypa/pip/issues/5020), but I can't directly find the actual Dockerfiles for the image to see if that's the case.
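The telltale sign in the site-packages listing above is two metadata directories for the same package (an egg-info for 1.2.5 next to a dist-info for 1.3.1). A minimal sketch of how one might look for that situation follows; the function name installed_metadata is hypothetical, and the glob pattern is a simplification that assumes directory names start with the package name followed by a hyphen.

```python
import sysconfig
from pathlib import Path


def installed_metadata(site_packages, package):
    """Return sorted names of *.dist-info / *.egg-info entries for `package`.

    Hypothetical helper: more than one entry suggests that two installers
    (or two versions) have left files behind in the same directory.
    """
    root = Path(site_packages)
    found = []
    for suffix in ("dist-info", "egg-info"):
        found.extend(p.name for p in root.glob(f"{package}-*.{suffix}"))
    return sorted(found)


if __name__ == "__main__":
    # "purelib" is the site-packages directory of the running interpreter.
    entries = installed_metadata(sysconfig.get_paths()["purelib"], "pandas")
    print(entries)
    if len(entries) > 1:
        print("More than one metadata directory found: possibly a mixed install.")
```

A single entry does not prove the install is healthy; it only rules out this particular symptom.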
"it "seems" that the pip install added some files, but didn't overwrite (all) existing files."
The sequence was the other way round: first the newer pandas with pip, then the older one with conda (pinned by cudf). Maybe that sequence explains why the older version could not overwrite all files (some were unknown at its release time)? I never ignore installed versions; in fact, with pip I use the '--force-reinstall' flag. I have also contacted many devs over the years asking them to release pins in their requirements (Nvidia has just promised to do it for cudf).
The sequence was the other way round: first newer pandas with pip, then older with conda
Ah, yes, I was confused by the timestamp of the different (old and new) files in the pandas directory in site-packages, but I suppose those don't necessarily reflect installation time.
It seems this is a known issue with conda/pip compatibility (and the specific example here is easily reproduced locally: create new conda env, install pandas with pip, install pandas=1.2.5 with conda, and you get the error). See for example https://www.anaconda.com/blog/using-pip-in-a-conda-environment where one of the recommendations is "don't use conda after pip" (first install everything you can with conda, and then the rest with pip). The back and forth conda / pip / conda / .. usage is likely to give problems.
You can blame cudf for pinning pandas, but I suppose it should also be quite straightforward to not install pandas with pip before you install cudf with conda. If performance of conda is a concern, I recommend checking out mamba (a fast almost drop-in replacement, https://github.com/mamba-org/mamba).
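The mismatch found earlier in this thread (package metadata claiming one version while pandas/_version.py on disk records another) can be checked without importing the broken package. The following is a sketch under assumptions: both function names are hypothetical, and it expects _version.py to have the versioneer-generated layout quoted above, which other pandas releases may not use.

```python
import importlib.util
import json
import re
from importlib import metadata
from pathlib import Path


def version_from_versioneer(text):
    """Extract "version" from versioneer-style _version.py text, or return None."""
    match = re.search(r"version_json\s*=\s*'''(.*?)'''", text, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1)).get("version")


def check_consistency(package="pandas"):
    """Return (metadata_version, on_disk_version), or None if either is unavailable.

    Hypothetical helper: locates the package directory without importing it,
    then compares the installer's metadata with the version file on disk.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    version_file = Path(list(spec.submodule_search_locations)[0]) / "_version.py"
    if not version_file.exists():
        return None
    try:
        meta_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return meta_version, version_from_versioneer(version_file.read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Two differing values would point to a mixed install like the one above.
    print(check_consistency("pandas"))
```

If the two values differ, reinstalling the package cleanly with a single installer is the remedy reported in this thread.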
Usage of --ignore-installed could cause this (pypa/pip#5020), but I can't directly find the actual Dockerfiles for the image to see if that's the case.
I ran into that issue with pip and the --ignore-installed flag. I did not mix pip and conda.
Re-installing the package with pip install pandas==1.3.2 did solve the issue.
Also had this issue from mixing conda and pip :(
[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandas.
[x] (optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Problem description
I am just trying to import pandas to run my notebook and it is throwing this error