pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: can't import pandas #42506

Closed topekekere closed 3 years ago

topekekere commented 3 years ago

Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

import pandas as pd

Problem description

I am just trying to import pandas to run my notebook and it is throwing this error:


ImportError                               Traceback (most recent call last)
<ipython-input-1-f20def6a2be1> in <module>
      1 import numpy as np
----> 2 import pandas as pd

~/anaconda3/lib/python3.8/site-packages/pandas/__init__.py in <module>
    142 from pandas.util._print_versions import show_versions
    143 
--> 144 from pandas.io.api import (
    145     # excel
    146     ExcelFile,

~/anaconda3/lib/python3.8/site-packages/pandas/io/api.py in <module>
      6 
      7 from pandas.io.clipboards import read_clipboard
----> 8 from pandas.io.excel import ExcelFile, ExcelWriter, read_excel
      9 from pandas.io.feather_format import read_feather
     10 from pandas.io.gbq import read_gbq

~/anaconda3/lib/python3.8/site-packages/pandas/io/excel/__init__.py in <module>
----> 1 from pandas.io.excel._base import ExcelFile, ExcelWriter, read_excel
      2 from pandas.io.excel._odswriter import ODSWriter as _ODSWriter
      3 from pandas.io.excel._openpyxl import OpenpyxlWriter as _OpenpyxlWriter
      4 from pandas.io.excel._util import register_writer
      5 from pandas.io.excel._xlsxwriter import XlsxWriter as _XlsxWriter

~/anaconda3/lib/python3.8/site-packages/pandas/io/excel/_base.py in <module>
     31     pop_header_name,
     32 )
---> 33 from pandas.io.parsers import TextParser
     34 
     35 _read_excel_doc = (

~/anaconda3/lib/python3.8/site-packages/pandas/io/parsers/__init__.py in <module>
----> 1 from pandas.io.parsers.readers import (
      2     TextFileReader,
      3     TextParser,
      4     read_csv,
      5     read_fwf,

~/anaconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py in <module>
     15 import pandas._libs.lib as lib
     16 from pandas._libs.parsers import STR_NA_VALUES
---> 17 from pandas._typing import (
     18     ArrayLike,
     19     DtypeArg,

ImportError: cannot import name 'DtypeArg' from 'pandas._typing' (/home/tope/anaconda3/lib/python3.8/site-packages/pandas/_typing.py)

#### Expected Output

#### Output of ``pd.show_versions()``

<details>

--------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-4-3d232a07e144> in <module>
----> 1 pd.show_versions()

NameError: name 'pd' is not defined

</details>
sangilki commented 3 years ago

Try below options

option1

pip uninstall pandas
pip install pandas --upgrade

option2

pip install pandas==1.1.5
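
Before reinstalling, it can help to confirm which interpreter and which pandas files the notebook kernel actually resolves. The following is a minimal sketch using only the standard library; `locate_package` is a hypothetical helper name, not part of pandas, and it deliberately avoids `import pandas` because that is the statement that fails.

```python
import importlib.util
import sys
from importlib import metadata


def locate_package(name):
    """Return (interpreter path, package file path, metadata version).

    The package itself is never imported, so this also works when
    `import pandas` raises ImportError. `locate_package` is a
    hypothetical helper used only for illustration.
    """
    # find_spec on a top-level name locates the package without executing it.
    spec = importlib.util.find_spec(name)
    origin = spec.origin if spec is not None else None
    try:
        # Version recorded in the installed *.dist-info / *.egg-info metadata.
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None
    return sys.executable, origin, version


if __name__ == "__main__":
    print(locate_package("pandas"))
```

If the reported path is not inside the environment that pip just modified, the reinstall probably went to a different interpreter than the one the notebook uses.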

rhshadrach commented 3 years ago

@topekekere - does attempting to reinstall pandas as @sangilki mentioned above fix the issue for you?

topekekere commented 3 years ago

no, still have the errors

topekekere commented 3 years ago

am actually having the errors on a remote machine

mirekphd commented 3 years ago

This import error in the line from pandas.io.excel._base import ExcelFile, ExcelWriter, read_excel in /pandas/io/api.py is a real issue, affecting not only Linux, but also Windows users (as reported here on Stack Overflow).

The issue apparently emerged in pandas==1.3.1, so a workaround (effective at least for me in the situation reproduced below) was to downgrade to pandas==1.3.0.

So here's a fully reproducible example using one of our Python containers, with which I've just reproduced this bug on two different Unix machines: our CentOS 8 build server and my Ubuntu 18.04 workstation:

$ docker run mirekphd/ml-gpu-py38-cuda112-cust:20210806 python -c "import pandas as pd"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/pandas/__init__.py", line 144, in <module>
    from pandas.io.api import (
  File "/opt/conda/lib/python3.8/site-packages/pandas/io/api.py", line 8, in <module>
    from pandas.io.excel import ExcelFile, ExcelWriter, read_excel
  File "/opt/conda/lib/python3.8/site-packages/pandas/io/excel/__init__.py", line 1, in <module>
    from pandas.io.excel._base import ExcelFile, ExcelWriter, read_excel
  File "/opt/conda/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 33, in <module>
    from pandas.io.parsers import TextParser
  File "/opt/conda/lib/python3.8/site-packages/pandas/io/parsers/__init__.py", line 1, in <module>
    from pandas.io.parsers.readers import (
  File "/opt/conda/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 17, in <module>
    from pandas._typing import (
ImportError: cannot import name 'DtypeArg' from 'pandas._typing' (/opt/conda/lib/python3.8/site-packages/pandas/_typing.py)

More info

Surprisingly, this is the only one affected out of 4 python containers which we actively maintain, maybe because it is the only one that contains cuPy and thus has a lot of packages installed with conda, which we avoid elsewhere for performance reasons.

Please be patient: the image is a few gigabytes, so Docker takes a few minutes to pull and extract the image layers, at least on the magnetic hard drives in my workstation (I don't have a smaller container with this issue).

MarcoGorelli commented 3 years ago

Hi @mirekphd - is there a way to reproduce this without downloading GB-worth of Docker layers? Can you try making a new virtual environment with nothing else in it and installing pandas in there?

mirekphd commented 3 years ago

> Hi @mirekphd - is there a way to reproduce this without downloading GB-worth of Docker layers? Can you try making a new virtual environment with nothing else in it and installing pandas in there?

Hi, I'd say it's very unlikely, given that these other containers that differ mainly with respect to cuPy do not have this issue even when they contain the affected pandas==1.3.1:

$ docker run mirekphd/ml-gpu-py38-cuda101-cust:latest python -c "import pandas as pd; print(pd.__version__)"
1.3.1

$ docker run mirekphd/ml-cpu-py38-jup-cust:latest python -c "import pandas as pd; print(pd.__version__)"
1.1.5

$ docker run mirekphd/ml-cpu-py37-jup-cust:latest python -c "import pandas as pd; print(pd.__version__)"
1.3.1
MarcoGorelli commented 3 years ago

Just to make sure I've understood - you have different containers, and in one of them, you can't import pandas, but in the others, you can?

Might the issue not be with something else which you have installed in that container?

mirekphd commented 3 years ago

> Just to make sure I've understood - you have different containers, and in one of them, you can't import pandas, but in the others, you can?

Yes, I've clarified this above for these unaffected containers.

> Might the issue not be with something else which you have installed in that container?

It is probably an interaction with other packages, most likely cuPy or its wide array of dependencies. If you are arguing that pandas should not be held responsible for such interactions with other packages that apparently can break it, then I admit I'm a bit surprised. What if the other party is non-cooperating or fails to maintain their package for a few years (as happens on GitHub)?

MarcoGorelli commented 3 years ago

pandas does test downstream dependencies:

https://github.com/pandas-dev/pandas/blob/eaee348792cf45b0f8ca52944f066b7f27503066/pandas/tests/test_downstream.py

Could you please make a new virtual environment, install pandas, verify that it imports fine, and then, one-by-one, install the other extra dependencies you have until you find which one breaks pandas?

mirekphd commented 3 years ago

> pandas does test downstream dependencies:

Great, I'm glad to hear that! Could you possibly add at least cudf (and ideally also deeptables) to your list of tested reverse dependencies? They can be installed like this:

$ pip install cython cupy-cuda112 deeptables
$ conda install -c rapidsai -c nvidia -c conda-forge blazingsql cudf python=$PYTHON_VERSION cudatoolkit

Once you add it to your unit tests (I hope you can do it, despite its enormous size and hardware requirements), you won't release any new pandas version that is incompatible with either cudf or deeptables, like pandas==1.3.0 or pandas==1.3.1 (1.1.5 is the latest version of pandas compatible with either of those dependencies):

# listing reverse dependencies of pandas that pin it to a previous version - when both deeptables and cudf are installed:
$ docker run mirekphd/ml-gpu-py38-cuda112-cust:20210807 pipdeptree -r -p pandas | grep "<"
------------------------------------------------------------------------
  - cudf==21.6.1+2.g101fc0fda4 [requires: pandas>=1.0,<1.3.0dev0]
  - dask-cudf==21.6.1+2.g101fc0fda4 [requires: pandas>=1.0,<1.3.0dev0]
  - deeptables==0.1.14 [requires: pandas>=0.25.3,<=1.1.5]

or

# listing reverse dependencies of pandas that pin it to a previous version - when only cudf is installed:
$ docker run mirekphd/ml-gpu-py38-cuda112-cust:20210807 pipdeptree -r -p pandas | grep "<"
------------------------------------------------------------------------
  - cudf==21.6.1+2.g101fc0fda4 [requires: pandas>=1.0,<1.3.0dev0]
  - dask-cudf==21.6.1+2.g101fc0fda4 [requires: pandas>=1.0,<1.3.0dev0]

As for the suggested venv (instead of Docker) and one-by-one stepwise installation (instead of pipdeptree): thanks, but no, thanks:)
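
The `pipdeptree -r -p pandas | grep "<"` check above can be approximated from Python with `importlib.metadata`. This is a sketch, assuming the installed distributions expose their requirements as `Requires-Dist` metadata (conda-installed packages may not always do so); `pins_with_upper_bound` and `installed_pins` are hypothetical helper names.

```python
import re
from importlib import metadata


def pins_with_upper_bound(requires, package):
    """Return the requirement strings that name `package` and contain '<'.

    `requires` is a list of requirement strings such as
    "pandas>=1.0,<1.3.0dev0" (or None). Hypothetical helper for illustration.
    """
    # Match the package name only when it is not a prefix of a longer name
    # (so "pandas-stubs" does not count as "pandas").
    name = re.compile(
        rf"^\s*{re.escape(package)}(?![A-Za-z0-9._-])", re.IGNORECASE
    )
    hits = []
    for req in requires or []:
        spec = req.split(";", 1)[0]  # drop environment markers
        if name.match(spec) and "<" in spec:
            hits.append(req)
    return hits


def installed_pins(package):
    """Map each installed distribution to its upper-bounded pins on `package`."""
    result = {}
    for dist in metadata.distributions():
        hits = pins_with_upper_bound(dist.requires, package)
        if hits:
            result[dist.metadata["Name"]] = hits
    return result


if __name__ == "__main__":
    for dist_name, pins in installed_pins("pandas").items():
        print(dist_name, pins)
```

Like the `grep "<"` filter, this only looks for the presence of an upper bound; it does not evaluate whether the installed pandas version actually violates it.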

MarcoGorelli commented 3 years ago

> Once you add it to your unit tests (I hope you can do it, despite its enormous size and hardware requirements), you won't release any new pandas version that is incompatible with either cudf or deeptables, like pandas==1.3.0 or pandas==1.3.1 (1.1.5 is the latest version of pandas compatible with either of those dependencies):

Are cudf and deeptables the packages which cause issues then? Do you not have them in the other containers where pandas imports fine?

> Once you add it to your unit tests (I hope you can do it, despite its enormous size and hardware requirements)

Yeah, they're really too large to be included here. I'd suggest reporting the issue to them.

mirekphd commented 3 years ago

Uninstalling deeptables did not help, which leaves cudf as the culprit (more precisely, its latest binary version available in Anaconda, cudf==21.06.01): it is the dependency that clashes with pandas==1.3.1 and causes its import failure.

As per your suggestion, I'm moving the issue over to rapidsai/cudf. For now, a workaround is to downgrade to pandas==1.2.5:

$ docker run mirekphd/ml-gpu-py38-cuda112-cust:20210807 python -c "import pandas as pd; print(pd.__version__)"
1.2.5
MarcoGorelli commented 3 years ago

cool, thanks - closing for now then

MarcoGorelli commented 3 years ago

Actually, reopening for now as a few people have independently run into this

jorisvandenbossche commented 3 years ago

@mirekphd how do you install cudf / pandas in the docker? If you install cudf with conda, it should normally prevent you from getting pandas 1.3.1.

I did a quick test locally creating an environment with cudf, and then installing latest pandas with pip, but that didn't reproduce the import issue.

mirekphd commented 3 years ago

We moved away from conda for as many packages as possible, as pip is much more performant (I even got some upvotes on Stack Overflow for this tip). So pandas is installed with pip, but we have to install cudf with conda, which tries to install its own older version of pandas, as per cudf requirements. I suppose this may break things...

On Sat, 7 Aug 2021, 19:39 Joris Van den Bossche, @.***> wrote:

@mirekphd https://github.com/mirekphd how do you install cudf / pandas in the docker? If you install cudf with conda, it should normally prevent you from getting pandas 1.3.1.

MarcoGorelli commented 3 years ago

> So pandas is installed with pip, but we have to install cudf with conda, which tries to install its own older version of pandas, as per cudf requirements. I suppose this may break things...

Yeah, mixing pip and conda can potentially break things. I should've asked how you installed pandas straight away

IMO this could be closed now

mirekphd commented 3 years ago

I'm not the only one reporting it though... we should probably ask others reporting this issue here and on Stack Overflow if they mixed installers too.

On Tue, 10 Aug 2021, 09:48 Marco Edward Gorelli, @.***> wrote:

So pandas is installed with pip, but we have to install cudf with conda, which tries to install its own older version of pandas, as per cudf requirements. I suppose this may break things...

Yeah, mixing pip and conda can potentially break things. I should've asked how you installed pandas straight away

IMO this could be closed now


rhshadrach commented 3 years ago

Would be great to follow up with others. If there are similar issues, we can reopen, or open new ones if they are of a different nature. But currently it seems to me the cause of this issue is understood and there is no action that we should take within pandas. I think that means it is okay to close.

MarcoGorelli commented 3 years ago

Thanks, agreed - if anyone else experiences this they can open a new issue

jorisvandenbossche commented 3 years ago

@mirekphd I started downloading the Docker image this morning to reproduce your case (docker run mirekphd/ml-gpu-py38-cuda112-cust:20210806 python -c "import pandas as pd"), and am now looking into it. I think the issue stems from a mixture of conda and pip that went wrong (which in principle still shouldn't happen, but is not pandas' fault).

Looking into your conda environment site-packages directory, I see:

jovyan@73e963215558:/opt/conda/lib/python3.8/site-packages$ ls -l
...
drwxr-sr-x  1 jovyan users     4096 Aug  6 18:47 pandas
drwxr-sr-x  2 root   users     4096 Aug  6 18:47 pandas-1.2.5-py3.8.egg-info
drwxr-sr-x  2 jovyan users     4096 Aug  6 18:37 pandas-1.3.1.dist-info

So conda originally installed pandas 1.2.5, and later pip installed pandas 1.3.1. However, something went wrong here, because if you look at the pandas files:

jovyan@73e963215558:/opt/conda/lib/python3.8/site-packages$ cat pandas/_version.py 

# This file was generated by 'versioneer.py' (0.19) from
# revision-control system data, or from the parent directory name of an
# unpacked source archive. Distribution tarballs contain a pre-generated copy
# of this file.

import json

version_json = '''
{
 "date": "2021-06-22T10:53:30+0100",
 "dirty": false,
 "error": null,
 "full-revisionid": "7c48ff4409c622c582c56a5702373f726de08e96",
 "version": "1.2.5"
}
'''  # END VERSION_JSON

def get_versions():
    return json.loads(version_json)

This still indicates 1.2.5, and not 1.3.1. Also, the file pandas/_typing.py (the one that generates the error) has the content of pandas 1.2.5, which didn't yet contain DtypeArg.

But the file that raises the error is pandas/io/parsers/readers.py (that's the file that does the from pandas._typing import DtypeArg), and that file didn't yet exist in 1.2.5.

So it "seems" that the pip install added some files, but didn't overwrite (all) existing files. Not sure how this could happen, but in any case this mixture of files from 1.2.5 and 1.3.1 is the cause of the import error.
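
The two metadata directories in the listing above (`pandas-1.2.5-py3.8.egg-info` and `pandas-1.3.1.dist-info`) are a quick signal for this kind of mixed install. Below is a minimal sketch for spotting them, assuming the usual `name-version[...].dist-info` / `.egg-info` directory naming; `duplicate_metadata` is a hypothetical helper, not a pip or conda API.

```python
import re
import sysconfig
from collections import defaultdict
from pathlib import Path

# Matches e.g. "pandas-1.3.1.dist-info" and "pandas-1.2.5-py3.8.egg-info".
_METADATA_DIR = re.compile(
    r"^(?P<name>.+?)-(?P<version>\d[^-]*)(?:-[^/]*)?\.(?:dist-info|egg-info)$"
)


def duplicate_metadata(entries):
    """Return {package: [versions]} for packages with more than one
    metadata directory among `entries` (an iterable of directory names)."""
    found = defaultdict(set)
    for entry in entries:
        match = _METADATA_DIR.match(entry)
        if match:
            # Treat "_" and "-" as equivalent when comparing package names.
            name = match.group("name").lower().replace("_", "-")
            found[name].add(match.group("version"))
    return {name: sorted(v) for name, v in found.items() if len(v) > 1}


if __name__ == "__main__":
    # Assumption: the environment's packages live in the "purelib" directory.
    site_packages = Path(sysconfig.get_paths()["purelib"])
    if site_packages.is_dir():
        names = [p.name for p in site_packages.iterdir()]
        print(duplicate_metadata(names))
```

An empty result does not prove the environment is healthy; it only means no package has two metadata directories with different versions.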

jorisvandenbossche commented 3 years ago

Usage of --ignore-installed could cause this (https://github.com/pypa/pip/issues/5020), but I can't directly find the actual Dockerfiles for the image to see if that's the case.

mirekphd commented 3 years ago

"it "seems" that the pip install added some files, but didn't overwrite (all) existing files."

The sequence was the other way round: first the newer pandas with pip, then the older one with conda (pinned by cudf). Maybe that sequence explains why the older version could not overwrite all files (those unknown at its release time)? I never ignore installed versions; in fact, with pip I use the '--force-reinstall' flag. I have also contacted many devs over the years to release pins in their requirements (Nvidia has just promised to do it for cudf).

On Thu, 12 Aug 2021, 18:21 Joris Van den Bossche, @.***> wrote:

Usage of --ignore-installed could cause this (pypa/pip#5020 https://github.com/pypa/pip/issues/5020), don't directly find the actual docker files for the image to see if that's the case.


jorisvandenbossche commented 3 years ago

> The sequence was the other way round: first newer pandas with pip, then older with conda

Ah, yes, I was confused by the timestamp of the different (old and new) files in the pandas directory in site-packages, but I suppose those don't necessarily reflect installation time.

It seems this is a known issue with conda/pip compatibility (and the specific example here is easily reproduced locally: create new conda env, install pandas with pip, install pandas=1.2.5 with conda, and you get the error). See for example https://www.anaconda.com/blog/using-pip-in-a-conda-environment where one of the recommendations is "don't use conda after pip" (first install everything you can with conda, and then the rest with pip). The back and forth conda / pip / conda / .. usage is likely to give problems.

You can blame cudf for pinning pandas, but I suppose it should also be quite straightforward to not install pandas with pip before you install cudf with conda. If the performance of conda is a concern, I recommend checking out mamba (a fast, almost drop-in replacement, https://github.com/mamba-org/mamba).
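
A mixed install like the one described in this thread can also be detected without importing pandas, by comparing the version recorded in `pandas/_version.py` with the version in the installed metadata. This is a sketch only, assuming `_version.py` has the versioneer layout quoted earlier in the thread (a `version_json` triple-quoted string); `version_from_version_py` and `is_mixed_install` are hypothetical names.

```python
import importlib.util
import json
import re
from importlib import metadata
from pathlib import Path

# Captures the JSON blob between the triple quotes after "version_json =".
_VERSION_JSON = re.compile(r"version_json\s*=\s*'''(.*?)'''", re.DOTALL)


def version_from_version_py(source):
    """Extract the "version" field from the text of a versioneer-style
    _version.py, or return None if the expected layout is not found."""
    match = _VERSION_JSON.search(source)
    if match is None:
        return None
    return json.loads(match.group(1)).get("version")


def is_mixed_install(file_version, metadata_version):
    """True when both versions are known and they disagree."""
    return (
        file_version is not None
        and metadata_version is not None
        and file_version != metadata_version
    )


if __name__ == "__main__":
    # Locate the package directory without importing pandas.
    spec = importlib.util.find_spec("pandas")
    if spec is not None and spec.submodule_search_locations:
        version_file = Path(spec.submodule_search_locations[0]) / "_version.py"
        on_disk = (
            version_from_version_py(version_file.read_text())
            if version_file.exists()
            else None
        )
        try:
            recorded = metadata.version("pandas")
        except metadata.PackageNotFoundError:
            recorded = None
        print(on_disk, recorded, is_mixed_install(on_disk, recorded))
```

When two metadata directories exist for pandas, which one `metadata.version` reports is not something this sketch controls, so a matching result should be read with some caution.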

ValBerthe commented 3 years ago

> Usage of --ignore-installed could cause this (pypa/pip#5020), don't directly find the actual docker files for the image to see if that's the case.

I ran into that issue with pip and the --ignore-installed flag. I did not mix pip and conda. Re-installing the package with pip install pandas==1.3.2 did solve the issue.

jaanli commented 3 years ago

Also had this issue from mixing conda and pip :(