pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

"TypeError: 'set' object does not support indexing" using na_values in read_csv() #11374

Closed: goyodiaz closed this issue 9 years ago.

goyodiaz commented 9 years ago

Test case:

user@host:~$ python3
Python 3.4.3+ (default, Oct 14 2015, 16:03:50) 
[GCC 5.2.1 20151010] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from io import StringIO
>>> import pandas as pd
>>> src = """first, second
... 0,0.1
... 1,1.1
... """
>>> df = pd.read_csv(StringIO(src), na_values='XX')
>>> print(df)
   first   second
0      0      0.1
1      1      1.1
>>> df = pd.read_csv(StringIO(src), na_values='-999.99')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/goyo/.local/lib/python3.4/site-packages/pandas/io/parsers.py", line 491, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/goyo/.local/lib/python3.4/site-packages/pandas/io/parsers.py", line 278, in _read
    return parser.read()
  File "/home/goyo/.local/lib/python3.4/site-packages/pandas/io/parsers.py", line 740, in read
    ret = self._engine.read(nrows)
  File "/home/goyo/.local/lib/python3.4/site-packages/pandas/io/parsers.py", line 1187, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 766, in pandas.parser.TextReader.read (pandas/parser.c:8082)
  File "pandas/parser.pyx", line 788, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8338)
  File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._read_rows (pandas/parser.c:9465)
  File "pandas/parser.pyx", line 975, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:10858)
  File "pandas/parser.pyx", line 1035, in pandas.parser.TextReader._convert_tokens (pandas/parser.c:11744)
  File "pandas/parser.pyx", line 1085, in pandas.parser.TextReader._convert_with_dtype (pandas/parser.c:12634)
  File "pandas/parser.pyx", line 1499, in pandas.parser._try_double (pandas/parser.c:19996)
  File "pandas/parser.pyx", line 1818, in pandas.parser.kset_float64_from_list (pandas/parser.c:22852)
TypeError: 'set' object does not support indexing
>>> pd.util.print_versions.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-16-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: es_ES.UTF-8

pandas: 0.17.0
nose: 1.3.6
pip: 1.5.6
setuptools: 18.4
Cython: None
numpy: 1.8.2
scipy: 0.14.1
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
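The failing frame is a helper called kset_float64_from_list which, judging by its name, expects a list but was handed a set. The snippet below is a hypothetical pure-Python illustration of that failure mode (float_set_from_list is made up for this example and is not pandas code): indexing into a set raises TypeError, and converting it to a list first avoids the problem. The exact error wording varies between Python versions.

```python
def float_set_from_list(values):
    """Hypothetical helper: build a set of floats by indexing into `values`.

    Indexing (values[i]) only works on sequences such as lists or tuples,
    so passing a set fails the same way the traceback above does.
    """
    result = set()
    for i in range(len(values)):
        result.add(float(values[i]))  # raises TypeError if `values` is a set
    return result


na_values = {'-999.99'}  # a set, like the one the failing frame received

try:
    float_set_from_list(na_values)
except TypeError as exc:
    print('TypeError:', exc)  # wording differs between Python versions

# Converting to a list first makes indexing valid.
print(float_set_from_list(list(na_values)))  # {-999.99}
```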
jreback commented 9 years ago

You must be picking up another version of pandas somehow. The error you are seeing is, IIRC, from a somewhat older version of pandas.

This works just fine on Linux with Python 3.4 (the session below is on Mac). I know this case is also tested.

Python 3.4.3 |Continuum Analytics, Inc.| (default, Mar  6 2015, 12:07:41) 
Type "copyright", "credits" or "license" for more information.

IPython 4.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: pd.__version__ 
Out[1]: '0.17.0'

In [2]: src = 'first, second\n0,0.1\n1,1.1'

In [3]: from io import StringIO

In [4]: pd.read_csv(StringIO(src), na_values='-999.99')
Out[4]: 
   first   second
0      0      0.1
1      1      1.1
jreback commented 9 years ago

Please show pd.__version__.

It looks like you are running print_versions directly, which is another indication that you are actually using an older version (BUT print_versions looks at your environment, NOT at where it is called from).
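The question here boils down to which copy of pandas the interpreter actually imports. A minimal stdlib sketch (describe_module is a hypothetical helper; importlib.util.find_spec needs Python 3.4+) reports the file that `import pandas` would load, without importing it:

```python
import importlib.util
import sys


def describe_module(name):
    """Report which file `import name` would load, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return '{} is not importable from this interpreter'.format(name)
    return '{} would be loaded from {} (interpreter: {})'.format(
        name, spec.origin, sys.executable)


print(describe_module('pandas'))
# Once imported, pd.__version__ and pd.__file__ give the same information:
#   import pandas as pd; print(pd.__version__, pd.__file__)
```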

vlasisva commented 8 years ago

I see exactly the same error:

python b.py
0.17.0
Traceback (most recent call last):
  File "b.py", line 8, in <module>
    df = pd.read_csv(StringIO(src), na_values='-999.99')
  File "/home/vlasisva/Software/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 491, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/vlasisva/Software/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 278, in _read
    return parser.read()
  File "/home/vlasisva/Software/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 740, in read
    ret = self._engine.read(nrows)
  File "/home/vlasisva/Software/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 1187, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 766, in pandas.parser.TextReader.read (pandas/parser.c:8082)
  File "pandas/parser.pyx", line 788, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8338)
  File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._read_rows (pandas/parser.c:9465)
  File "pandas/parser.pyx", line 975, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:10858)
  File "pandas/parser.pyx", line 1035, in pandas.parser.TextReader._convert_tokens (pandas/parser.c:11744)
  File "pandas/parser.pyx", line 1085, in pandas.parser.TextReader._convert_with_dtype (pandas/parser.c:12634)
  File "pandas/parser.pyx", line 1499, in pandas.parser._try_double (pandas/parser.c:19996)
  File "pandas/parser.pyx", line 1818, in pandas.parser.kset_float64_from_list (pandas/parser.c:22852)
TypeError: 'set' object does not support indexing

cat b.py
from StringIO import StringIO
import pandas as pd
src = """first, second
0,0.1
1,1.1
"""
print pd.__version__
df = pd.read_csv(StringIO(src), na_values='-999.99')


lsb_release --all
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.3 LTS
Release:        14.04
Codename:       trusty

python
Python 2.7.10 |Anaconda 2.1.0 (64-bit)| (default, May 28 2015, 17:02:03)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
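When comparing environments like these, the system block that show_versions() printed in the first comment can be approximated with the stdlib alone. A sketch: system_info is hypothetical, the keys mirror that output, and the value formats may differ (show_versions printed the Python version as 3.4.3.final.0, for example):

```python
import os
import platform
import struct
import sys


def system_info():
    """Collect system fields similar to those show_versions() prints."""
    return {
        'python': platform.python_version(),
        'python-bits': struct.calcsize('P') * 8,  # pointer size in bits
        'OS': platform.system(),
        'OS-release': platform.release(),
        'machine': platform.machine(),
        'processor': platform.processor(),
        'byteorder': sys.byteorder,
        'LC_ALL': os.environ.get('LC_ALL'),
        'LANG': os.environ.get('LANG'),
    }


for key, value in system_info().items():
    print('{}: {}'.format(key, value))
```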


goyodiaz commented 8 years ago

pd.__version__ is also 0.17.0 here.

More facts:

To clean my Python environment as much as possible, I uninstalled every non-distro package/version and every distro package not installed by default, except dependencies of other software I use: python2.7 numpy, python2.7 GDAL bindings, GNOME stuff... I even uninstalled pip (the packaged python3 pip is almost useless in Wily anyway).

I also did my best to ensure there was nothing Python-related in ~/.local/bin, ~/.local/lib, /usr/local/bin and /usr/local/lib, and I made sure there was nothing called pandas on any mounted file system. I then used get-pip.py to install pip2 and pip3, and installed pandas for both Python 2 and Python 3. The issue is still present.
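Searching the file system by hand can be complemented by asking the interpreter itself where it would look. A stdlib sketch (find_all_copies is a hypothetical helper) lists every sys.path entry that provides a given package; only the first hit is imported. It only checks plain package directories and .py modules, so eggs, zip files and extension modules are ignored:

```python
import os
import sys


def find_all_copies(name, paths=None):
    """List every directory on the search path that provides `name`.

    Only the first hit is imported; later ones are shadowed, and an
    unexpected first hit is a common cause of "wrong version" reports.
    """
    hits = []
    for entry in (sys.path if paths is None else paths):
        base = entry or os.getcwd()  # '' on sys.path means the current directory
        pkg_init = os.path.join(base, name, '__init__.py')
        module_file = os.path.join(base, name + '.py')
        if os.path.isfile(pkg_init) or os.path.isfile(module_file):
            hits.append(base)
    return hits


for location in find_all_copies('pandas'):
    print(location)
```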

While this is not critical to me (it just broke one test for a function I never use in that way), I would really like to understand what's going on, but I do not know where to look.

jreback commented 8 years ago

So the error line:

File "pandas/parser.pyx", line 1818, in pandas.parser.kset_float64_from_list (pandas/parser.c:22852)
TypeError: 'set' object does not support indexing

tells me that you are using some kind of development version of pandas (somewhere). This function does NOT exist in master or 0.17.0.

Please make sure that you are not in a development directory when trying to import pandas.
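One way to act on this advice is to check whether a package would be imported from a source checkout rather than from site-packages. A rough heuristic sketch: looks_like_source_checkout is hypothetical, and treating a setup.py or .git next to the package directory as the sign of a checkout is an assumption, not a rule:

```python
import importlib.util
import os


def looks_like_source_checkout(name):
    """Heuristic: is package `name` being imported from a source checkout?

    Installed packages normally live in site-packages; a package whose
    parent directory holds setup.py or .git is more likely a development tree.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location:
        return False
    package_dir = os.path.dirname(os.path.abspath(spec.origin))  # .../name
    parent = os.path.dirname(package_dir)
    return any(os.path.exists(os.path.join(parent, marker))
               for marker in ('setup.py', '.git'))


print(looks_like_source_checkout('pandas'))
```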

It's not clear what you actually have installed, so please create a new virtual env or use conda.
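For the virtual env route, the stdlib venv module can create an isolated environment programmatically (the command-line equivalent is python3 -m venv). A minimal sketch: create_clean_env is a hypothetical helper, and pip is deliberately not bootstrapped here to keep it fast (pass with_pip=True to get pip inside the new env):

```python
import os
import sys
import tempfile
import venv


def create_clean_env(path):
    """Create an isolated virtual environment at `path` and return its python."""
    venv.create(path, with_pip=False)  # no site-packages from the base install
    bin_dir = 'Scripts' if sys.platform == 'win32' else 'bin'
    exe = 'python.exe' if sys.platform == 'win32' else 'python'
    return os.path.join(path, bin_dir, exe)


env_dir = os.path.join(tempfile.mkdtemp(), 'clean-env')
print(create_clean_env(env_dir))
```

Running the returned interpreter with `-m pip install ...` (after bootstrapping pip) keeps the install separate from anything in ~/.local or /usr/local.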

vlasisva commented 8 years ago

I installed pandas via pip. Either our environment is contaminated somehow, or what pip brings is not what you/we expect?

Will check and get back to you.

vlasisva commented 8 years ago

My "pip install pandas==0.17.0" downloads https://pypi.python.org/packages/source/p/pandas/pandas-0.17.0.tar.gz#md5=55d34c4d5655c94ca30a59dea6b36316

which contains the file pandas/parser.c, which has the following at line 1554:

static kh_float64_t *__pyx_f_6pandas_6parser_kset_float64_from_list(PyObject *); /*proto*/
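This check can be repeated without unpacking the archive by hand. A stdlib sketch, assuming the sdist tarball pip downloaded is available locally (grep_sdist is a hypothetical helper, not part of pip or pandas):

```python
import tarfile


def grep_sdist(tar_path, member_suffix, needle):
    """Return (member name, line number) pairs where `needle` occurs.

    Only regular files whose names end with `member_suffix`
    (e.g. 'pandas/parser.c') are scanned; line numbers are 1-based.
    """
    hits = []
    with tarfile.open(tar_path, 'r:*') as tar:  # auto-detects gzip/bz2/xz
        for member in tar.getmembers():
            if not (member.isfile() and member.name.endswith(member_suffix)):
                continue
            text = tar.extractfile(member).read().decode('utf-8', 'replace')
            for lineno, line in enumerate(text.splitlines(), start=1):
                if needle in line:
                    hits.append((member.name, lineno))
    return hits


# Usage against the downloaded sdist (file name as in the URL above):
# grep_sdist('pandas-0.17.0.tar.gz', 'pandas/parser.c', 'kset_float64_from_list')
```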

jreback commented 8 years ago

OK, it appears that when I built the distribution it didn't rebuild the .c files (they came from a newer version I was testing out). Very odd.

So I will fix this for 0.17.1 (i.e. make a clean release). In the meantime you can simply regenerate the .c files yourself (you need Cython installed).

e.g.

make clean
python setup.py install
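The idea behind this fix is that once the stale parser.c is gone, a build with Cython available has to regenerate it from parser.pyx instead of compiling the shipped file. Assuming that is the job of the clean step here (this is an illustration, not pandas' actual clean target), it can be sketched as follows; remove_generated_c is hypothetical:

```python
import os


def remove_generated_c(root):
    """Delete every .c file that sits next to a .pyx file of the same name.

    .c files without a matching .pyx are left alone, since they are not
    Cython output. Returns the sorted list of removed paths.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            stem, ext = os.path.splitext(filename)
            c_file = os.path.join(dirpath, stem + '.c')
            if ext == '.pyx' and os.path.exists(c_file):
                os.remove(c_file)
                removed.append(c_file)
    return sorted(removed)


# Usage from an unpacked source tree, followed by a rebuild:
# remove_generated_c('pandas')  # then: python setup.py install
```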
goyodiaz commented 8 years ago

Thanks, Jeff. That worked.

vlasisva commented 8 years ago

Other than this bug, would you consider the pip-obtained pandas 0.17.0 safe to use?

jreback commented 8 years ago

Yep. As I said, the .c for the parser came from a PR which is now merged.