pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

bug in csv_reader? #2296

Closed cossatot closed 11 years ago

cossatot commented 11 years ago

Some previously working code of mine broke this morning after upgrading to the newest Ubuntu package of Pandas v. 0.9.2. It seems to be a bug in the upgraded read_csv parser. Beyond that, the error message (below) is fairly unhelpful to the uninitiated. Maybe this is a bug, or maybe the new version uses different syntax (although some of my other .csv files still import without trouble...).

I am sending a copy of the offending file to Wes, as I've got no quick place to put it.

Thanks, Richard

aHe_df = pd.read_csv('aHe_aliquots.csv')


CParserError                              Traceback (most recent call last)
/home/itchy/ecopetrol/ec-working/data/ in ()
----> 1 aHe_df = pd.read_csv('aHe_aliquots.csv')

/usr/lib/pymodules/python2.7/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, header, index_col, names, skiprows, skipfooter, skip_footer, na_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, nrows, iterator, chunksize, verbose, encoding, squeeze)
    361                     buffer_lines=buffer_lines)
    362
--> 363         return _read(filepath_or_buffer, kwds)
    364
    365     parser_f.__name__ = name

/usr/lib/pymodules/python2.7/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
    185
    186     # Create the parser.
--> 187     parser = TextFileReader(filepath_or_buffer, **kwds)
    188
    189     if nrows is not None:

/usr/lib/pymodules/python2.7/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
    465         self.options, self.engine = self._clean_options(options, engine)
    466
--> 467         self._make_engine(self.engine)
    468
    469     def _get_options_with_defaults(self, engine):

/usr/lib/pymodules/python2.7/pandas/io/parsers.pyc in _make_engine(self, engine)
    567     def _make_engine(self, engine='c'):
    568         if engine == 'c':
--> 569             self._engine = CParserWrapper(self.f, **self.options)
    570         else:
    571             if engine == 'python':

/usr/lib/pymodules/python2.7/pandas/io/parsers.pyc in __init__(self, src, **kwds)
    787         ParserBase.__init__(self, kwds)
    788
--> 789         self._reader = _parser.TextReader(src, **kwds)
    790
    791         # XXX

/usr/lib/pymodules/python2.7/pandas/_parser.so in pandas._parser.TextReader.__cinit__ (pandas/src/parser.c:3357)()

/usr/lib/pymodules/python2.7/pandas/_parser.so in pandas._parser.TextReader._get_header (pandas/src/parser.c:4283)()

/usr/lib/pymodules/python2.7/pandas/_parser.so in pandas._parser.TextReader._tokenize_rows (pandas/src/parser.c:5731)()

/usr/lib/pymodules/python2.7/pandas/_parser.so in pandas._parser.raise_parser_error (pandas/src/parser.c:13774)()

CParserError: Error tokenizing data. C error: no error message set

wesm commented 11 years ago

The file has bare \r line breaks (carriage return only, no \n). Universal newline mode handled this in the past, but this case must be addressed directly in the new C tokenizer.
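For anyone hitting the same error on an affected version, one possible workaround is to check which line endings a file uses and normalize them before parsing. The sketch below is only an illustration, not the fix that went into pandas: the helper names `count_line_endings` and `normalize_newlines` and the sample data are made up for this example, and it uses only the standard library.

```python
import csv
import io


def count_line_endings(raw: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR line endings in raw file bytes."""
    crlf = raw.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": raw.count(b"\n") - crlf,  # LF not preceded by CR
        "cr": raw.count(b"\r") - crlf,  # CR not followed by LF
    }


def normalize_newlines(raw: bytes, encoding: str = "utf-8") -> io.StringIO:
    """Decode raw bytes and rewrite every line ending as a single LF."""
    text = raw.decode(encoding)
    # Replace CRLF first so it does not become two newlines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return io.StringIO(text)


# Hypothetical sample standing in for a file saved with bare "\r" breaks.
sample = b"aliquot,age\rA1,12.5\rA2,13.1\r"
print(count_line_endings(sample))

# The normalized buffer parses line by line with the stdlib csv reader.
rows = list(csv.reader(normalize_newlines(sample)))
print(rows)
```

The buffer returned by `normalize_newlines` is an ordinary file-like object, so it can also be passed to `pd.read_csv` in place of a filename. Note that this blanket replacement would also rewrite any `\r` embedded inside quoted fields. More recent pandas versions additionally accept a `lineterminator` argument to `read_csv` for the C parser, which may be a simpler option where available.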

wesm commented 11 years ago

Should be working now