pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: Setting `index_col=False` when calling read_csv disables `on_bad_lines` #49279

Open benjsec opened 2 years ago

benjsec commented 2 years ago

Reproducible Example

>>> import io
>>> import pandas as pd
>>> pd.read_csv(io.StringIO("a,b\n1,2\n1,2,3"), engine="python", on_bad_lines=lambda x: print(f"Error: {x}"))
Error: ['1', '2', '3']
   a  b
0  1  2
>>> pd.read_csv(io.StringIO("a,b\n1,2\n1,2,3"), engine="python", on_bad_lines=lambda x: print(f"Error: {x}"), index_col=False)
<stdin>:1: ParserWarning: Length of header or names does not match length of data. This leads to a loss of data with index_col=False.
   a  b
0  1  2
1  1  2
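
For anyone who wants to run the comparison as a script rather than in the REPL, here is a minimal sketch of the same two calls. It records what the callable receives instead of printing it. The helper name `read_and_collect` is only illustrative and not part of pandas, and the `ParserWarning` is silenced purely to keep the output short.

```python
import io
import warnings

import pandas as pd

DATA = "a,b\n1,2\n1,2,3"  # the third line has one field more than the header


def read_and_collect(**kwargs):
    """Parse DATA with the python engine; return (frame, rows passed to on_bad_lines)."""
    seen = []

    def handler(fields):
        # Record the offending row; returning None asks pandas to skip it.
        seen.append(fields)
        return None

    with warnings.catch_warnings():
        # index_col=False emits a ParserWarning for this input; hide it here.
        warnings.simplefilter("ignore")
        frame = pd.read_csv(
            io.StringIO(DATA), engine="python", on_bad_lines=handler, **kwargs
        )
    return frame, seen


default_frame, default_seen = read_and_collect()
no_index_frame, no_index_seen = read_and_collect(index_col=False)

print("default:        ", default_seen, "rows kept:", len(default_frame))
print("index_col=False:", no_index_seen, "rows kept:", len(no_index_frame))
```

On pandas 1.5.1 the second line of output is where the two calls diverge, matching the REPL session above.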

Issue Description

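If index_col=False is passed to read_csv, the on_bad_lines parameter is ignored. In the example above, the callable is never invoked: the line with an extra field is truncated to fit the header and kept in the result, and only a ParserWarning is emitted.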

Expected Behavior

The on_bad_lines parameter should be respected even when index_col=False is set.
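
Until that is the case, one possible workaround is to set overlong rows aside before pandas sees the data. The sketch below uses only the standard-library `csv` module. `split_bad_lines` is a hypothetical helper, not a pandas API, and it assumes the first row is the header and the file uses the default comma dialect. Note that rewriting rows through `csv.writer` may change quoting relative to the original text.

```python
import csv
import io


def split_bad_lines(text):
    """Return (clean_text, bad_rows).

    Rows with more fields than the header row are collected in bad_rows;
    everything else is written back out as CSV text.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return text, []

    width = len(rows[0])  # assumption: the first row is the header
    good, bad = [rows[0]], []
    for row in rows[1:]:
        (bad if len(row) > width else good).append(row)

    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(good)
    return out.getvalue(), bad


clean, bad = split_bad_lines("a,b\n1,2\n1,2,3")
print(bad)  # [['1', '2', '3']]
```

The cleaned text can then be wrapped in `io.StringIO` and passed to `pd.read_csv` with `index_col=False`, while the rejected rows are handled separately.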

Installed Versions

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 91111fd99898d9dcaa6bf6bedb662db4108da6e6
python           : 3.10.6.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.19.16-76051916-generic
Version          : #202210150742~1666053244~22.04~cf07008 SMP PREEMPT_DYNAMIC Tue O
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_GB.UTF-8
LOCALE           : en_GB.UTF-8

pandas           : 1.5.1
numpy            : 1.23.4
pytz             : 2022.5
dateutil         : 2.8.2
setuptools       : 56.0.0
pip              : 21.0.1
Cython           : None
pytest           : 7.1.3
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : None
numba            : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : None
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
zstandard        : None
tzdata           : None

pbhoopala commented 1 year ago

take

andrewjprice commented 1 year ago

Any updates on this issue?