pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Uninformative error when parsing numbers with a non-standard n/a value #41117

Open ghost opened 3 years ago

ghost commented 3 years ago

Code Sample

data.csv

key,value
a,1.234
b,"1,234.00"
c,":"

Reading the data works fine when the numbers are not parsed:

>>> pd.read_csv('data.csv')
  key     value
0   a     1.234
1   b  1,234.00
2   c         :

When parsing the numbers, it fails even when the correct thousands separator is set:

>>> pd.read_csv('data.csv', dtype={'value': float}, thousands=',')
ValueError: could not convert string to float: '1,234.00'

However, it works once ":" is declared as an N/A value:

>>> pd.read_csv('data.csv', dtype={'value': float}, thousands=',', na_values=':')
  key     value
0   a     1.234
1   b  1234.000
2   c       NaN

Problem description

I think the error message, which says that pandas could not convert the value "1,234.00" to a number, is uninformative: the actual problem is the unrecognized missing-value marker ":".

Might be related to #2570.

Expected Output

I'd expect the error message to be about the non-parsable value ":".
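
Until the message improves, the offending value can be located by hand. The following is only a rough sketch, not something pandas provides: `find_unparsable` is a hypothetical helper name, and the CSV text is inlined through `io.StringIO` so the snippet runs without a `data.csv` file. It reads the column as strings, strips the thousands separator, and reports the values that still cannot be converted.

```python
import io

import pandas as pd

# Same content as data.csv above, inlined so the sketch is self-contained.
CSV_TEXT = 'key,value\na,1.234\nb,"1,234.00"\nc,":"\n'


def find_unparsable(csv_text, column, thousands=","):
    """Hypothetical helper: list raw strings in `column` that are not floats."""
    # Read the column as plain strings so no numeric conversion happens yet.
    raw = pd.read_csv(io.StringIO(csv_text), dtype={column: str})[column]
    # Remove the thousands separator literally (no regex interpretation).
    cleaned = raw.str.replace(thousands, "", regex=False)
    # Values that cannot be converted become NaN instead of raising.
    parsed = pd.to_numeric(cleaned, errors="coerce")
    # Keep values that were present in the file but failed to convert.
    return raw[parsed.isna() & raw.notna()].tolist()


print(find_unparsable(CSV_TEXT, "value"))  # [':']
```

With the sample data this points at ":" rather than "1,234.00", which is the value I would expect the error message to name.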

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 2cb96529396d93b46abab7bbc73a208e708c642e
python           : 3.7.10.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.18362
machine          : AMD64
processor        : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None
pandas           : 1.2.4
numpy            : 1.20.2
pytz             : 2021.1
dateutil         : 2.8.1
pip              : 21.0.1
setuptools       : 52.0.0.post20210125
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : 7.22.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.4.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.6.2
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

KenilMehta commented 3 years ago

I would like to help solve this issue. Is it open for contributions?

lilisako commented 3 years ago

It's been more than 2 weeks since the last comment was posted here. Can I take this issue?

KenilMehta commented 3 years ago

I am not working on it.