pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: read_fwf fails with comments='#' #53922

Open naught101 opened 1 year ago

naught101 commented 1 year ago

Pandas version checks

Reproducible Example

With this file:

test.txt:

# file :: atlas/series/CMIP5one/rcp85/monthly/world/time_tas_Amon_onemean_rcp85_000_world.dat
   93.5000       17.9877     2093   1 2093   1
   94.5000       18.0374     2094   1 2094   1
   95.5000       18.0906     2095   1 2095   1
   96.5000       18.1404     2096   1 2096   1
   97.5000       18.2044     2097   1 2097   1
   98.5000       18.2412     2098   1 2098   1
   99.5000       18.2734     2099   1 2099   1
   100.500       18.3431     2100   1 2100   1
In [3]: pd.read_fwf("test.txt", comment='#', header=None)
Out[3]: 
         0        1     2  3     4  5
0   ile ::  eries/C  ne/r  5  mont  /
1  93.5000  17.9877  2093  1  2093  1
2  94.5000  18.0374  2094  1  2094  1
3  95.5000  18.0906  2095  1  2095  1
4  96.5000  18.1404  2096  1  2096  1
5  97.5000  18.2044  2097  1  2097  1
6  98.5000  18.2412  2098  1  2098  1
7  99.5000  18.2734  2099  1  2099  1
8  100.500  18.3431  2100  1  2100  1

On larger files with more commented header lines, every one of those lines is included in the same way.

Issue Description

Lines that start with the comment character are parsed as data and included in the DataFrame.
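Row 0 of the output is exactly what you get by cutting the comment line at the column boundaries of the numeric rows: characters 3 to 10 give "ile ::", 17 to 24 give "eries/C", and so on. A minimal sketch of one plausible explanation, assuming (not confirmed here) that read_fwf infers the column boundaries from the part of each line before '#' and only afterwards looks for the comment character inside each sliced field. The '#' sits at position 0, outside every inferred column, so no field contains it and the line survives as data. `infer_colspecs` and `slice_fields` are hypothetical helpers, a simplified stand-in for pandas' internals, not its actual code.

```python
# Simplified, hypothetical re-implementation of fixed-width column inference.
# It is NOT pandas' code; it only illustrates why '#' can be missed.

COMMENT = "#"

LINES = [
    "# file :: atlas/series/CMIP5one/rcp85/monthly/world/time_tas_Amon_onemean_rcp85_000_world.dat",
    "   93.5000       17.9877     2093   1 2093   1",
    "   94.5000       18.0374     2094   1 2094   1",
]


def infer_colspecs(lines, comment):
    """Return (start, end) pairs for runs of non-space characters,
    looking only at the text before the comment character."""
    data = [line.partition(comment)[0] for line in lines]
    width = max(len(row) for row in data)
    # True at every position holding a non-space character in any row.
    mask = [any(i < len(row) and row[i] != " " for row in data) for i in range(width)]
    specs, start = [], None
    # The trailing False closes a run that reaches the last column.
    for i, filled in enumerate(mask + [False]):
        if filled and start is None:
            start = i
        elif not filled and start is not None:
            specs.append((start, i))
            start = None
    return specs


def slice_fields(line, colspecs):
    """Cut one line into stripped fields at the given column boundaries."""
    return [line[a:b].strip() for a, b in colspecs]


colspecs = infer_colspecs(LINES, COMMENT)
comment_fields = slice_fields(LINES[0], colspecs)
print(colspecs)        # [(3, 10), (17, 24), (29, 33), (36, 37), (38, 42), (45, 46)]
print(comment_fields)  # ['ile ::', 'eries/C', 'ne/r', '5', 'mont', '/']
print(any(COMMENT in field for field in comment_fields))  # False
```

Under this assumption, a per-field comment check never fires for the header line, which matches the garbled first row shown above.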

Expected Behavior

Comment lines should NOT be included in the dataframe
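The expected behavior can be sketched using the meaning the read_csv documentation gives to `comment`: everything after the comment character is ignored, and a line that starts with it is skipped entirely. `drop_comments` is a hypothetical helper, and treating a line that is blank after truncation as a comment line is an assumption of this sketch.

```python
COMMENT = "#"


def drop_comments(lines, comment=COMMENT):
    """Hypothetical helper: cut each line at the comment character and
    skip lines that are empty or whitespace-only after the cut."""
    kept = []
    for line in lines:
        body = line.partition(comment)[0]
        if body.strip():
            kept.append(body.rstrip())
    return kept


lines = [
    "# file :: atlas/series/CMIP5one/rcp85/monthly/world/time_tas_Amon_onemean_rcp85_000_world.dat",
    "   93.5000       17.9877     2093   1 2093   1",
    "   94.5000       18.0374     2094   1 2094   1",
]
kept = drop_comments(lines)
print(len(kept))  # 2: the header comment line is gone
```

A possible workaround along these lines, untested against pandas 2.0.3, is to pass the cleaned text to read_fwf through an in-memory buffer, e.g. `pd.read_fwf(io.StringIO("\n".join(kept)), header=None)`.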

Installed Versions

INSTALLED VERSIONS
------------------
commit : 0f437949513225922d851e9581723d82120684a6
python : 3.10.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-73-lowlatency
Version : #80-Ubuntu SMP PREEMPT Wed May 17 13:58:47 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_AU.UTF-8
LOCALE : en_AU.UTF-8
pandas : 2.0.3
numpy : 1.24.3
pytz : 2023.3
dateutil : 2.8.2
setuptools : 65.6.3
pip : 23.0.1
Cython : None
pytest : 7.3.1
hypothesis : 6.75.2
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.1.2
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.3
jinja2 : 3.1.2
IPython : 8.13.2
pandas_datareader: None
bs4 : 4.12.2
bottleneck : None
brotli :
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : 2023.6.0
xlrd : None
zstandard : 0.19.0
tzdata : 2023.3
qtpy : 2.3.1
pyqt5 : None
rsm-23 commented 1 year ago

@naught101 This is reproducible, but I don't see any parameter called "comment" in the read_fwf documentation.