h2oai / datatable

A Python package for manipulating 2-dimensional tabular data structures
https://datatable.readthedocs.io
Mozilla Public License 2.0

[bug] fread fails to read column names when line starts with # #3072

Closed: wong-korben closed this issue 3 years ago

wong-korben commented 3 years ago

fread seems to be unable to read the column names when the first line of the file starts with "# " (a hash followed by a space).

For example, I am trying to read test.txt:

# wdf,a,f,b,c
1,2,3,4,5
5,4,3,2,1

However, when I run dt.fread('test.txt'), the column names are replaced with C0, C1, C2, etc., as shown:

   |    C0     C1     C2     C3     C4
   | int32  int32  int32  int32  int32
-- + -----  -----  -----  -----  -----
 0 |     1      2      3      4      5
 1 |     5      4      3      2      1

fread works perfectly fine in these scenarios:

wdf,# a,f,b,c
1,2,3,4,5
5,4,3,2,1

and

#wdf,a,f,b,c
1,2,3,4,5
5,4,3,2,1

Environment: Windows 10, Python 3.8, datatable 1.0.0

st-pasha commented 3 years ago

Ah, yes. When fread sees a line starting with # at the beginning of the file, it assumes that line is a comment. You can work around this by using the option skip_to_string="#".

wong-korben commented 3 years ago

Ah, got it. I thought it might be some commenting thing. Thanks!
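Besides the skip_to_string option suggested above, another way around the comment heuristic is to recover the header yourself. The sketch below uses only the Python standard library; `read_hash_header_csv` is a hypothetical helper, not part of datatable, and it assumes every data field is an integer, as in test.txt. The recovered names could then be assigned to the frame fread returns (datatable Frames have a settable `names` attribute), though that step is not shown here.

```python
import csv


def read_hash_header_csv(text):
    """Parse CSV text whose first (header) line may be prefixed with '#'.

    Hypothetical helper, not a datatable function: it strips a leading
    '#' plus any whitespace after it from the first line and uses the
    remainder as the column names.
    """
    lines = text.splitlines()
    header = lines[0]
    if header.startswith("#"):
        # Drop the '#' and the optional space that follows it.
        header = header[1:].lstrip()
    names = next(csv.reader([header]))
    # Assumption: every data field is an integer, as in test.txt.
    rows = [[int(v) for v in row] for row in csv.reader(lines[1:]) if row]
    return names, rows


sample = "# wdf,a,f,b,c\n1,2,3,4,5\n5,4,3,2,1\n"
names, rows = read_hash_header_csv(sample)
print(names)  # ['wdf', 'a', 'f', 'b', 'c']
print(rows)   # [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]]
```

The same helper also accepts the "#wdf,a,f,b,c" variant from the report, since it strips whitespace only if it is present after the hash.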