npyio.py: genfromtxt() handles comments incorrectly with names=True (Trac #2184)

numpy-gitbot commented 11 years ago

Original ticket http://projects.scipy.org/numpy/ticket/2184 on 2012-07-11 by trac user khaeru, assigned to unknown.

The documentation for genfromtxt() reads:

When the variables are named (either by a flexible dtype or with names), there must not be any header in the file (else a ValueError exception is raised).

and also:

If names is True, the field names are read from the first valid line after the first skip_header lines.

The cause of this restriction seems to be in numpy/lib/npyio.py at lines 1347-9 (https://github.com/numpy/numpy/blob/master/numpy/lib/npyio.py#L1347):

    if names is True:
        if comments in first_line:
            first_line = asbytes('').join(first_line.split(comments)[1:])

The last line should read first_line = first_line.split(comments)[0].

With the current code, the input line:

# Example comment line

will be transformed to:

Example comment line

resulting in columns named 'Example', 'comment' and 'line' (this is what the warning in the documentation is about).

But also the input line:

ColumnA ColumnB ColumnC # the column names precede this comment

will be transformed to:

the column names precede this comment

resulting in columns named 'the', 'column', 'names' …etc. In this instance actual column names present in the file are inappropriately discarded.
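The two transformations above can be reproduced outside of NumPy with a minimal sketch. The helper names `header_current` and `header_proposed` are hypothetical and only mimic the quoted snippet on plain strings. They are not part of npyio.py, which works on bytes and uses its own line splitter.

```python
def header_current(first_line, comments="#"):
    """Mimic the quoted snippet: keep only the text AFTER the comment marker."""
    if comments in first_line:
        first_line = "".join(first_line.split(comments)[1:])
    # Whitespace splitting stands in for genfromtxt's line splitter (assumption).
    return first_line.split()


def header_proposed(first_line, comments="#"):
    """Proposed change: keep only the text BEFORE the comment marker."""
    if comments in first_line:
        first_line = first_line.split(comments)[0]
    return first_line.split()


comment_only = "# Example comment line"
names_then_comment = "ColumnA ColumnB ColumnC # the column names precede this comment"

print(header_current(comment_only))         # ['Example', 'comment', 'line']
print(header_current(names_then_comment))   # ['the', 'column', 'names', 'precede', 'this', 'comment']
print(header_proposed(comment_only))        # []
print(header_proposed(names_then_comment))  # ['ColumnA', 'ColumnB', 'ColumnC']
```

With the `[0]` variant, a comment-only line yields no names at all, and a trailing comment no longer replaces the real column names.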

By taking the [0] portion of the split instead of [1:]:

Lines beginning with comments result in an empty string being passed to split_lines() on L1350, producing no usable output and causing the while not first_values loop to try the next line.
Partial-line comments following actual heading names are discarded, instead of the names themselves.
As a result, files can have commented headers of any length and column names, simultaneously.
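The effect on the `while not first_values` loop can be sketched as follows. This is a simplified stand-in and not the actual npyio.py code: `read_names` is a hypothetical helper, plain strings replace bytes, and `str.split()` replaces `split_lines()`.

```python
def read_names(lines, comments="#"):
    """Return the first non-empty list of names, skipping comment-only lines."""
    line_iter = iter(lines)
    first_values = []
    while not first_values:
        # Raises StopIteration if the input ends before any names are found.
        first_line = next(line_iter)
        if comments in first_line:
            # Proposed fix: keep the part before the comment marker.
            first_line = first_line.split(comments)[0]
        # A comment-only line is now empty, so the loop moves to the next line.
        first_values = first_line.split()
    return first_values


sample = [
    "# Example comment line",
    "# a second line of commented header",
    "ColumnA ColumnB ColumnC # the column names precede this comment",
    "1 2 3",
]
print(read_names(sample))  # ['ColumnA', 'ColumnB', 'ColumnC']
```

Under these assumptions, any number of leading comment lines is skipped, and the names are taken from the first line that has content before the comment marker.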

numpy-gitbot commented 11 years ago

trac user khaeru wrote on 2012-07-11

Sorry, bad title. Also, what's the difference between the Trac issues list and https://github.com/numpy/numpy/issues ?

numpy-gitbot commented 11 years ago

Title changed from Remove to npyio.py: genfromtxt() handles comments incorrectly with names=True by trac user khaeru on 2012-07-11

numpy-gitbot commented 11 years ago

@rgommers wrote on 2012-07-12

We opened Github issues only a few weeks ago, we're in the process of transitioning all Trac tickets to it. When that's done we'll close Trac, or make it read-only. For now you can use either one.

numpy-gitbot commented 11 years ago

@rgommers wrote on 2012-07-12

Suggested fix looks correct.

numpy-gitbot commented 11 years ago

trac user khaeru wrote on 2012-07-12

Oh, I see — well, I also posted a branch with this fix and a pull request: https://github.com/numpy/numpy/pull/351
