pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
42.57k stars, 17.56k forks

ValueError: Usecols do not match columns, columns expected but not found: ['Col3', 'Col1'] #59139

Open Hermann12 opened 2 days ago

Hermann12 commented 2 days ago

Pandas version checks

Reproducible Example

# https://stackoverflow.com/a/78681763/12621346

import pandas as pd

df = pd.read_csv("test.csv", usecols=['Col1', 'Col2'], header=0, names=['first', 'third'])
print(df)

Issue Description

This is still a bug! The documentation states clearly: "For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']." If I use it as described, I get: "ValueError: Usecols do not match columns, columns expected but not found: ['Col3', 'Col1']". Only positional indices such as [0, 1, 2] work! The error message is also misleading, if not wrong.
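The report can be reproduced without the original file. The following is a minimal sketch, assuming a three-column CSV with the header Col1, Col2, Col3; the actual contents of test.csv are not shown in this issue, so the inline data and the variable names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv; the real file is not part of the report.
CSV_TEXT = "Col1,Col2,Col3\n1,2,3\n4,5,6\n"

# Header labels in usecols combined with names: the labels are looked up in
# names rather than in the file header, so the lookup fails.
try:
    pd.read_csv(
        io.StringIO(CSV_TEXT),
        usecols=["Col1", "Col3"],
        header=0,
        names=["first", "third"],
    )
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Positional usecols with the same names: columns 0 and 2 are selected
# and labeled with the given names.
df_positional = pd.read_csv(
    io.StringIO(CSV_TEXT),
    usecols=[0, 2],
    header=0,
    names=["first", "third"],
)
print(df_positional)
```

The order of the labels inside the error message may differ between runs, so the sketch prints the message instead of assuming its exact text.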

Expected Behavior

I expect the behavior the documentation describes. Use case: https://stackoverflow.com/a/78681763/12621346. I want to select columns by their old names and rename them to new names; this works only with positional indices (1, 2, 3), not with column names.
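One way to get this result with current pandas is to select by the header labels first and rename afterwards. This is a sketch of a possible workaround, not a fix for the reported behavior; the inline CSV data is hypothetical, since the real test.csv is not shown.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv.
CSV_TEXT = "Col1,Col2,Col3\n1,2,3\n4,5,6\n"

# Step 1: select columns by the labels in the file header (no names argument).
# Step 2: rename the selected columns with DataFrame.rename.
df_renamed = pd.read_csv(
    io.StringIO(CSV_TEXT),
    usecols=["Col1", "Col3"],
).rename(columns={"Col1": "first", "Col3": "third"})
print(df_renamed)
```

Because names is not passed, usecols is matched against the header row, and the renaming is kept as a separate, explicit step.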

Installed Versions

2.0.3

Aloqeely commented 2 days ago

Thanks for the report! The documentation states: "If names are given, the document header row(s) are not taken into account", which is the current behavior. So this sounds to me more like an enhancement request than a bug report. Is that right?

Hermann12 commented 2 days ago

I think this contradicts the other sentence in the documentation that I quoted in my report: "For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']." I therefore assume usecols works with both, as the "or" says. If I understood it correctly, usecols is for reading the CSV and names is for presenting the result. So in my opinion it's a bug, because it does not work with both as described in the documentation.

Aloqeely commented 2 days ago

Well yes, you can pass a list of the column names just as the documentation states. But it also states that if names are provided then the header row won't be considered.

Hermann12 commented 1 day ago

Stupid behavior. Not consistent in my opinion.

Aloqeely commented 1 day ago

If your CSV file has the columns col1, col2, col3, and you pass names=['name1', 'name2', 'name3'], then passing usecols=['name1', 'name3'] will work correctly.
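As a rough illustration of that case, here is a minimal sketch. The inline CSV data is hypothetical, and header=0 is an added assumption so that the original header row is replaced instead of being read as data.

```python
import io

import pandas as pd

# Hypothetical CSV with the header col1, col2, col3.
CSV_TEXT = "col1,col2,col3\n1,2,3\n4,5,6\n"

# names relabels all three columns; usecols then refers to the new labels.
# header=0 (an assumption, not stated in the comment) drops the original header row.
df_named = pd.read_csv(
    io.StringIO(CSV_TEXT),
    header=0,
    names=["name1", "name2", "name3"],
    usecols=["name1", "name3"],
)
print(df_named)
```

Here names covers every column in the file, so each label in usecols has a match among the user-supplied names.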

Can you share why you think it's inconsistent? If you passed names then it makes sense that usecols will rely on those names rather than the names in the CSV header row, do you agree?