Open spillz opened 6 years ago
Sure, I think this could be made to work similar to CSV (though in general life is easier without duplicate columns). PR to fix would be welcome!
If I were in a position to submit PRs, I would. But as I am not, I thought a bug report would be better than nothing.
I am fine with the duplicate cols being treated as an error but keep in mind that means you can no longer open arbitrarily named datasets. Also, the read_csv behavior really isn't ideal either. The multiindex becomes a regular index and there are no warnings when duplicates are found and cols renamed.
Anyway the main reason I reported this as a bug is that it took me half an hour to figure out that the error was being caused by duplicate columns. The message is very obscure!
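For illustration, here is a minimal sketch of renaming duplicate columns with a warning instead of silently, which is what the comment above asks for. It works on a plain list of names and copies the "x", "x.1" suffix scheme described in this thread. The function name mangle_duplicates is hypothetical and this is not pandas' actual implementation.

```python
import warnings


def mangle_duplicates(names):
    """Rename repeated names as 'x', 'x.1', 'x.2', warning on every rename.

    Hypothetical helper: it does not guard against a generated name such as
    'x.1' colliding with a column that is already called 'x.1'.
    """
    seen = {}      # name -> how many times it has appeared so far
    result = []
    for name in names:
        count = seen.get(name, 0)
        if count:
            new_name = f"{name}.{count}"
            warnings.warn(f"duplicate column {name!r} renamed to {new_name!r}")
            result.append(new_name)
        else:
            result.append(name)
        seen[name] = count + 1
    return result


print(mangle_duplicates(["a", "b", "a", "a"]))
# ['a', 'b', 'a.1', 'a.2']
```

With something like this, the renaming still happens, but the user is told which columns were affected.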
Appreciate the report either way. We would also take a PR for an improved error message, which would be easier.
I just upgraded from pandas 0.22 to 0.24, and my code is breaking. What's happening in my code appears related to this issue.
I have a worksheet in Excel with three tables in it. Those three tables have the same columns.
In version 0.22.0 of pandas, I could read all three of those tables into python with three read_excel statements, using the usecols argument to specify which part of the spreadsheet I wanted to read in.
In version 0.24.1 of pandas, the presence of duplicate column names in a part of the spreadsheet that I'm not reading into Python forces the column names in the imported dataframe to have ".1" suffixes appended to them.
I can adjust my code to rename the columns after import, of course, but that's ugly. Was this an intentional change to the read_excel function that hasn't been documented? Or is this a bug?
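A rough sketch of the rename-after-import workaround mentioned above, assuming the only change needed is dropping a trailing ".1"-style suffix. The helper name strip_dupe_suffix and the example column names are made up for illustration.

```python
import re

# Matches a trailing ".<digits>" such as ".1" or ".12".
_SUFFIX = re.compile(r"\.\d+$")


def strip_dupe_suffix(names):
    """Drop a trailing '.<digits>' from each column name.

    Caution: this also strips legitimate names that happen to end in
    '.<digits>' (for example 'v1.2'), so only use it when no real column
    is named that way.
    """
    return [_SUFFIX.sub("", str(name)) for name in names]


print(strip_dupe_suffix(["Region.1", "Sales.1", "Units"]))
# ['Region', 'Sales', 'Units']
```

On a DataFrame this would be applied as df.columns = strip_dupe_suffix(df.columns), where df is whatever read_excel returned.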
Code Sample, a copy-pastable example if possible
Problem description
The above snippet generates the following output:
Note that read_csv will read the csv (but mangles the column index). read_excel fails without clearly indicating the nature of the error.
Expected Output
read_excel should approximately match what read_csv does. If not, it would be a lot easier to diagnose the error if the error message indicated that the problem was a duplicate column and, ideally, which columns are the cause.
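As a sketch of the kind of error message requested here, the check below takes a list of header names and raises an error that names the duplicates. The function check_unique_columns is hypothetical and is not part of pandas.

```python
from collections import Counter


def check_unique_columns(names):
    """Return the names unchanged, or raise a ValueError listing duplicates."""
    counts = Counter(names)
    # Counter keeps first-seen order, so duplicates are reported in header order.
    dupes = [name for name, n in counts.items() if n > 1]
    if dupes:
        raise ValueError(f"duplicate column names found: {dupes}")
    return list(names)


try:
    check_unique_columns(["a", "b", "a"])
except ValueError as exc:
    print(exc)
# duplicate column names found: ['a']
```

Because the names only need to be hashable, the same check would also work for tuple labels such as those in a multi-level header.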
Output of pd.show_versions()