Open matthiastns opened 5 years ago
Which version was this working on previously? Also can you share the relevant data from the file you are reading?
Hi William.
Thanks for your quick feedback.
From conda list --revisions my previous version was 0.23.1 (conda-forge)
Attached you find one example of the files I read. They are several of them, all equal in structure. It works with none of them since the update.
This worked as expected before the update. I received a dataframe with multi-index columns as expected.
Matthias
Von: William Ayd [mailto:notifications@github.com] Gesendet: Dienstag, 26. Februar 2019 17:16 An: pandas-dev/pandas pandas@noreply.github.com Cc: Hufnagel, Matthias (TSHAM) matthias.hufnagel@kantar.com; Author author@noreply.github.com Betreff: Re: [pandas-dev/pandas] read_excel throws ValueError: cannot specify usecols when specifying a multi-index header (#25449)
Which version was this working on previously? Also can you share the relevant data from the file you are reading?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_pandas-2Ddev_pandas_issues_25449-23issuecomment-2D467503309&d=DwMCaQ&c=zdK58V2JKULZdB8nuBRpog&r=viu6wFhKUAAQyfCRfccCyIRRc8exHnCfucKtObjpDexOlfGP1yLhpyPweI2b5iGT&m=RZUnM6_s4ZHW66-t95Ga16y9iX9591ZVsvQhnJfceFE&s=-Jipmzfk8h5yeoPoyqacrxkZAQnYHcqgq91GtQyH88Y&e=, or mute the threadhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AMDlZGRWsflorXB07PihBxo-2Dbnu-2DjLwfks5vRV2-2DgaJpZM4bSbs2&d=DwMCaQ&c=zdK58V2JKULZdB8nuBRpog&r=viu6wFhKUAAQyfCRfccCyIRRc8exHnCfucKtObjpDexOlfGP1yLhpyPweI2b5iGT&m=RZUnM6_s4ZHW66-t95Ga16y9iX9591ZVsvQhnJfceFE&s=n2UslOArWjA8sk2Qvaks4VqHWLalOeo_jHY0KTTCPdI&e=.
Kantar Disclaimerhttp://www.kantar.com/disclaimer.html
Hi William.
As a follow up to your question about my installed version, I learned how to get back to a previous version of my conda environment. Thanks for that in the first place.
Just to let you know:
I went back to my environment that included pandas 0.23.1 by running
C:\Users\hm113045\AppData\Local\Continuum\Anaconda3\Scripts
and read_excel works perfectly with multi-index header and usecols again.
Matthias
Von: William Ayd [mailto:notifications@github.com] Gesendet: Dienstag, 26. Februar 2019 17:16 An: pandas-dev/pandas pandas@noreply.github.com Cc: Hufnagel, Matthias (TSHAM) matthias.hufnagel@kantar.com; Author author@noreply.github.com Betreff: Re: [pandas-dev/pandas] read_excel throws ValueError: cannot specify usecols when specifying a multi-index header (#25449)
Which version was this working on previously? Also can you share the relevant data from the file you are reading?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_pandas-2Ddev_pandas_issues_25449-23issuecomment-2D467503309&d=DwMCaQ&c=zdK58V2JKULZdB8nuBRpog&r=viu6wFhKUAAQyfCRfccCyIRRc8exHnCfucKtObjpDexOlfGP1yLhpyPweI2b5iGT&m=RZUnM6_s4ZHW66-t95Ga16y9iX9591ZVsvQhnJfceFE&s=-Jipmzfk8h5yeoPoyqacrxkZAQnYHcqgq91GtQyH88Y&e=, or mute the threadhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AMDlZGRWsflorXB07PihBxo-2Dbnu-2DjLwfks5vRV2-2DgaJpZM4bSbs2&d=DwMCaQ&c=zdK58V2JKULZdB8nuBRpog&r=viu6wFhKUAAQyfCRfccCyIRRc8exHnCfucKtObjpDexOlfGP1yLhpyPweI2b5iGT&m=RZUnM6_s4ZHW66-t95Ga16y9iX9591ZVsvQhnJfceFE&s=n2UslOArWjA8sk2Qvaks4VqHWLalOeo_jHY0KTTCPdI&e=.
Kantar Disclaimerhttp://www.kantar.com/disclaimer.html
Just saw that I had something wrong in my clipboard, Sorry
I was running
conda install --revision N
Matthias
Von: Hufnagel, Matthias (TSHAM) Gesendet: Mittwoch, 27. Februar 2019 10:08 An: 'pandas-dev/pandas' reply@reply.github.com Betreff: AW: [pandas-dev/pandas] read_excel throws ValueError: cannot specify usecols when specifying a multi-index header (#25449)
Hi William.
As a follow up to your question about my installed version, I learned how to get back to a previous version of my conda environment. Thanks for that in the first place.
Just to let you know:
I went back to my environment that included pandas 0.23.1 by running
C:\Users\hm113045\AppData\Local\Continuum\Anaconda3\Scripts
and read_excel works perfectly with multi-index header and usecols again.
Matthias
Von: William Ayd [mailto:notifications@github.com] Gesendet: Dienstag, 26. Februar 2019 17:16 An: pandas-dev/pandas pandas@noreply.github.com<mailto:pandas@noreply.github.com> Cc: Hufnagel, Matthias (TSHAM) matthias.hufnagel@kantar.com<mailto:matthias.hufnagel@kantar.com>; Author author@noreply.github.com<mailto:author@noreply.github.com> Betreff: Re: [pandas-dev/pandas] read_excel throws ValueError: cannot specify usecols when specifying a multi-index header (#25449)
Which version was this working on previously? Also can you share the relevant data from the file you are reading?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_pandas-2Ddev_pandas_issues_25449-23issuecomment-2D467503309&d=DwMCaQ&c=zdK58V2JKULZdB8nuBRpog&r=viu6wFhKUAAQyfCRfccCyIRRc8exHnCfucKtObjpDexOlfGP1yLhpyPweI2b5iGT&m=RZUnM6_s4ZHW66-t95Ga16y9iX9591ZVsvQhnJfceFE&s=-Jipmzfk8h5yeoPoyqacrxkZAQnYHcqgq91GtQyH88Y&e=, or mute the threadhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AMDlZGRWsflorXB07PihBxo-2Dbnu-2DjLwfks5vRV2-2DgaJpZM4bSbs2&d=DwMCaQ&c=zdK58V2JKULZdB8nuBRpog&r=viu6wFhKUAAQyfCRfccCyIRRc8exHnCfucKtObjpDexOlfGP1yLhpyPweI2b5iGT&m=RZUnM6_s4ZHW66-t95Ga16y9iX9591ZVsvQhnJfceFE&s=n2UslOArWjA8sk2Qvaks4VqHWLalOeo_jHY0KTTCPdI&e=.
Kantar Disclaimerhttp://www.kantar.com/disclaimer.html
I don't think your attachment came through. Can you add to a comment directly instead?
I did close the issue by mistake, sorry.
From the excel files, I do not read the first worksheet, only the worksheets with min/max prefix that are equal in structure .
Can you add code that is completely copy / pastable to reproduce?
Sorry, William.
Here is the code that should produce the error message if the Excel file I uploaded earlier resides in the same folder. I could not validate if this really throws the reported error at the moment, as I downgraded my environment back to pandas 0.23.1 where I had (and still have) no issues.
import pandas as pd
df=pd.read_excel('BugReportSample.xlsx', sheet_name='max_brand',header=[18,19], usecols='B,D:J,L')
Strange - the code in the traceback hasn't been touched in 6 years. I also created a 0.23.1 environment locally and got the same error.
Can you post the output of pd.show_versions()?
I just got this same error when I went from 23.4 to 24.0, reverting fixed it.
here is my code & error message
def load_data(i_file_name=file_1):
"""reads i_file excel spreadsheet into df"""
data = pd.read_excel((DATA_FOLDER + i_file_name),
sheet_name="historical data by country",
header=[15, 16],
skip_rows=17,
usecols="G:CO", #ValueError: cannot specify usecols when specifying a multi-index header
index_col = None, #line15
)
return data
>>>load_data()
File "cell_impact_DF.py", line 121, in <module>
data = load_data()
File "cell_impact_DF.py", line 15, in load_data
index_col = None,
File "...lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "...lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "...lib\site-packages\pandas\io\excel.py", line 375, in read_excel
**kwds)
File "...lib\site-packages\pandas\io\excel.py", line 718, in parse
**kwds)
File "...lib\site-packages\pandas\io\excel.py", line 601, in parse
**kwds)
File "...lib\site-packages\pandas\io\parsers.py", line 2156, in TextParser
return TextFileReader(*args, **kwds)
File "...lib\site-packages\pandas\io\parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "...lib\site-packages\pandas\io\parsers.py", line 1132, in _make_engine
self._engine = klass(self.f, **self.options)
File "...lib\site-packages\pandas\io\parsers.py", line 2171, in __init__
ParserBase.__init__(self, kwds)
File "...lib\site-packages\pandas\io\parsers.py", line 1391, in __init__
raise ValueError("cannot specify usecols when "
ValueError: cannot specify usecols when specifying a multi-index header
Hi William.
Thanks for taking care of this.
I posted the show_versions() output in the first place, I attach it to the mail below.
Just to clarify (as you wrote that you have the same error using pandas 0.23.1)
I see the reported error only when I use the new 0.24.1 pandas version
After I went back to 0.23.1 my code runs fine.
Matthias
Output of show_versions (taken from my posting in github as my current installation is different now, as I explained) INSTALLED VERSIONS commit: None python: 3.7.1.final.0 python-bits: 64 OS: Windows OS-release: 7 machine: AMD64 processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: None.None pandas: 0.24.1 pytest: 3.3.2 pip: 19.0.3 setuptools: 40.8.0 Cython: 0.27.3 numpy: 1.15.4 scipy: None pyarrow: None xarray: None IPython: 7.3.0 sphinx: 1.6.6 patsy: 0.5.0 dateutil: 2.8.0 pytz: 2018.9 blosc: None bottleneck: 1.2.1 tables: None numexpr: None feather: None matplotlib: 2.1.2 openpyxl: 2.4.10 xlrd: 1.1.0 xlwt: 1.3.0 xlsxwriter: 1.0.2 lxml.etree: None bs4: 4.6.0 html5lib: 1.0.1 sqlalchemy: 1.2.1 pymysql: None psycopg2: None jinja2: 2.10 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None gcsfs: None
Von: William Ayd [mailto:notifications@github.com] Gesendet: Freitag, 1. März 2019 18:39 An: pandas-dev/pandas pandas@noreply.github.com Cc: Hufnagel, Matthias (TSHAM) matthias.hufnagel@kantar.com; State change state_change@noreply.github.com Betreff: Re: [pandas-dev/pandas] read_excel throws ValueError: cannot specify usecols when specifying a multi-index header (#25449)
Strange - the code in the traceback hasn't been touched in 6 years. I also created a 0.23.1 environment locally and got the same error.
Can you post the output of pd.show_versions()?
— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_pandas-2Ddev_pandas_issues_25449-23issuecomment-2D468747669&d=DwMCaQ&c=zdK58V2JKULZdB8nuBRpog&r=viu6wFhKUAAQyfCRfccCyIRRc8exHnCfucKtObjpDexOlfGP1yLhpyPweI2b5iGT&m=iX21_SrtmkrwWrb5pjWeDC0AL1m3X_IzVMJ6mDMDCDo&s=3gYCW5CPIdziZqzyrWCWntaYbLs7_UD4RLnSH4H5v_k&e=, or mute the threadhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AMDlZM-5FJydRX70vNHmbHXTDy7iS3lR5-5Fks5vSWXIgaJpZM4bSbs2&d=DwMCaQ&c=zdK58V2JKULZdB8nuBRpog&r=viu6wFhKUAAQyfCRfccCyIRRc8exHnCfucKtObjpDexOlfGP1yLhpyPweI2b5iGT&m=iX21_SrtmkrwWrb5pjWeDC0AL1m3X_IzVMJ6mDMDCDo&s=9psaZU_NrC1suKUyUWQ0JZ1ZsxMRdfCz4R5PfGmxA7I&e=.
Kantar Disclaimerhttp://www.kantar.com/disclaimer.html
Hi William.
Is there any update on this? I was really happy and thankful that you did take care of this. This strange behaviour is keeping me from updateing my environment now. Would be great to know if there is a chance that this might get fixed in a future release. Any feedback highly appreciated. Again, thanks for taking care in the first place.
Kind regards
Matthias
@matthiastns we have almost 3000 issues and are all a small group of volunteers - in order to have things fixed a PR would advance the cause
Thanks a lot Jeff. Highly appreciated. I'm not a native speaker and I hope I was able to express how much I appreciate that all of you take care of this. At least I tried to do that in my last post. Sorry if it nevertheless came over as too much demanding or even offending.
Unfortunately I'm not quite sure what a PR is. I would initiate(?) one, if necessary. Something like an official request? Not sure either, if this is appropriate in this case or if this will be received like calling for emergency for a minor issue. It's hard to guess from outside.
@matthiastns no, it would be a patch for your issue, see here: http://pandas.pydata.org/pandas-docs/stable/development/contributing.html
this is something you or others in the community would need to do.
@jreback Thanks for your immidiate feedback. So, from going through the linked documentation, I guess "PR" stands for "pull request".
This is a real challange for me. I have some experience with git but I'm not an software developper in the first place. So I have no experience with common development techniques, methods and ways of working.
But I will think about it. Could be also rewarding to dive into that. My main concern is, that finding the cause of my particular issue might be really tricky. Why? As William stated earlier, the code of the function that causes the error has not been touched for 6 years. If this is true, we might have a side effect from somewhere else. My feeling is that diving into this will be a rough task to solve for a code development newbie with no experience in the pandas code base at all.
Has there been any update on this issue? @matthiastns have you managed to resolve this?
Any help will be greatly appreciated!
I also got the same issue
this bug still exist on version 1.2.4 ...
@akasolace and you or anyone in the community is welcome to submit a patch
Currently August 8th 2021 and I incurred the same issue with conda 4.7.11 and pyton 3.7.3
I've been working on this issue, since I have encountered the same problem.
I have tested with the data provided in this thread as well as the data I'm using (which, sadly, I cannot share) and it looks like just deleting the check of usecols
+multi-index in the parser is enough for it to work.
I've been trying to pinpoint where usecols
is ultimately used to check if there is really an incompatibility with a multi-index but as far as I have been able to check, there is none. When you call read_excel
, the usecols
parameters is passed along and finally used by PythonParser._infer_columns()
, which does the correct thing: it reads the multi-index considering only the columns specified by usecols
and then constructs the levels.
The code that originally introduced the parameter incompatibility check is from 9 years ago (commit a9a89f89e13cf006f6b58da1747aa65f86f74cfb, which states "TST: disallow names, usecols, non-numeric in index_cols"), so maybe it is not a problem anymore? I'm not comfortable just deleting this check without being absolutely sure that I'm not breaking anything, but I'm afraid that I'm not familiarized enough with the pandas internals to be sure (and I'm sure that multi-index must be a horrible beast).
If anyone more knowledgeable than me can confirm that the check is not needed anymore I can go ahead and submit the PR, along with a test to ensure it works correctly.
Not sure if anyone still cares about this, but I have also noticed that simply removing the check allows the code to read my file as expected.
Problem description
df=pd.read_excel(xl_file, sheet_name=xl_sheet,header=[18,19], usecols='B,D:J,L')
throws the following error after conda update:
ValueError: cannot specify usecols when specifying a multi-index header
The script run perfectly fine before.
Expected Output
Dataframe with multi-index columns, as before.
Output of
pd.show_versions()