Open toan-quach opened 3 weeks ago
After reading more of the source code, I noticed that read_options is passed to the engine's Excel "load" function. In the openpyxl case, that function is load_workbook, which doesn't have any parameter for ignoring the header. Hence, I concluded that this is a limitation of the openpyxl engine itself.
@ritchie46 Do let me know if my understanding is correct 😄
Checks
Reproducible example
If I use the default engine option ("xlsx2csv"), it returns the expected result. Code:
Result:![image](https://github.com/pola-rs/polars/assets/93168955/079b2fe2-4d14-4987-bb77-e35e92e1dbe3)
If I use "openpyxl" as the engine, it returns an unexpected result. Code:
Result:![image](https://github.com/pola-rs/polars/assets/93168955/fab52ddb-738e-4272-b126-ddc3b067a896)
File used: example.xlsx
Log output
No response
Issue description
When I use the default engine, xlsx2csv, to read the Excel file without a header, it returns the correct result: the "supposed" header is one of the rows in the DataFrame. But when I switch to openpyxl (the engine I'm using) and read the file again, also with no header, the "supposed" header is read as the DataFrame header instead of as one of its rows. I have provided images and code to reproduce.
Expected behavior
I want to use openpyxl as my engine and read the Excel file without a header. I expect the first row in the example.xlsx file to be one of the rows in the DataFrame, not the DataFrame header.
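
To make the expectation concrete, here is a rough sketch of the behavior I have in mind. It is not how polars implements the openpyxl engine. The helper names (rows_to_columns, read_sheet_rows), the sample rows, and the generated column_1, column_2, ... names are all assumptions made for illustration. The only openpyxl calls used are load_workbook and iter_rows(values_only=True).

```python
from typing import Any, Iterable, Sequence


def rows_to_columns(
    rows: Iterable[Sequence[Any]], has_header: bool
) -> dict[str, list[Any]]:
    """Turn worksheet rows into a column mapping (hypothetical helper).

    With has_header=False the first row stays in the data and the columns
    get generated names; with has_header=True the first row is consumed
    as the column names.
    """
    materialized = [list(row) for row in rows]
    if not materialized:
        return {}
    if has_header:
        names = [str(value) for value in materialized[0]]
        data = materialized[1:]
    else:
        # Assumed naming scheme, chosen only for this sketch.
        names = [f"column_{i + 1}" for i in range(len(materialized[0]))]
        data = materialized
    return {name: [row[i] for row in data] for i, name in enumerate(names)}


def read_sheet_rows(path: str) -> list[tuple]:
    """Read every row of the active sheet as plain values (hypothetical helper)."""
    # Imported lazily so the pure-Python part above runs without openpyxl.
    from openpyxl import load_workbook

    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        return list(workbook.active.iter_rows(values_only=True))
    finally:
        workbook.close()


# Made-up rows standing in for the contents of a sheet.
sample_rows = [("name", "age"), ("alice", 30), ("bob", 25)]

no_header = rows_to_columns(sample_rows, has_header=False)
with_header = rows_to_columns(sample_rows, has_header=True)
```

As a possible workaround until the engine handles this, the mapping could be handed to polars directly, e.g. pl.DataFrame(rows_to_columns(read_sheet_rows("example.xlsx"), has_header=False)), though I have not checked how the inferred dtypes compare with the xlsx2csv result.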
Installed versions