Open scotscotmcc opened 2 years ago
That potential solution of mine above doesn't work well. For one, it only checks for openpyxl, not the other handful of engines/libraries. Second, openpyxl isn't actually imported in that file, so this fails.
I was working on another potential solution that introduces a new function to handle it. The function runs through a series of import_optional_dependency() calls and checks the object against the workbook type from each of the different libraries.
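A rough sketch of that second idea, for illustration only. The function name infer_engine and the candidate table are hypothetical and not part of pandas. Plain importlib stands in for pandas' internal import_optional_dependency() so the snippet runs on its own, and only the two libraries mentioned in this issue are listed:

```python
import importlib

# Hypothetical table: engine name -> (module to import, workbook class name).
# Only the two libraries discussed in this issue are listed here.
_CANDIDATES = {
    "openpyxl": ("openpyxl", "Workbook"),
    "xlrd": ("xlrd", "Book"),
}


def infer_engine(obj):
    """Return the engine whose workbook type matches obj, or None."""
    for engine, (module_name, class_name) in _CANDIDATES.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Library not installed, so obj cannot be one of its workbooks.
            continue
        workbook_class = getattr(module, class_name, None)
        if workbook_class is not None and isinstance(obj, workbook_class):
            return engine
    return None
```

Anything that is not a recognized workbook object (a path string, a file handle) returns None, so the existing engine detection would still run for those inputs.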
Is your feature request related to a problem?
When your io parameter for pd.read_excel() is an openpyxl.Workbook object, the function should assume that engine='openpyxl'. Right now, if you don't specify the engine, you will get a ValueError: Invalid file path or buffer object type: <class 'openpyxl.workbook.workbook.Workbook'>. However, if you explicitly pass the engine, it will load fine.
It seems like passing in the specific workbook object would only ever be done if you want it to read it with that engine, so we should just assume that engine.
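A small reproduction of the behavior described above, assuming pandas and openpyxl are installed. The workbook is built in memory with made-up sample data so that no file is needed:

```python
import openpyxl
import pandas as pd

# Build a tiny workbook in memory (sample data invented for this example).
wb = openpyxl.Workbook()
ws = wb.active
ws.append(["name", "value"])
ws.append(["a", 1])
ws.append(["b", 2])

# Without an engine, pandas tries to treat the workbook as a path or buffer.
try:
    pd.read_excel(wb)
except ValueError as exc:
    print(f"Without engine: {exc}")

# Current workaround: name the engine explicitly.
df = pd.read_excel(wb, engine="openpyxl")
print(df)
```

The request is for the first call to behave like the second one whenever the object passed in is an openpyxl.Workbook.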
The current documentation for read_excel() specifies that you can use an xlrd.Book, and that works fine without an explicit engine. It doesn't say that you can use an openpyxl.Workbook at all, but it does work when you declare the engine.
This has come up as a small inconvenience for me a few times. I have a script where I am pulling some data out of Excel that is not a table at all, so I'm opening the file as an openpyxl.Workbook and using some of the tools from that library, and then I also want to pull another table using pandas. I want to do this without opening the workbook multiple times, so I pass the same workbook into pd.read_excel(). It then throws an error and I say "Oh yeah, I need the engine."
Describe the solution you'd like
One should be able to do
This should work the same as just
df = pd.read_excel(filepath,...)
Describe alternatives you've considered
Alternatives seem to be (1) explicitly pass engine or (2) don't use openpyxl.Workbook objects.
Potential Solution