Open mcrumiller opened 3 months ago
I'm pretty sure @alexander-beedie already has a design for this (see https://github.com/pola-rs/polars/pull/15808#issuecomment-2067944349). Not sure it will make it into 1.0.0 though.
I do, and it's "mostly" option 3, where the most common options are all exposed at the top-level and we convert them to engine-specific calls ourselves so the user doesn't have to (as we're now doing with schema_overrides, for example).
We will still maintain read_options and engine_options for what should become a vanishingly small number of cases not handled by the top-level parameters.
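A minimal sketch of that idea, under stated assumptions: common top-level parameters are translated into engine-specific keyword arguments, and read_options stays available as an escape hatch. The engine names, the keyword names in the table, and the translate_options helper are all hypothetical placeholders for illustration; they are not the real keywords of any engine, nor how polars implements this. Letting explicit read_options override the translated values is also an assumption.

```python
from __future__ import annotations

from typing import Any, Sequence

# Placeholder mapping: common parameter name -> engine-specific keyword.
# These engine and keyword names are invented for illustration only.
_ENGINE_KEYWORDS: dict[str, dict[str, str]] = {
    "engine_a": {"has_header": "header", "columns": "usecols"},
    "engine_b": {"has_header": "first_row_is_header", "columns": "select"},
}


def translate_options(
    engine: str,
    *,
    has_header: bool = True,
    columns: Sequence[str] | None = None,
    read_options: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Build the keyword arguments for one engine from the common parameters."""
    try:
        names = _ENGINE_KEYWORDS[engine]
    except KeyError:
        raise ValueError(f"unknown engine: {engine!r}") from None

    # Translate the common, top-level parameters first.
    translated: dict[str, Any] = {names["has_header"]: has_header}
    if columns is not None:
        translated[names["columns"]] = list(columns)

    # Assumption: anything passed explicitly via read_options wins.
    translated.update(read_options or {})
    return translated


print(translate_options("engine_a", has_header=False, columns=["a", "b"]))
print(translate_options("engine_b", columns=["a"], read_options={"skip": 2}))
```

With a table like this, the caller only ever spells has_header and columns, and the per-engine spelling lives in one place.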
Thanks; should I go ahead and close this one?
It's ok, can leave it here so I have something to focus on and close when it's done (hopefully very shortly) 🤣
FYI: #17263 now contains a common columns param - more to come...
...and just added a common has_header param for all engines too 👌
I'd like to comment on the suggestion for range with a +1.
Oftentimes spreadsheets have extra information in additional cells, so it's important to be able to build the dataframe from a specified range or from something like an Excel table (in calamine and usage).
Description
Somewhat related to #17263.
read_excel has two parameters, engine_options and read_options:
engine_options - Additional options passed to the underlying engine’s primary parsing constructor (given below), if supported
read_options - Options passed to the underlying engine method that reads the sheet data
This is really confusing from a user perspective. What's the difference between engine options that parse the data and engine options that read the data? IMO, it would be easier to provide a universal set of options that are converted to the engine-specific options behind the scenes. All engines provide the same functionality for reading at this point, so it would be nice to instead have the following parameters (feel free to add):
sheet_name - specifies sheet by name
sheet_id - specifies sheet by ID, 1-indexed. If 0, return all sheets as a dict.
range - e.g. B3:AC52 or something like [(2,5), (83, 15)]
dtypes - dtypes of columns (already available as schema_overrides)
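To make the range suggestion concrete, here is a rough sketch of how an A1-style string such as B3:AC52 could be turned into numeric bounds before being handed to an engine. The helpers column_index, parse_cell and parse_range are hypothetical and not part of polars. The output convention, 1-indexed ((first_row, first_col), (last_row, last_col)), is an assumption, since the tuple form above does not say which element is the row.

```python
from __future__ import annotations

import re

# Hypothetical helpers, not part of polars.
# A cell reference is column letters followed by a positive row number.
_CELL = re.compile(r"([A-Za-z]+)([1-9][0-9]*)")


def column_index(letters: str) -> int:
    """Convert column letters to a 1-indexed number (A -> 1, Z -> 26, AA -> 27)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def parse_cell(ref: str) -> tuple[int, int]:
    """Parse a single cell such as 'B3' into a 1-indexed (row, column) pair."""
    match = _CELL.fullmatch(ref.strip())
    if match is None:
        raise ValueError(f"invalid cell reference: {ref!r}")
    letters, row = match.groups()
    return int(row), column_index(letters)


def parse_range(spec: str) -> tuple[tuple[int, int], tuple[int, int]]:
    """Parse 'B3:AC52' into ((first_row, first_col), (last_row, last_col))."""
    start, sep, end = spec.partition(":")
    if not sep:
        raise ValueError(f"expected a range like 'B3:AC52', got {spec!r}")
    first, last = parse_cell(start), parse_cell(end)
    # Assumption: ranges are written top-left to bottom-right.
    if last[0] < first[0] or last[1] < first[1]:
        raise ValueError(f"range end precedes range start: {spec!r}")
    return first, last


print(parse_range("B3:AC52"))  # ((3, 2), (52, 29))
```

Accepting both the string form and an explicit pair of tuples would then just mean skipping the parsing step when tuples are passed.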