pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

DOC: xlsx and ods support unclear in ExcelWriter #48514

Open buhtz opened 1 year ago

buhtz commented 1 year ago

Pandas version checks

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.ExcelWriter.html

Documentation problem

This is not about the naming problem, i.e. that "ExcelWriter" also supports ods files but is named "Excel" rather than "ExcelAndOds" or something better.

See the argument engine in the documentation.

It should explain which engines (and the packages behind them) are used by default for each of the possible file types. Currently it only explains which one is deprecated.

See the argument engine_kwargs.

There is a list of engines/packages there. It is unclear whether these are the engine strings that can be passed to the engine keyword or whether they are package names.

For odf it seems that odswriter is the engine used. But that project seems dead, since its last commit was in 2016 (6 years ago).

Searching the internet, I also found tutorials using odfpy. Again, it is not clear whether this is the engine string or just the package name. Those tutorials are quite old, and it is unclear to me whether this is still supported.

I also found a tutorial using engine="odf", where it is unclear which package is used.

Suggested fix for documentation

Clearly differentiate between package names and the strings that can be used for the engine argument.

Make clear which engines/packages are supported for which file types.
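
To illustrate the kind of distinction being asked for, here is a minimal sketch of a lookup table that keeps engine strings, package names, and file types apart. The table contents are an assumption based on my reading of the pandas writer engines (not an official list), and the names WRITER_ENGINES, engines_for, and package_for are hypothetical helpers, not part of pandas.

```python
import os

# Hypothetical table: engine string -> (package to install, file extensions).
# Assumption: these entries mirror the writer engines pandas ships with;
# check the pandas documentation before relying on them.
WRITER_ENGINES = {
    "openpyxl": ("openpyxl", (".xlsx", ".xlsm")),
    "xlsxwriter": ("XlsxWriter", (".xlsx",)),
    "odf": ("odfpy", (".ods",)),
}


def engines_for(path):
    """Return the engine strings whose extensions match the given path."""
    ext = os.path.splitext(path)[1].lower()
    return sorted(
        engine
        for engine, (_package, extensions) in WRITER_ENGINES.items()
        if ext in extensions
    )


def package_for(engine):
    """Return the package name behind an engine string."""
    return WRITER_ENGINES[engine][0]
```

With such a table, engines_for("report.ods") yields the engine string to pass to the engine argument, while package_for("odf") yields the package to install, so the two are never confused.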

WillAyd commented 1 year ago

Pandas uses odfpy, not odswriter. You can see that in the high level documentation:

https://pandas.pydata.org/docs/user_guide/io.html?highlight=odfpy#opendocument-spreadsheets

Of course open to any improvements you'd like to make to the excel page you've linked.

W.r.t. the naming of "ExcelWriter", the history of how this was developed was that opendoc support came way later than excel support. I think at the time we decided it wasn't worth any API churn for correctness when most people think of ods / excel files to be conceptually the same (at least as far as pandas is concerned). I don't see us changing that

buhtz commented 1 year ago

Of course open to any improvements you'd like to make to the excel page you've linked.

I described what I thought could be improved. That is what this issue is about.

The core devs know the package better and can write the correct content into the documentation.

ahobeost commented 2 weeks ago

Perhaps the name of this issue should be changed to: DOC: xlsx, xls, and ods support BROKEN and unclear in ExcelWriter

As pointed out above, many engines are supported for READING excel-like files: https://pandas.pydata.org/docs/user_guide/io.html?highlight=odfpy#opendocument-spreadsheets

Most of these engines are not supported in ExcelWriter, however.

The following code reproduces the issue:

import pandas

engine = 'calamine'
with pandas.ExcelWriter('tryout.xls', engine=engine) as writer:
    print('works')

This results in: ValueError: No Excel writer 'calamine'

Now replace engine with any of the declared ones: 'xlsxwriter', 'odswriter', 'odf', 'odfpy', 'pyxlsb', 'xlrd'

'openpyxl' is the only engine that ExcelWriter actually accepts.

Leaving out the engine entirely leads to the following error: ValueError: No engine for filetype: 'xls'

As it states in the documentation of to_excel: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html#pandas.DataFrame.to_excel

If you wish to write to more than one sheet in the workbook, it is necessary to specify an ExcelWriter object