pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.22k stars 17.78k forks source link

BUG: Modify an existing xlsx file with PANDAS and MIME TYPE ERROR #45884

Open macifTest opened 2 years ago

macifTest commented 2 years ago

Pandas version checks

Reproducible Example

The code bellow :

data_filtered = pd.DataFrame(
        [date, date, date, date], 
        index=[2,3,4,5]
    )

    book = openpyxl.load_workbook(file_origin)

    with pd.ExcelWriter(
        file_modif,
        engine="openpyxl",
        #mode="a",        
        datetime_format='dd/mm/yyyy hh:mm:ss', 
        date_format='dd/mm/yyyy'
        #if_sheet_exists="overlay"

    ) as writer:
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)       
        data_filtered.to_excel(writer, sheet_name="PCA pour intégration", index=False, startrow=2, startcol=5, header=False, verbose=True)

Issue Description

I have an issue with the use of Pandas + ExcelWriter + load_workbook. My need is to be able to modify data from an existing excel file (without deleting the rest). It works partly, but when I check the MIME Type of the produced file and the MIME TYPE of the original one, it is different. The application under test use the same MIME TYPE verification as the website https://www.htmlstrip.com/mime-file-type-checker

Original : application/vnd.openxmlformats-officedocument.spreadsheetml.sheet Modify : application/octet-stream

Expected Behavior

The code bellow :

data_filtered = pd.DataFrame( [date, date, date, date], index=[2,3,4,5] )

book = openpyxl.load_workbook(file_origin)

with pd.ExcelWriter(
    file_modif,
    engine="openpyxl",
    #mode="a",        
    datetime_format='dd/mm/yyyy hh:mm:ss', 
    date_format='dd/mm/yyyy'
    #if_sheet_exists="overlay"

) as writer:
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)       
    data_filtered.to_excel(writer, sheet_name="PCA pour intégration", index=False, startrow=2, startcol=5, header=False, verbose=True)  

Installed Versions

INSTALLED VERSIONS ------------------ commit : bb1f651536508cdfef8550f93ace7849b00046ee python : 3.8.10.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.18363 machine : AMD64 processor : Intel64 Family 6 Model 85 Stepping 7, GenuineIntel byteorder : little LC_ALL : None LANG : fr_FR.UTF-8 LOCALE : fr_FR.cp1252 pandas : 1.4.0 numpy : 1.22.1 pytz : 2021.1 dateutil : 2.8.2 pip : 21.2.4 setuptools : 58.0.2 Cython : None pytest : None hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : 3.0.2 lxml.etree : 4.7.1 html5lib : None pymysql : None psycopg2 : 2.8.6 jinja2 : None IPython : None pandas_datareader: None bs4 : 4.10.0 bottleneck : None fastparquet : None fsspec : None gcsfs : None matplotlib : None numba : None numexpr : None odfpy : None openpyxl : 3.0.9 pandas_gbq : None pyarrow : None pyreadstat : None pyxlsb : None s3fs : None scipy : None sqlalchemy : None tables : None tabulate : None xarray : None xlrd : 2.0.1 xlwt : 1.3.0 zstandard : None
twoertwein commented 2 years ago

None of the attributes of ExcelWriter are considered public (this will change in 1.5 #45572).

I don't recommend overwriting .book but I assume that something similar as described here might work for you.

macifTest commented 2 years ago

Thanks for your reply. I tried your suggestion, but it doesn't work. The MIME type is still not good. This shows me: application/zip bellow the code I used : with open(file_origin, mode="rb+") as handle: book = openpyxl.load_workbook(handle) handle.seek(0) with pd.ExcelWriter( handle, if_sheet_exists="overlay", engine="openpyxl", mode="a" ) as xl_writer: xl_writer.book = book # still uses the same handle! xl_writer.sheets = dict((ws.title, ws) for ws in book.worksheets) chunk = pd.DataFrame([date, date, date, date], index=[2,3,4,5]) chunk.to_excel(xl_writer, merge_cells=False, index=False, startrow=2, startcol=5, header=False, verbose=True, sheet_name="PCA pour intégration")

Thanks