pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: "Bad CRC-32 for file 'docProps/core.xml'" when reading a large Excel file #59523

Open ghz20040102 opened 3 months ago

ghz20040102 commented 3 months ago

Pandas version checks

Reproducible Example

import pandas as pd
# ........
df = pd.read_excel(file_path, sheet_name=sheet_name)
# ........

Issue Description

When I open an Excel file (.xlsx) with 300000 rows using pd.read_excel, an error message appears: Bad CRC-32 for file 'docProps/core.xml'. If the table has fewer than 1000 rows, pd.read_excel opens it normally. pd.ExcelFile(file_path) has the same problem. Why?

Expected Behavior

df = pd.read_excel(file_path, sheet_name=sheet_name)

Installed Versions

pd.show_versions()

INSTALLED VERSIONS

commit : 0f437949513225922d851e9581723d82120684a6
python : 3.8.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22621
machine : AMD64
processor : Intel64 Family 6 Model 186 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Chinese (Simplified)_China.936
pandas : 2.0.3
numpy : 1.24.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 56.0.0
pip : 21.1.1
Cython : None
pytest : 8.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

rhshadrach commented 3 months ago

Can you generate a DataFrame and write it to an Excel file with pandas that demonstrates this problem? If not, I'd guess that the file with 300000 rows is invalid.
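
A round trip along these lines might serve as a starting point for such a reproducer. It is only a sketch: the helper name `roundtrip_excel`, the column names, and the sheet name are made up for illustration, and it assumes openpyxl is installed (as in the reported environment) and is used for both writing and reading.

```python
import numpy as np
import pandas as pd


def roundtrip_excel(n_rows, path):
    """Write a synthetic DataFrame with n_rows rows to path and read it back.

    Hypothetical helper for illustration; not part of pandas.
    """
    df = pd.DataFrame(
        {
            "id": np.arange(n_rows),            # integer column
            "value": np.arange(n_rows) * 0.5,   # float column
        }
    )
    # Write with the openpyxl engine, the only Excel engine listed
    # in the reported environment.
    df.to_excel(path, sheet_name="Sheet1", index=False, engine="openpyxl")
    # Read the same sheet back; an archive-level error would surface here.
    return pd.read_excel(path, sheet_name="Sheet1", engine="openpyxl")
```

Calling `roundtrip_excel(300_000, "repro.xlsx")` would match the row count in the report. If that succeeds while the original file still fails, the original file itself is the more likely culprit.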

In addition, can you post the stack trace from when the exception is raised?
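
An .xlsx file is a ZIP archive, and the "Bad CRC-32 for file" message comes from Python's standard zipfile module when a member's stored checksum does not match its data. One way to test the guess that the file is invalid, independently of pandas, is to run the archive's own integrity check. The function name `find_corrupt_member` below is hypothetical; `zipfile.ZipFile.testzip()` is the standard-library call doing the work.

```python
import zipfile


def find_corrupt_member(xlsx_path):
    """Return the name of the first archive member whose CRC or header
    check fails, or None if every member passes.

    Hypothetical helper for illustration. Raises zipfile.BadZipFile if
    the file cannot be opened as a ZIP archive at all.
    """
    with zipfile.ZipFile(xlsx_path) as archive:
        # testzip() reads every member and verifies its CRC-32.
        return archive.testzip()
```

If this returns 'docProps/core.xml' for the failing workbook, the file on disk is damaged and the error is not specific to `pd.read_excel`.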

grantrosse commented 2 months ago

This is happening to me as well. If I create a fresh Excel file and append to it using ExcelWriter, it works, but if I try to append to that file again, I get the same error: Bad CRC-32 for file 'docProps/core.xml'