kz26 / PyExcelerate

Accelerated Excel XLSX Writing Library for Python 2/3
https://pypi.org/project/PyExcelerate/
BSD 2-Clause "Simplified" License
530 stars 60 forks

The Excel file is larger than the one pandas creates #78

Open IvanHod opened 5 years ago

IvanHod commented 5 years ago

When I created Excel files using PyExcelerate and pandas, the pandas xlsx file was 13 MB smaller (48 MB for PyExcelerate versus 35 MB for pandas). Can I decrease the size of the PyExcelerate file? The table had 100,000 rows and 70 columns.

kevmo314 commented 5 years ago

You can try compressing the xlsx file with an alternate compression algorithm. We use ZIP_DEFLATED, but ZIP_LZMA might give you better results at a performance penalty. If you find that it does, maybe we can look into adding it as a tweakable argument to the writer.
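For anyone who wants to try this without patching the library, here is a minimal sketch that recompresses an already written xlsx file with the standard `zipfile` module. The function name `recompress` and the file paths are hypothetical, and there is no guarantee that spreadsheet applications will open an archive that uses a method other than deflate.

```python
import zipfile


def recompress(src_path, dst_path, compression=zipfile.ZIP_LZMA):
    """Copy every member of src_path into dst_path using another compression method.

    Hypothetical helper for experimenting; it is not part of PyExcelerate.
    """
    with zipfile.ZipFile(src_path, "r") as src, \
            zipfile.ZipFile(dst_path, "w", compression=compression) as dst:
        for info in src.infolist():
            # Each member is read fully into memory, then rewritten with
            # the compression method configured on the destination archive.
            dst.writestr(info.filename, src.read(info.filename))
```

Something like `recompress("report.xlsx", "report_lzma.xlsx")` would then let you compare file sizes and timings before changing the constant in the writer.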

kevmo314 commented 5 years ago

You will need to modify the constant here: https://github.com/kz26/PyExcelerate/blob/247406dc41adc7e94542bcbf04589f1e5fdf8c51/pyexcelerate/Writer.py#L45

IvanHod commented 5 years ago

After zipping the same file with ZIP_LZMA, the xlsx file is 36 MB, but compression takes 18 minutes, so it is not a solution.

After zipping the file with ZIP_BZIP2 compression, it can no longer be opened.

I think the problem is that with PyExcelerate each sheet stores every string inline as a cell value. Pandas writes a separate file, "sharedStrings.xml", which stores each distinct string only once instead of repeating duplicates. Because of this, the unzipped xlsx sheet files are 462 MB for pandas and 795 MB for PyExcelerate.
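To illustrate the idea of a shared strings table, here is a rough sketch of the deduplication step. The function `build_shared_strings` and the `("s", index)` / `("n", value)` cell tuples are made up for this example; this is not how pandas or PyExcelerate is actually implemented.

```python
def build_shared_strings(rows):
    """Replace each string cell with an index into a table of distinct strings.

    Hypothetical illustration only. Returns (strings, cells), where strings
    lists every distinct string once in first-seen order, and cells mirrors
    rows with ("s", index) for strings and ("n", value) for everything else.
    """
    index_of = {}  # string -> position in the shared table
    cells = []
    for row in rows:
        out = []
        for value in row:
            if isinstance(value, str):
                # setdefault assigns the next free index on first sight only.
                out.append(("s", index_of.setdefault(value, len(index_of))))
            else:
                out.append(("n", value))
        cells.append(out)
    # dicts preserve insertion order, so the keys are already sorted by index.
    return list(index_of), cells
```

With many repeated strings, each repeat costs only a small integer in the sheet instead of the full text, which is where the size difference comes from. The dictionary lookups per cell are the extra work at write time.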

kevmo314 commented 5 years ago

Yes, this is partially an optimization. When we were implementing it, we found that building that table was nontrivially expensive for large sheets, so we opted to skip it, hoping that the size cost would be negligible thanks to zip deflation. I see that's not the case, though. I'll take another look at the shared strings table and see if we can optimize it behind an option or something.