pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.22k stars 17.78k forks source link

DataFrame.to_excel with xlsxwriter and constant_memory makes most of the cells empty #15392

Open mhooreman opened 7 years ago

mhooreman commented 7 years ago

Problem description

Hello,

When we export data frames to excel using xlsxwriter with the option constant_memory set to True, most of the cells are empty.

Thanks a lot

Code Sample

import seaborn.apionly as sns
import pandas as pd
iris = sns.load_dataset('iris')
with pd.ExcelWriter('constant_memory_false.xlsx', engine='xlsxwriter', options=dict(constant_memory=False)) as xlw:
    iris.to_excel(xlw)  # Open it: this is OK
with pd.ExcelWriter('constant_memory_true.xlsx', engine='xlsxwriter', options=dict(constant_memory=True)) as xlw:
    iris.to_excel(xlw)  # Open it: most of the data are missing

Output of pd.show_versions()

commit: None python: 3.5.2.final.0 python-bits: 64 OS: Linux OS-release: 3.13.0-87-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 pandas: 0.19.2 nose: 1.3.7 pip: 9.0.1 setuptools: 34.1.1 Cython: 0.25.2 numpy: 1.12.0 scipy: 0.18.1 statsmodels: 0.6.1 xarray: None IPython: 5.2.2 sphinx: 1.5.2 patsy: 0.4.1 dateutil: 2.6.0 pytz: 2016.10 blosc: None bottleneck: 1.2.0 tables: 3.3.0 numexpr: 2.6.2 matplotlib: 2.0.0 openpyxl: None xlrd: 1.0.0 xlwt: 1.2.0 xlsxwriter: 0.9.6 lxml: None bs4: 4.5.3 html5lib: 0.9999999 httplib2: None apiclient: None sqlalchemy: None pymysql: None psycopg2: 2.6.2 (dt dec pq3 ext lo64) jinja2: 2.9.5 boto: None pandas_datareader: None
chris-b1 commented 7 years ago

xref #11355

This happens because pandas writes data column-by-column, where xlsxwriter needs the data row-by-row to use constant_memory. Fixing the above would fix this (a little complex, but PR welcome!) or we could also catch and raise a NotImplementedError in the meantime.

mhooreman commented 7 years ago

Thanks Chris

bryanfree66 commented 6 years ago

import seaborn.apionly as sns import pandas as pd iris = sns.load_dataset('iris') with pd.ExcelWriter('constant_memory_false.xlsx', engine='xlsxwriter', options=dict(constant_memory=False)) as xlw: iris.to_excel(xlw) with pd.ExcelWriter('constant_memory_true.xlsx', engine='xlsxwriter', options=dict(constant_memory=True)) as xlw: iris.to_excel(xlw) Traceback (most recent call last): File "", line 1, in File "/Users/bryan/code/pandas/pandas/io/excel.py", line 1765, in init engine_kwargs) File "/Users/bryan/code/pandas/pandas/io/excel.py", line 924, in init raise NotImplementedError('The option constant_memory=True is ' NotImplementedError: The option constant_memory=True is not supported.**

ohlr commented 4 years ago

What is needed to solve this? With pandas version v1 it does not throw an error but only reports the last row of the DF

TacoBel42 commented 1 year ago

Hi, any updates ?

RoelantStegmann commented 4 months ago

I would still love this :)