pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.69k stars 17.92k forks source link

Return a map of "Frame ranges/ positions" -> "Excel cells/ ranges" on DataFrame.to_excel #16173

Closed 0Hughman0 closed 11 months ago

0Hughman0 commented 7 years ago

e.g.

In[3]: print(f)
Out[3]:
                    Mon                  Tues      Weds       Thur
Meal
Breakfast         Toast                 Bagel    Cereal  Croissant
Lunch              Soup  Something Different!      Rice     Hotpot
Dinner            Curry                  Stew     Pasta    Gnocchi
Midnight Snack  Shmores               Cookies  Biscuits  Chocolate

In[7]: i = f.to_excel("t.xlsx")
In[8]: i
Out[8]:
                         Mon          Tues          Weds          Thur
Meal
Breakfast       <XLCell: B2>  <XLCell: C2>  <XLCell: D2>  <XLCell: E2>
Lunch           <XLCell: B3>  <XLCell: C3>  <XLCell: D3>  <XLCell: E3>
Dinner          <XLCell: B4>  <XLCell: C4>  <XLCell: D4>  <XLCell: E4>
Midnight Snack  <XLCell: B5>  <XLCell: C5>  <XLCell: D5>  <XLCell: E5>

In[10]: i.index.xl
Out[10]: <XLRange: A2:A5>

In[11]: i.columns.xl
Out[11]: <XLRange: B1:E1>

In[9]: i.loc["Lunch", :].xl
Out[9]: <XLRange: B3:E3>

The to_excel constructor has all the information needed to create such a map.

This would greatly enhance working with spreadsheet libraries to create graphs etc with exported DataFrames.

For example with xlsxwriter, to create a graph without can look like:

for col_num in range(1, len(calories_per_meal.index) + 1):
    without_chart.add_series({
        'name':       ["Without", col_num, 0],
        'categories': ["Without", 0, 1, 0, 4],
        'values':     ["Without", col_num, 1, col_num, 4]})

which to me looks pretty ugly and confusing. Instead with this could become:


for time in calories_per_meal.index:
    xl_linked_chart.add_series({
                        'name': time,
                        'categories': proxy.columns.xl.frange,
                        'values': proxy.loc[time].xl.frange})

I actually implemented some subclasses of DataFrame to do this here:

https://github.com/0Hughman0/xl_link

but I've realised this could become a lot more elegant and less buggy by intercepting the positions of cells further upstream.

chris-b1 commented 7 years ago

Interesting idea / package! It seems like fairly niche functionality, so my initial reaction is that this makes more sense, at least for now, as a 3rd party library.

FYI, you may look at intercepting the output of the ExcelFormatter class that pandas uses, it already provides a sort of abstract mapping of the data to excel locations. https://github.com/pandas-dev/pandas/blob/39cc1d0685481c77115f061d856cc60c1e59c8c2/pandas/io/formats/excel.py#L308

And you've maybe already seen these, but if not, this page from the xlsxwriter docs might provide some examples of things to support/make easy. http://xlsxwriter.readthedocs.io/pandas_examples.html

0Hughman0 commented 7 years ago

Cheers! Yeah I think that's a fair comment!

Thanks for the suggestion, I'll definitely look into this. It has occurred to me that the logic for where each cell goes is in the codebase somewhere. This could be a better solution, as essentially I'm repeating logic (on where to place header rows etc).

I've seen them, but I've not really thought of wrapping every one, but that is a nice idea. Might help with making the application seem less abstract.

Cheers.

0Hughman0 commented 6 years ago

Hi!

Dunno if it's worth giving you an update, but whatever!

The project has come a long way since April. I completely re-implremented the 'excel map', both to simplify it, and also to make use of ExcelFormatter. Probably more significantly I've added a XLMap.create_chart method, that can generate complete chart object for both xlsxwriter and openpyxl.

E.g.

import random
from xl_link import XLDataFrame

f = XLDataFrame(columns=('X', 'Y1', 'Y2'),
                         data={'X': range(10),
                               'Y1': list(random.randrange(0, 10) for _ in range(10)),
                               'Y2': list(random.randrange(0, 10) for _ in range(10))})
f.set_index('X', inplace=True)

xlmap = f.to_excel(writer, sheet_name='scatter', engine='openpyxl')
scatter_chart = xlmap.create_chart('scatter', x_axis_name='x', y_axis_name='y', title='Scatter Example')
xlmap.sheet.insert_chart('A1', scatter_chart)
xlmap.writer.save()

Produces this graph

https://github.com/0Hughman0/xl_link

I'd love to hear what you think.

mroeschke commented 11 months ago

Appears there hasn't been much activity or community support for this feature in a while so closing. Happy to reopen if there's renewed support