scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License

New Chart Types in Office 2016 #583

Open perkes opened 4 years ago

perkes commented 4 years ago

Office 2016 introduced several new chart types that python-pptx does not currently support: Waterfall, Histogram, Pareto, Box & Whisker, Treemap, and Sunburst. The XML schema for these chart types seems to differ from that of the previously available charts. The most obvious difference is in the tag names: all tags use the cx prefix instead of c. I tried adding a new class to xmlwriter.py and a new enum to enum.chart, but that proved insufficient. Many parts of the code seem to assume the tags are named <c:.*>.

It would be great if we could collaborate to support all (or at least some) of these new chart types.

scanny commented 4 years ago

I think a good first step is to locate the XML schema (probably a .xsd file) for these new chart types. If you inspect the (XML) header of one of the charts you should see cx mapped to a namespace identifier, probably a Microsoft URL. That URL is the schema's unique identifier, so searching for it on Google should locate the schema easily.
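The header inspection described above can be sketched with the standard library alone. This is only an illustration: the function names are made up for this example, and it scans every .xml part in the package rather than assuming where PowerPoint stores the new-style chart parts.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def namespaces_by_part(pptx_file):
    """Map each XML part name in the package to its {prefix: uri} declarations."""
    found = {}
    with zipfile.ZipFile(pptx_file) as package:
        for name in package.namelist():
            if not name.endswith(".xml"):
                continue  # skip media, relationships (.rels), and other non-.xml parts
            with package.open(name) as part:
                # "start-ns" events yield a (prefix, uri) pair per xmlns declaration
                declared = dict(
                    ns for _event, ns in ET.iterparse(part, events=("start-ns",))
                )
            found[name] = declared
    return found


def parts_declaring(pptx_file, prefix):
    """Return {part name: namespace uri} for parts that declare `prefix`."""
    return {
        name: declared[prefix]
        for name, declared in namespaces_by_part(pptx_file).items()
        if prefix in declared
    }
```

Calling `parts_declaring("specimen.pptx", "cx")` on a saved presentation (the file name here is hypothetical) should list whichever parts map the cx prefix, together with the URL it is mapped to.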

If it is already in the schema folder for python-pptx then we should be in good shape. If not, we'll want to add it there for easy reference. https://github.com/scanny/python-pptx/tree/master/spec/ISO-IEC-29500-4/xsd

Once we've gotten that far we'll be able to work out the next step. In general, we'll need to inspect a few specimens, particularly to see what the behavior is for earlier versions of PowerPoint that don't support those types. I'm inclined to think the files include a vector-graphic image, like a Windows Metafile (WMF), that is shown (but can't be edited) if you have a prior version. We'll need to figure out what python-pptx will do, because it likely won't be generating a WMF since there's not going to be a library for that.

Then minimally there will be adding the new types to the enumeration and adding any new XML elements it introduces, but probably also a lot of more detailed work for each chart type to make it generate properly. Best to see about that once we know more.

perkes commented 4 years ago

All new charts seem to be mapped to http://schemas.microsoft.com/office/drawing/2014/chartex, viewable at: https://docs.microsoft.com/en-us/openspecs/office_standards/ms-odrawxml/e2723b0a-9120-42a5-bd11-c252ccb13c1e. Your xsd directory was last updated 7 years ago, so the schema shouldn't be there.
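To illustrate why code that looks for c-prefixed tags would miss these charts, here is a small sketch using ElementTree. The specimen string is hand-written for this example and is not real PowerPoint output; the c namespace URI is the standard DrawingML chart namespace, and the cx URI is the one quoted above.

```python
import xml.etree.ElementTree as ET

# Namespace for the existing chart elements (prefix c)
C_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"
# Namespace reported for the Office 2016 chart types (prefix cx)
CX_NS = "http://schemas.microsoft.com/office/drawing/2014/chartex"


def qualified(ns_uri, local_name):
    """Build the {uri}name form ElementTree uses for namespaced tags."""
    return "{%s}%s" % (ns_uri, local_name)


# Hand-written stand-in for a new-style chart part (an assumption, not a real file)
specimen = ET.fromstring(
    '<cx:chartSpace xmlns:cx="%s"><cx:chart/></cx:chartSpace>' % CX_NS
)

# A lookup written for the old namespace finds nothing in the new-style part,
# even though the local name "chart" is the same.
old_style_hits = specimen.findall(".//" + qualified(C_NS, "chart"))
new_style_hits = specimen.findall(".//" + qualified(CX_NS, "chart"))
```

Matching is on the namespace URI rather than the prefix text, so any lookup built against the c namespace comes back empty for a cx part.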

scanny commented 4 years ago

Okay, that's good input for the spec. A couple next steps I can think of:

  1. How does one add one of these new-style charts from the MS-API for PowerPoint? I'm looking at the current Slide.AddChart2() documentation, and the XlChartType enumeration it uses to specify chart type doesn't seem to include the new types. Interesting though that the method name seems to have changed from AddChart to AddChart2.

  2. We'll need a specimen XML document of each chart type to be added, as simple (short) as possible, but not so short it leaves out important details. The first thing I'll be looking for is how much of the existing chart structure it preserves (series, plots, etc.) and how much it breaks. Might be best to start with just one of these and then build as we go. Also probably best for them to be "attached" files rather than pasted in here, otherwise scrolling gets to be tedious.

  3. Might also be nice to have a screen shot of a characteristic example of each. We could do those one at a time too, starting with whichever one is closest to your heart at the moment.
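For the question in item 2 of how much of the existing chart structure a specimen preserves, a rough first pass could be an element census that ignores namespaces and compares local names only. This is a sketch with hypothetical helper names; a name in common only suggests where to look and does not mean the two elements behave alike.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the {namespace-uri} part from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def element_census(xml_text):
    """Count how often each element local name occurs in an XML document."""
    root = ET.fromstring(xml_text)
    return Counter(local_name(el.tag) for el in root.iter())


def shared_and_new(classic_xml, new_xml):
    """Return (names present in both, names only in the new-style document)."""
    classic = element_census(classic_xml)
    new = element_census(new_xml)
    shared = sorted(set(classic) & set(new))
    only_new = sorted(set(new) - set(classic))
    return shared, only_new
```

Feeding it the XML of an existing chart part and of a waterfall specimen would give a quick list of which element names carry over and which are new.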

perkes commented 4 years ago

I think we should probably start with the waterfall chart, as there were several requests for it. Let me know if I can help. waterfall.zip

SandervandenOord commented 3 years ago

Any developments on this front still? There are 3 issues specifically mentioning waterfall charts. Do you need any help on this?

scanny commented 3 years ago

There haven't been any sponsors to come forward for this, so no, no developments. If you wanted to move things forward the first step is an analysis document, maybe something like this one: https://github.com/scanny/python-pptx/blob/master/docs/dev/analysis/cht-bubble-chart.rst. This is another recent one that gives an idea of the kinds of things that need to be discovered and recorded.

This document is really an enhancement proposal, like a PEP, and gives us a basis for having the design conversations and making the design decisions an implementation will realize.

You can submit the document as a PR and then we can discuss and refine it as part of the PR process. The document is separately committable/mergeable, even if the implementation hasn't started yet. In general I'd say make a quick sketch of it and submit it early for comments on what else it needs, to avoid rework as much as possible.