python-openxml / python-docx

Create and modify Word documents with Python
MIT License

Feature - Edit Chart Data #1141

Open scottzach1 opened 2 years ago

scottzach1 commented 2 years ago

Edit Chart Data

It would be incredibly valuable to edit chart data within a Word Document.

Research

This was raised before in #444, which was closed with the advice that the user should ask on Stack Overflow for assistance.

However, searching Stack Overflow for "[🏷️python-docx] edit chart" shows that nobody has been able to solve this problem; the only suggestion is to use python-pptx.

It would be nice to have similar functionality within python-docx.

Documentation

From my interpretation of Shapes (in general) — python-docx 0.8.11 documentation and Inline shape — python-docx 0.8.11 documentation this functionality appears to be possible but non-trivial and unimplemented.

Apologies if I have missed anything!

eldamir commented 1 year ago

This would be incredibly useful. Actually, it is my main requirement from a library like this. My journey so far can be summed up as:

  1. Just generate some documents nice and easy with this nice library
  2. No detailed support for charts. Meh, I'll just generate images with matplotlib. Works ok, but too bad that they cannot be edited in Word, since they are images, not XML chart figures
  3. Right, so I can do raw OpenXML stuff to get what I need... But that is insanely complex. Would take a pretty deep understanding of the underlying model to avoid making corrupt documents.
  4. New idea: create templates in Word with the charts I need, and then simply load the document and edit the underlying datapoints with python-docx..... But I can't

I looked into other solutions from the C#/Dotnet world. They have a similar story. OpenXMLSdk2 is available, but it is too low-level to be practical. Several libraries are built on top to try to make it simpler. Most of them support only Word xor PowerPoint xor Excel. Some support all three, and those are paid, commercial libraries. Out of 4 of those commercial ones, only 1 of them - the most expensive one - promises an API for editing Chart data 😦

It seems to be a very tricky topic.

I'm guessing that the maintainers of this repo have some deep knowledge of OpenXML stuff. It would be awesome if they could chime in and help us find a good direction for this 😉

I'd love to contribute a solution, but I don't even know where to start 🤣

eldamir commented 1 year ago

Currently trying some funky work-arounds for dealing with updating of charts. If it works, I'll turn it into a blog post. Maybe it'll spark some interest here.

I'll be back with an update if I'm successful 😉

marcus-hao commented 1 year ago

+1 I'm also trying to get around having editable charts in the document.

Like you mentioned, there seems to be a workaround using python-pptx. See #392.

eldamir commented 1 year ago

Thank you @marcus-hao.

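What I am currently hacking together in my own specific case is:

  • Create a Word file template with a chart inside it, then use Python to:
  • Unzip the word file
  • Edit the embedded xlsx file using openpyxl
  • Edit the chart.xml using standard library for XML stuff
  • Zip it back up
  • Open the file and hope it works 🤣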

So far, it seems to work okay, but I need to explore some edge cases. It is not a reasonable thing to add to this library, but it may be a somewhat reasonable workaround for some 😉
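
For readers who want a starting point, here is a minimal sketch of the unzip, edit, re-zip idea (open the .docx as a zip, rewrite the chart XML, write everything back) using only the standard library. It is not the code from this thread. The part name `word/charts/chart1.xml`, the helper names `set_series_values` and `rewrite_docx`, and the assumption that the first series caches its numbers in `c:numCache` are all illustrative guesses that should be checked against your own template. The embedded workbook, which the workflow edits with openpyxl, is copied through unchanged here, so Word's "Edit Data" view would likely still show the old numbers.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

C_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"
# Assumed part name; list the archive contents to find the real one.
CHART_PART = "word/charts/chart1.xml"


def set_series_values(chart_xml: bytes, values: list) -> bytes:
    """Overwrite the cached numbers of the first series in a chart part."""
    # Re-register the file's own prefixes so ElementTree does not rename
    # them to ns0, ns1, ... when it serializes the tree again.
    for _, (prefix, uri) in ET.iterparse(io.BytesIO(chart_xml), events=("start-ns",)):
        ET.register_namespace(prefix, uri)

    root = ET.fromstring(chart_xml)
    ns = {"c": C_NS}
    cache = root.find(".//c:ser/c:val/c:numRef/c:numCache", ns)
    if cache is None:
        raise ValueError("no cached numeric series found in chart part")

    points = cache.findall("c:pt", ns)
    if len(points) != len(values):
        # Adding or removing points also means updating c:ptCount and the
        # cell ranges in c:f, which this sketch deliberately does not do.
        raise ValueError("this sketch only replaces values one-for-one")

    for point, value in zip(points, values):
        point.find("c:v", ns).text = str(value)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def rewrite_docx(src: str, dst: str, values: list) -> None:
    """Copy src to dst, replacing only the chart part's cached values."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == CHART_PART:
                data = set_series_values(data, values)
            # An embedded .xlsx part could be edited at this point too,
            # e.g. by loading it from io.BytesIO(data) with openpyxl.
            zout.writestr(item, data)


# Hypothetical usage:
# rewrite_docx("template.docx", "report.docx", [19, 30, 7])
```

Every other part is copied byte for byte, which keeps the risk of corrupting the document lower than rebuilding it from scratch. Each chart type lays out its series a little differently, so the XPath above would need adjusting per chart.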

marcus-hao commented 1 year ago

@eldamir That's what I thought about too, changing the underlying XML. But I think this method is too complicated for my use case. :(

lucasdepetrisd commented 1 year ago

  • Create a Word file template with a chart inside it, then use Python to:
  • Unzip the word file
  • Edit the embedded xlsx file using openpyxl
  • Edit the chart.xml using standard library for XML stuff
  • Zip it back up
  • Open the file and hope it works 🤣

@eldamir I'm interested in your workaround. I'd like to know more about it, also regarding those edge cases. If you could share your approach or any insights, that would be great. Thanks a lot!

eldamir commented 1 year ago

Alright friend, I tried to write up the whole shebang, just for you, @lucasdepetrisd.

Enjoy: https://botched-deployments.com/posts/python-docx-charts

scanny commented 1 year ago

Nice post @eldamir! :)

One other possible approach is to use python-pptx to generate the charts and then "transplant" them into Word. The chart functionality in python-pptx is pretty well advanced; it turns out to be highly desired by analytics projects, who funded most of that work.

I'm sure there are differences, especially at the top level with how they are embedded into a document instead of a slide, but I expect much of the DrawingML aspects are identical, which Microsoft would have done on purpose of course because DrawingML objects can appear in all of Excel, Word, and PowerPoint.

Just an idea, but I've seen folks succeed at that approach too. :)

eldamir commented 1 year ago

Thank you @scanny. For my case, I need the advantage of being able to prepare the Word template by hand, but I'll definitely dig into python-pptx to see if I can maybe be a bit smarter about how I update the charts.

Messing with the XML by hand works fine, but is pretty low-level, and it is different for every chart type... Maybe python-pptx would provide a simpler interface for it ❤️

mengdeer589 commented 4 months ago

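Get the monkey.py file here: monkey.txt

By consolidating KehaoWu's code, I created monkey.py, which uses monkey patching to add support for inserting editable charts to python-docx.

Usage, for reference:

  1. Import the file in your code: import monkey
  2. Edit docx/oxml/shape.py at line 65 and add cChart = ZeroOrOne('c:chart'). After that it works.

Tested versions: Python 3.11, python-docx 1.1.2, python-pptx 0.6.23

Example code:

```python
from docx import Document
from pptx.chart.data import CategoryChartData
from pptx.enum.chart import XL_CHART_TYPE
from pptx.util import Cm

# Importing monkey applies the patch that provides Document.add_chart.
import monkey

document = Document()

# Chart data is built with python-pptx's chart data classes.
chart_data = CategoryChartData()
chart_data.categories = ["apple", "banana", "grape"]
chart_data.add_series("Series 1", (19, 30, 7))

chart = document.add_chart(
    XL_CHART_TYPE.COLUMN_CLUSTERED, 0, 0, Cm(15.2), Cm(11.4), chart_data
)

document.save("test.docx")
```

However, it has not been rigorously tested, so please use it with caution.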