scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License

Add docs warning about images with an & character #718

Closed beyarkay closed 3 years ago

beyarkay commented 3 years ago

Adding an image whose filename contains an ampersand (&) character fails with a cryptic error message such as:

Traceback (most recent call last):
  File "myscript.py", line 1057, in <module>
    main()
  File "myscript.py", line 122, in main
    write_to_pptx(f"{SAVE_DIR}/Graphs-Complete")
  File "myscript.py", line 685, in write_to_pptx
    pic = slide.shapes.add_picture("img&.png", Cm(0), Cm(3), width=ppt.slide_width)
  File ".../python3.7/site-packages/pptx/shapes/shapetree.py", line 295, in add_picture
    pic = self._add_pic_from_image_part(image_part, rId, left, top, width, height)
  File ".../python3.7/site-packages/pptx/shapes/shapetree.py", line 399, in _add_pic_from_image_part
    pic = self._grpSp.add_pic(id_, name, desc, rId, x, y, scaled_cx, scaled_cy)
  File ".../python3.7/site-packages/pptx/oxml/shapes/groupshape.py", line 81, in add_pic
    pic = CT_Picture.new_pic(id_, name, desc, rId, x, y, cx, cy)
  File ".../python3.7/site-packages/pptx/oxml/shapes/picture.py", line 70, in new_pic
    pic = parse_xml(xml)
  File ".../python3.7/site-packages/pptx/oxml/__init__.py", line 40, in parse_xml
    root_element = etree.fromstring(xml, oxml_parser)
  File "src/lxml/etree.pyx", line 3237, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1896, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1777, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1082, in lxml.etree._BaseParser._parseUnicodeDoc
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 3
lxml.etree.XMLSyntaxError: xmlParseEntityRef: no name, line 3, column 49

This was referenced in issue 223: https://github.com/scanny/python-pptx/issues/223 from 26 Jul 2016. While a docs warning does not solve the issue, it would make the temporary workaround (removing ampersand characters from the filename) easier to find and the problem easier for new users to avoid.
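The temporary workaround mentioned above (removing ampersand characters) could be sketched roughly as follows. The helper name `ampersand_free_copy` and its `replacement` parameter are hypothetical and not part of python-pptx; the sketch also assumes that only the file name, not its directory, ends up in the generated XML.

```python
import shutil
import tempfile
from pathlib import Path


def ampersand_free_copy(image_path, replacement="and"):
    """Return a path to the image whose file name contains no '&'.

    Hypothetical helper, not part of python-pptx. If the name is already
    safe, the original path is returned unchanged; otherwise the file is
    copied into a fresh temporary directory under a sanitized name.
    """
    src = Path(image_path)
    if "&" not in src.name:
        return src
    safe_name = src.name.replace("&", replacement)
    dst = Path(tempfile.mkdtemp()) / safe_name
    shutil.copyfile(src, dst)
    return dst
```

The returned path would then be passed on in place of the original, for example `slide.shapes.add_picture(str(ampersand_free_copy("img&.png")), Cm(0), Cm(3))`. The temporary copy is not cleaned up automatically in this sketch.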

scanny commented 3 years ago

This is the offending line: https://github.com/scanny/python-pptx/blob/master/pptx/oxml/shapes/picture.py#L69

The desc parameter needs to be XML escaped to change the "&" into "&amp;".

Pretty sure this would do the trick:

from xml.sax.saxutils import escape

...
xml = cls._pic_tmpl() % (id_, name, escape(desc), rId, left, top, width, height)
...
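To illustrate why escaping fixes the parse error, here is a minimal, self-contained sketch. It uses the standard library's `xml.etree.ElementTree` rather than lxml, and `TEMPLATE`, `build_pic_xml`, and `parses` are simplified stand-ins invented for the example, not the actual python-pptx template or API.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Simplified stand-in for the library's picture template (an assumption,
# not the real python-pptx XML).
TEMPLATE = '<pic><cNvPr id="%d" name="%s" descr="%s"/></pic>'


def build_pic_xml(id_, name, desc, escape_desc=True):
    """Fill the template, optionally XML-escaping the description."""
    if escape_desc:
        desc = escape(desc)  # turns "&" into "&amp;", "<" into "&lt;", ...
    return TEMPLATE % (id_, name, desc)


def parses(xml):
    """Return True if the string is well-formed XML."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False


raw = build_pic_xml(1, "Picture 1", "img&.png", escape_desc=False)
fixed = build_pic_xml(1, "Picture 1", "img&.png")

print(parses(raw))    # False: a bare "&" starts an entity reference with no name
print(parses(fixed))  # True: "&amp;" is read back as "&"
```

Note that `escape` leaves double quotes alone by default, so a value interpolated into an attribute may also need `"` mapped to `&quot;` through its optional `entities` argument.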

scanny commented 3 years ago

Fixed in v0.6.20, due out tonight or tomorrow.