scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License

bug: Autoshape name must be escaped against embedded double-quotes #758

Closed smoncktn closed 1 year ago

smoncktn commented 2 years ago

For some reason this crashes:

for shape in slide.shapes:
    print("id: %s, type: %s" % (shape.shape_id, shape.shape_type))
circleShape = slide.shapes.add_shape(MSO_SHAPE.NO_SYMBOL, Cm(10), Cm(10), Cm(1), Cm(1))

Calls with the MSO_SHAPE values just above and below this one (i.e. 18 and 20) work fine. Strange.

Output on my machine:

  File "C:\Users\monck\TrailExperiment\shapely.py", line 21, in <module>
    circleShape=slide.shapes.add_shape(19, Cm(10), Cm(10), Cm(1), Cm(1))
  File "C:\Users\monck\AppData\Local\Programs\Python\Python39\lib\site-packages\pptx\shapes\shapetree.py", line 345, in add_shape
    sp = self._add_sp(autoshape_type, left, top, width, height)
  File "C:\Users\monck\AppData\Local\Programs\Python\Python39\lib\site-packages\pptx\shapes\shapetree.py", line 448, in _add_sp
    sp = self._grpSp.add_autoshape(id_, name, autoshape_type.prst, x, y, cx, cy)
  File "C:\Users\monck\AppData\Local\Programs\Python\Python39\lib\site-packages\pptx\oxml\shapes\groupshape.py", line 42, in add_autoshape
    sp = CT_Shape.new_autoshape_sp(id_, name, prst, x, y, cx, cy)
  File "C:\Users\monck\AppData\Local\Programs\Python\Python39\lib\site-packages\pptx\oxml\shapes\autoshape.py", line 239, in new_autoshape_sp
    sp = parse_xml(xml)
  File "C:\Users\monck\AppData\Local\Programs\Python\Python39\lib\site-packages\pptx\oxml\__init__.py", line 40, in parse_xml
    root_element = etree.fromstring(xml, oxml_parser)
  File "src\lxml\etree.pyx", line 3237, in lxml.etree.fromstring
  File "src\lxml\parser.pxi", line 1896, in lxml.etree._parseMemoryDocument
  File "src\lxml\parser.pxi", line 1777, in lxml.etree._parseDoc
  File "src\lxml\parser.pxi", line 1082, in lxml.etree._BaseParser._parseUnicodeDoc
  File "src\lxml\parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src\lxml\parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src\lxml\parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 3
lxml.etree.XMLSyntaxError: attributes construct error, line 3, column 28

scanny commented 2 years ago

Hmm, interesting. I believe the culprit is this definition having "embedded double-quotes" in the shape name '"No" Symbol': https://github.com/scanny/python-pptx/blob/master/pptx/enum/shapes.py#L504

The fix would be either to remove the double-quotes (which appear in the "official description") or to wrap this value in an XML escape such as xml.sax.saxutils.escape() (with an entity mapping for the double-quote, since by default escape() only handles &, < and >): https://github.com/scanny/python-pptx/blob/master/pptx/shapes/autoshape.py#L241
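As a rough illustration of the failure mode, here is a minimal sketch using only the standard library. `xml.etree.ElementTree` stands in for lxml, and the `TEMPLATE` string and `build()` helper are hypothetical, not python-pptx code. It shows that an unescaped double-quote in an attribute value makes the XML malformed, and that `escape()` needs an explicit entity mapping to handle `"`:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical template for illustration; python-pptx's real XML differs in detail.
TEMPLATE = '<cNvPr id="%d" name="%s %d"/>'


def build(basename, shape_id=4):
    """Interpolate a shape basename into the attribute template."""
    return TEMPLATE % (shape_id, basename, shape_id - 1)


raw = '"No" Symbol'

# Unescaped: the embedded quotes terminate the attribute early, so parsing fails.
try:
    ET.fromstring(build(raw))
    raw_parses = True
except ET.ParseError:
    raw_parses = False

# escape() alone only replaces &, < and >, so '"' is mapped explicitly here.
safe = escape(raw, {'"': "&quot;"})
elem = ET.fromstring(build(safe))

print("raw parses:", raw_parses)
print("escaped name:", elem.get("name"))
```

The parser converts `&quot;` back into a literal double-quote, so the shape name read from the element still contains the original quotes.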

It's possible you could monkey patch this in like so:

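from xml.sax.saxutils import escape
from pptx.shapes.autoshape import AutoShapeType

def get_basename(self):
    # escape() only handles &, < and > by default, so map '"' explicitly
    return escape(self._basename, {'"': "&quot;"})

# replace the `basename` property so the name is safe inside an XML attribute
setattr(AutoShapeType, 'basename', property(get_basename))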

Note to self: Consider whether the escaping is more appropriate in CT_Shape.new_autoshape_sp() since that is where the XML is actually formed.
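A sketch of what escaping at the point of XML formation might look like, assuming a simplified stand-in for that step. The `new_cNvPr_xml()` function below is hypothetical and is not the actual `CT_Shape.new_autoshape_sp()` code. It uses `xml.sax.saxutils.quoteattr()`, which escapes the value and chooses the surrounding quote characters itself:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def new_cNvPr_xml(id_, name):
    """Hypothetical stand-in for the XML-forming step; not python-pptx code."""
    # quoteattr() returns the value already wrapped in quotes, so the
    # template must not add its own quotes around %s.
    return "<cNvPr id=%s name=%s/>" % (quoteattr(str(id_)), quoteattr(name))


# A name with embedded double-quotes, as in the NO_SYMBOL case.
xml = new_cNvPr_xml(4, '"No" Symbol 3')
elem = ET.fromstring(xml)

# A name mixing both quote styles and an ampersand also round-trips.
tricky = new_cNvPr_xml(5, 'It\'s "No" & more')
tricky_elem = ET.fromstring(tricky)

print(elem.get("name"))
print(tricky_elem.get("name"))
```

Escaping here rather than in the enum description would keep the stored name human-readable and apply the same protection to any other name passed through this path.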

scanny commented 1 year ago

Fixed in version 0.6.22 circa Aug 20, 2023.