scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License
2.38k stars, 514 forks

Handle a Shape having formula - AlternateContent #706

Open seyong-um opened 3 years ago

seyong-um commented 3 years ago

Symptom: Once a formula is added to a shape of any type, Slide.shapes no longer handles that shape.
Reason: The shape is wrapped inside an AlternateContent element.
Solution: Select the first item of the AlternateContent element as the content.
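To make the symptom and the proposed fix concrete, here is a minimal sketch using only the standard library rather than python-pptx itself. The slide XML is a simplified stand-in (the `id` attributes and the `a14` prefix are illustrative assumptions, not output from a real deck), and `iter_shapes_first_item` is a hypothetical helper, not the code in this PR.

```python
import xml.etree.ElementTree as ET

MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"
P = "http://schemas.openxmlformats.org/presentationml/2006/main"

# Simplified shape tree: one plain shape and one shape wrapped in AlternateContent.
SLIDE_XML = f"""
<p:spTree xmlns:p="{P}" xmlns:mc="{MC}">
  <p:sp id="title"/>
  <mc:AlternateContent>
    <mc:Choice Requires="a14"><p:sp id="formula-shape"/></mc:Choice>
    <mc:Fallback><p:sp id="picture-fallback"/></mc:Fallback>
  </mc:AlternateContent>
</p:spTree>
""".strip()


def iter_shapes_first_item(sp_tree):
    """Yield shape elements, replacing each AlternateContent by the children of its first item."""
    for child in sp_tree:
        if child.tag == f"{{{MC}}}AlternateContent":
            if len(child):
                yield from child[0]
        else:
            yield child


sp_tree = ET.fromstring(SLIDE_XML)

# Looking only at direct p:sp children misses the wrapped shape (the symptom).
naive = [el.get("id") for el in sp_tree if el.tag == f"{{{P}}}sp"]

# Unwrapping the first item makes the wrapped shape visible again.
unwrapped = [el.get("id") for el in iter_shapes_first_item(sp_tree)]

print(naive)      # ['title']
print(unwrapped)  # ['title', 'formula-shape']
```

Note that this always takes the first item, which is the assumption discussed further down the thread.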

stephen-farris-jhuapl-edu commented 2 years ago

Any chance someone could review this MR? I'm running into this bug myself, and would love to have a fix. The workaround, to not have any equations in my slide deck, isn't a great option. I don't know the code-base well enough to review this MR myself. In my case this causes tables containing equations to not be visible to my code, failing silently.

programmarchy commented 2 years ago

This PR bakes in the assumption that the user wants a specific "choice" (in the markup compatibility parlance).

Another approach would be to explicitly define this as a new shape node. The XML can be parsed like:

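from pptx.oxml import register_element_cls
from pptx.oxml.shapes.groupshape import CT_GroupShape
from pptx.oxml.xmlchemy import BaseOxmlElement, OneAndOnlyOne, ZeroOrMore


# Assumes the "ve" prefix is mapped to the markup-compatibility namespace.
class CT_AlternateContent(BaseOxmlElement):
    veChoice = ZeroOrMore("ve:Choice")
    veFallback = OneAndOnlyOne("ve:Fallback")


# Treat each branch as a shape container so its children can be enumerated.
register_element_cls("ve:AlternateContent", CT_AlternateContent)
register_element_cls("ve:Choice", CT_GroupShape)
register_element_cls("ve:Fallback", CT_GroupShape)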

Then maybe produce a MarkupCompatibility shape from the BaseShapeFactory?

For the example with equations, the most likely scenario is you traverse slide.shapes and, when encountering a MarkupCompatibility instance, you select the first choice which would contain the "missing" table as well as the equation nodes.
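A rough sketch of that traversal, written against the standard library so it runs without python-pptx. `MarkupCompatibility` and `shape_factory` here are hypothetical stand-ins for the proposed shape class and for BaseShapeFactory, and the XML is a simplified assumption rather than real PowerPoint output.

```python
import xml.etree.ElementTree as ET

MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"
AC_TAG = f"{{{MC}}}AlternateContent"


class MarkupCompatibility:
    """Hypothetical wrapper exposing the branches of an AlternateContent element."""

    def __init__(self, element):
        self._element = element

    @property
    def choices(self):
        # One list of child elements per Choice branch, in document order.
        return [list(c) for c in self._element.findall(f"{{{MC}}}Choice")]

    @property
    def fallback(self):
        fb = self._element.find(f"{{{MC}}}Fallback")
        return [] if fb is None else list(fb)


def shape_factory(element):
    """Wrap AlternateContent elements; pass every other element through unchanged."""
    return MarkupCompatibility(element) if element.tag == AC_TAG else element


XML = f"""<spTree xmlns:mc="{MC}">
  <sp name="Title"/>
  <mc:AlternateContent>
    <mc:Choice Requires="a14"><graphicFrame name="Table with equation"/></mc:Choice>
    <mc:Fallback><graphicFrame name="Table with picture"/></mc:Fallback>
  </mc:AlternateContent>
</spTree>"""

names = []
for shape in map(shape_factory, ET.fromstring(XML)):
    if isinstance(shape, MarkupCompatibility):
        # The caller decides which branch to use; here, the first choice if any.
        branch = shape.choices[0] if shape.choices else shape.fallback
        names.extend(el.get("name") for el in branch)
    else:
        names.append(shape.get("name"))

print(names)  # ['Title', 'Table with equation']
```

The point of the wrapper is that the branch selection lives in the calling code instead of being fixed inside the library.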

programmarchy commented 2 years ago

@scanny Do you have any interest in supporting these "AlternateContent" elements? They are essentially conditional branches of XML that allow fallbacks for features that may not be backwards compatible with older versions of PowerPoint.

PowerPoint may place these branches near the root of the shape tree, instead of tightly bounding a feature. So if you have math formulas inside a table, then that table won't be enumerated as a shape because it will be nested in one of these branches.

For example, if you have a table containing "math stuff" that requires PowerPoint 2010 or later, the XML tree may look like:

<p:spTree>
  <mc:AlternateContent xmlns:mathStuff="http://schemas.microsoft.com/office/drawing/2010/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
    <mc:Choice Requires="mathStuff"> <!-- specifies what features are needed for compatibility -->
      <table>
        ...
        <!-- feature that may not be compatible with every version -->
        <mathStuff:formula>
          ...
        </mathStuff:formula>
      </table>
    </mc:Choice>
    <mc:Fallback>
      <table>
        ...
        <!-- feature would be missing or substituted -->
      </table>
    </mc:Fallback>
  </mc:AlternateContent>
</p:spTree>
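As a sketch of how a reader could choose between the branches, the snippet below picks the first Choice whose Requires prefixes are all understood and otherwise falls back. It is a simplification under stated assumptions: a conforming consumer would resolve each prefix to its namespace URI, whereas this compares the prefix strings directly. `select_branch` and the `kind` attribute are hypothetical names used for illustration.

```python
import xml.etree.ElementTree as ET

MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"

SAMPLE = f"""
<mc:AlternateContent xmlns:mc="{MC}">
  <mc:Choice Requires="mathStuff"><table kind="with-formula"/></mc:Choice>
  <mc:Fallback><table kind="plain"/></mc:Fallback>
</mc:AlternateContent>
""".strip()


def select_branch(alt, understood_prefixes):
    """Return the first Choice whose required prefixes are all understood, else the Fallback.

    Returns None when no Choice matches and there is no Fallback.
    """
    for choice in alt.findall(f"{{{MC}}}Choice"):
        # Requires holds a whitespace-separated list of namespace prefixes.
        required = choice.get("Requires", "").split()
        if required and all(p in understood_prefixes for p in required):
            return choice
    return alt.find(f"{{{MC}}}Fallback")


alt = ET.fromstring(SAMPLE)
modern = select_branch(alt, {"mathStuff"})  # reader that understands the feature
legacy = select_branch(alt, set())          # reader that does not

print(modern[0].get("kind"))  # with-formula
print(legacy[0].get("kind"))  # plain
```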

If you're interested, guidance on implementation would be appreciated. There's an implementation in this PR, and I've also proposed an alternative. Thanks.

AM-ash-OR-AM-I commented 1 year ago

@seyong-um Hey! Your PR seems to have solved most of the issues with parsing equations, but I'm still having some problems. Could you please look at issue #892?