jgm / pandoc

Universal markup converter
https://pandoc.org

Could not find shape for Powerpoint content #5402

Closed ghost closed 5 years ago

ghost commented 5 years ago

Summary

The PPTX renderer does not work with reference documents from Microsoft PowerPoint; it consistently returns the same error for all templates:

Could not find shape for Powerpoint content

Perversely, it does work for PowerPoint documents exported from Google Slides, albeit with odd artifacts in the results.

Details

Per the Pandoc documentation under the section for Powerpoint:

All templates included with a recent version of MS PowerPoint will fit these criteria. (You can click on Layout under the Home menu to check.)

Attached is a document saved, unedited, directly from a default PowerPoint template; the template does appear to follow the 4-layout pattern described in the Pandoc documentation. However, Pandoc generates the error for this template.

➜ pandoc -s test.md -o test.pptx --reference-doc Presentation3.pptx 
Could not find shape for Powerpoint content

Environment

pandoc 2.6
Compiled with pandoc-types 1.17.5.4, texmath 0.11.2, skylighting 0.7.5
...

Presentation3.pptx test.md.gz

ghost commented 5 years ago

After submitting this, I tried importing the template into Google Docs, opening it in Slides, and exporting it back to PPTX. Pandoc did not generate an error message for this, but the resulting deck was 6 blank (white) slides (the correct number). There was no content on any of the slides, and only the first appeared to have an assigned layout; the rest contained no objects.

jkr commented 5 years ago

There's something wonky about the shape names and IDs in this template -- I have to figure out how PowerPoint itself knows where to put the content in this one, since it isn't clear from the internal structure. We should be able to handle it, since it's included in PowerPoint, but there's no guarantee that we'll be able to handle every template.

If you do need to use this, while we investigate this, you can always just export to a plain (untemplated) deck, and then change the presentation template to this from within PowerPoint. I tested that and it works.

But hopefully, we can figure out how to address this programmatically.

jkr commented 5 years ago

The issue, for posterity:

  1. To apply a template, we need to get the shapes for title, content, etc.

  2. Sometimes, shapes in layouts have placeholder (ph) types that tell us what they are, as in this title shape:

    <p:cNvPr id="4" name="Title 3"/>
    <p:cNvSpPr>
      <a:spLocks noGrp="1"/>
    </p:cNvSpPr>
    <p:nvPr>
      <p:ph type="title"/>
    </p:nvPr>
    ...

    Note the <p:ph type="title"/>.

  3. Content types don't have this, but they do have placeholder indices, which in my experiments seemed to always be 1 (or, in the case of two-content, 1 and 2).

  4. This template, however, starts the content ph idx at 13.

  5. Its master slide, which does specify ph type="body", starts the content indexing at 1.

  6. I don't know how, when it applies the template inside the application, PowerPoint knows what content to put inside the main content box.

I have a sinking feeling that PowerPoint might just map things over in order when it applies a template. First shape to first shape, second shape to second shape. Which would be crazy, but in keeping. Until I act on that, though, I have to do some more experiments. So keeping open for now.
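For anyone who wants to see what their own template looks like in these terms, here is a minimal sketch that lists the `type` and `idx` of every `<p:ph>` placeholder in each layout of a reference doc. It is not pandoc's code; the function names are made up for illustration, and it assumes the layouts live under `ppt/slideLayouts/` inside the zip.

```python
import re
import xml.etree.ElementTree as ET
import zipfile

# PresentationML namespace behind the p: prefix in the XML quoted above.
P_NS = "{http://schemas.openxmlformats.org/presentationml/2006/main}"

# Assumption: layout parts are stored as ppt/slideLayouts/slideLayoutN.xml.
LAYOUT_PART = re.compile(r"ppt/slideLayouts/slideLayout\d+\.xml")


def placeholders(layout_xml):
    """Return (type, idx) for every <p:ph> in one layout, in document order.

    Either value is None when the attribute is absent.
    """
    root = ET.fromstring(layout_xml)
    return [(ph.get("type"), ph.get("idx")) for ph in root.iter(P_NS + "ph")]


def layout_placeholders(pptx_path):
    """Map each layout part name in the .pptx/.potx to its placeholder list."""
    result = {}
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            if LAYOUT_PART.fullmatch(name):
                result[name] = placeholders(archive.read(name))
    return result
```

Calling `layout_placeholders("Presentation3.pptx")` on a template like the one attached above should show the untyped content placeholders and the `idx` values they carry.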

ghost commented 5 years ago

I've exported a couple of default templates from this version of PowerPoint, and Pandoc fails to interpret any of them; I don't think it's just that one template. Microsoft's probably changed how they're managing templates; they're probably ignoring metadata aspects of XML to preserve vendor lock-in.

My issue is that I need to match corporate templates that are full of fun little oddities, like creative bullet characters and specific colors. Pandoc doesn't like my corporate template, which led me on a journey to find any template that Pandoc would consume. That eventually led to trying several of the templates bundled with PowerPoint and discovering that Pandoc doesn't like any of the bundled templates either.

Obviously, Pandoc understands its own template; I've had limited success importing layouts into the default template while preserving styling, but I'll keep poking along down that route.

jkr commented 5 years ago

Just to confirm -- you're using PowerPoint on Mac?

jkr commented 5 years ago

FWIW, I get good results on PowerPoint 2013 on Windows (via virtualbox on arch). Open a new blank document, apply a theme/template, and save it as either a pptx or potx. I should specify this in the docs.

We'll probably never be able to deal with "fun little oddities" -- but I'll make trying to deal with these bundled pptx templates a priority. Could you mail the potx files to me (directly, so we can't be accused of distribution), and I'll see what I can figure out.

[edited for clarity: I get good results with the bundled themes on PPT 2013 (Win) -- not with your file]

jkr commented 5 years ago

BTW, a reference doc that should work is in test/pptx/reference_depth.pptx. (It's the reference doc we test against, from Windows PPT 2013). If you don't have the pandoc source handy, you can download the file from here:

https://github.com/jgm/pandoc/blob/master/test/pptx/reference_depth.pptx

I assume that should work for you. If it doesn't, things are even weirder than I thought.

ghost commented 5 years ago

@jkr Yes, PowerPoint on the Mac, Office 365. Version 16.22 (190211). I wouldn't be surprised if, in the past 5 years, PowerPoint had introduced incompatible changes with their templates. As I mentioned before, the template that comes with Pandoc (via --print-default-data-file) does work.

Aside from changing Pandoc to make it compatible with more recent versions of PowerPoint, maybe the documentation should be changed to say that templates from PowerPoint 2013 are known to work, rather than saying that

All templates included with a recent version of MS PowerPoint will fit these criteria.

It would appear that recent versions of MS PowerPoint do not work, although it'd be good to get a second verification.

jkr commented 5 years ago

Pushed a change to the docs. If you get a chance, I would appreciate it if you could mail some templates along (address in the PowerPoint writer source code). It's all so poorly documented that whatever we have has to behave as the reference implementation. So the more we have, the more robust it will (hopefully) become.

jkr commented 5 years ago

@serussell I made a couple of big changes in pandoc's PowerPoint templating -- it should be much more robust now. Your attached files, for example, now produce output without warning or corruption, and, to my eyes look correct. If you're able to build from source, you can give it a try.

ghost commented 5 years ago

@jkr Thanks so much! I'm traveling at the moment, and I don't have a haskell build environment set up on this laptop, but I'll try to get this built and tested in the next couple of days.

Do you still want me to export templates for your test cases?

jkr commented 5 years ago

That would be great, if you get a chance.

ghost commented 5 years ago

Ok. I should have a chance to do both this weekend. Sorry for the delayed gratification of ticket closure.

retorquere commented 5 years ago

How can I modify an existing corporate powerpoint template so that it will be recognized by pandoc?

Neurrone commented 5 years ago

I just experienced this issue as well. I think we should at least update the users guide, which inaccurately states that this works with templates saved from recent versions of Powerpoint, since this is currently not the case, and it should recommend making edits from the built-in template instead.

jkr commented 5 years ago

@Neurrone : what version of pandoc are you using? The most recent version should currently work with those exported templates. If not, it is a bug, and I'll reopen this issue.

Neurrone commented 5 years ago

The latest, 2.7.3.

jkr commented 5 years ago

Could you please post the template that didn't work? What version of powerpoint was it saved from?

retorquere commented 5 years ago

I don't know what it was saved from, but our corporate template is available at https://company-122895.frontify.com/api/attachment/download/FBcAVUTBYwQUa6UUQtF03Eqt1qDCsN0zgI_LVSjKW85J1PXf4e7m3HRWI-OBE_UYgsjGK2uXc2yLCwDYxzAiMQ

jkr commented 5 years ago

@retorquere: If you look at the layouts available in this template, you'll see that it doesn't meet the requirements for layouts spelled out in the manual:

Templates included with Microsoft PowerPoint 2013 (either with .pptx or .potx extension) are known to work, as are most templates derived from these.

The specific requirement is that the template should begin with the following first four layouts:

  1. Title Slide
  2. Title and Content
  3. Section Header
  4. Two Content

There is no section header layout in the template you provided.

jkr commented 5 years ago

@Neurrone : ping. I'd like to fix any issues along these lines, so if you could post the template that causes trouble, and tell me which version of PowerPoint you saved it from, I'd appreciate it.

Neurrone commented 5 years ago

@jkr I probably also didn't read the fine print carefully enough about the required layout order, which could have been the source of the problem. If I can reproduce this definitively even with the correct layout order, I'll definitely post here.

retorquere commented 5 years ago

@jkr I've added a section header layout in position 3 in the version at https://0x0.st/z4wm.potx but I'm still seeing the same error:

$ pandoc --to pptx --reference-doc breed_wit_met_dianummering.potx --output ScrumIntro.pptx ScrumIntro.md
Could not find shape for Powerpoint content

jkr commented 5 years ago

There are a few issues here. The section header is actually at a later position, though that order can be fixed in the master view. But also, significantly, this slide template uses different place-holder names for its content types (not quite sure why -- I've never seen them before). This is different from the ones in recent PowerPoint installs that I've seen. I'll look around and see if we should support the names it uses, or whether it's just idiosyncratic.

In any case, the best thing to do would be to start with a fresh template (the blank one in a stock PowerPoint would work, or the output of pandoc -o custom-reference.pptx --print-default-data-file reference.pptx). Then apply

We should make a special note about company/school templates, since they've often been handed down forever and are difficult to find compatibility with.

retorquere commented 5 years ago

Using R's officer package, the layouts are reported as:

1                   Titeldia A breed wit
2             Titel en Tekst A breed wit
3       Titel en halve tekst A breed wit
4        Titel en Afbeelding A breed wit
5     Dubbele Titel en Tekst A breed wit
6 Quote (Zwarte Achtergrond) A breed wit
7             Section Header A breed wit

I'm not sure whether that's the regular Dutch translation for PowerPoint but I'd put my money on them being idiosyncratic -- the marketing dept doesn't have to care about naming because people are expected to use these in PowerPoint only.

I can start from scratch but our marketing dept is strongly discouraging remixing of the templates/use of the imagery to make new ones -- if I can fix these templates in-place I'm hoping I can negotiate with them to keep the fixes in the official templates. Sorry for the extra hassle, it's just in larger companies, such things are really hard to get moving. I'm still not entirely clear on what pandoc is looking for in these slides -- names of the sheets in the master? Blocks inside those? If it's just names/IDs I should be able to modify the XML without affecting the layout, and that I could bring back to marketing.

retorquere commented 5 years ago

I've renamed and re-ordered the first four master slides in https://0x0.st/z4YR.potx but I'm still getting the same message.

Can someone point me in the direction of the source that parses the template? I could try figuring out what the code does/expects and repair the template that way.

jkr commented 5 years ago

@retorquere :

The name isn't important, just the order. You'll see them in the dropdown from the "Layout" menu.

The specific issue is the placeholder ("ph") element, under "sp" > "nvSpPr" > "nvPr" > "ph" in the XML hierarchy. Normally, body content does not have a "type" attribute (it is the default). This is the case with templates from supported PowerPoint releases, and derived templates.

Your template uses a "body" type for this attribute. That seems sensible, and we should be able to support it, but I need to do a bit more testing before I find out if it messes up any currently supported templates. So I'm working on this.

The code is here: https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Writers/Powerpoint/Output.hs

jkr commented 5 years ago

FYI: Allowing "body" works, and seems to fix your problem. (If you wanted a fix before that's released, you can actually open up the zip file of the potx, open up the ppt/slideLayouts/* files and remove the `type="body"` attributes.) I need to test it further to make sure that supporting this format doesn't interfere with more common, supported formats. Also, I need to figure out whether other templates use this -- we can't support every individual usage which doesn't fit the standard, but if it is in fairly common use, it makes sense.

BUT even with this fix, the title slide still isn't working, because it doesn't seem to be based off of a real title slide. If you look at the xml of the formats offered by PowerPoint templates, you'll see that in the title slide, the various placeholders for the shapes have types "Title", "SubTitle", etc. Yours all have type "body". This isn't a linguistic issue -- the title slide just wasn't based on a real title slide.

So this template will work for starting a new presentation by hand, which is I'm sure what it was intended for, but not for templating an existing presentation, which is what Pandoc requires.
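The interim workaround (unzip, drop the `type="body"` attributes from the layout files, rezip) could be scripted roughly as below. This is only a sketch under a few assumptions: the layout parts are UTF-8, the placeholder elements use the `p:` prefix exactly as in the XML quoted in this thread, and the function names are invented for illustration. It does nothing for the title slide problem described above.

```python
import re
import zipfile

# Assumption: layout parts are stored as ppt/slideLayouts/slideLayoutN.xml.
LAYOUT_PART = re.compile(r"ppt/slideLayouts/slideLayout\d+\.xml")

# Matches a <p:ph ...> start tag up to (and including) a type="body" attribute.
BODY_TYPE = re.compile(r'(<p:ph\b[^>]*?)\s+type="body"')


def strip_body_type(layout_xml):
    """Remove type="body" from every <p:ph> tag in one layout's XML text."""
    return BODY_TYPE.sub(r"\1", layout_xml)


def strip_body_types(src_path, dst_path):
    """Copy a .pptx/.potx to dst_path, rewriting only the slide layout parts."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if LAYOUT_PART.fullmatch(item.filename):
                # Assumption: the part is UTF-8 encoded.
                data = strip_body_type(data.decode("utf-8")).encode("utf-8")
            dst.writestr(item, data)
```

A text substitution is used instead of re-serializing the XML so that everything else in the file, including namespace prefixes, is left byte-for-byte as PowerPoint wrote it.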

retorquere commented 5 years ago

That's perfectly fine -- I'm not at all saying pandoc should support this template. I'm looking to understand what pandoc expects so I can talk to our marketing dept so that they can start distributing pandoc-compatible templates.

So:

  1. for the interim, if I strip type="body" from the template, that will help, although it may only be needed temporarily
  2. I need to make sure the placeholder shapes on the title slide have the types Title and SubTitle. I can make that happen

I reckon there's just a few things that a template needs to do to be pandoc-compatible, I just don't know what that is. Would it be possible to get a list of shape types that are supported/required for these 4 slides? To my shame I must admit I don't read Haskell fluently.

Is it possible to change these names using powerpoint? Otherwise I'll just unzip-edit-rezip, but if I can walk marketing through the required steps that'd be better. I don't see them editing XML.

jkr commented 5 years ago

A branch that fixes the specific issue on body slides is here:

https://github.com/jkr/pandoc/tree/support-body

Note that this does not (and cannot) fix the issue with the title slide.

jkr commented 5 years ago

Good idea (this could be a special section for advanced users in the manual).

Off the top of my head:

  1. The order of Layouts 1-4, as currently described. This means, technically, the layouts as defined in ppt/slideLayouts/slideLayout{1,2,3,4}.xml.

  2. The required shapes (defined by ph Type) in each of those layouts. I'd have to go back to the source I linked and consult, but it's essentially:

    a. SlideLayout1.xml (title): ctrTitle, subTitle, dt
    b. SlideLayout2.xml (title and content, ie, content slide): title, No Attribute (implies body content)
    c. SlideLayout3.xml (section header): title OR ctrTitle
    d. SlideLayout4.xml (Two Content): title, No attributes for body content

I'd have to figure out the best ways to express that, but those are the expected shapes, and the ones supplied by the templates shipped with PowerPoint.
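As a rough illustration, the list above can be turned into a small check. This is a sketch of the off-the-top-of-my-head summary, not a transcription of the writer's source; `check_layout` is a made-up name, and `types` is assumed to be the list of `type` attributes of that layout's `<p:ph>` elements, with `None` standing for a placeholder that has no type.

```python
def check_layout(layout_number, types):
    """Return a list of problems for layout 1-4 given its placeholder types.

    `types` holds one entry per <p:ph>: the type attribute, or None if absent.
    """
    problems = []
    untyped = types.count(None)

    def need(kind):
        if kind not in types:
            problems.append(f"missing placeholder type {kind!r}")

    if layout_number == 1:      # title slide
        for kind in ("ctrTitle", "subTitle", "dt"):
            need(kind)
    elif layout_number == 2:    # title and content
        need("title")
        if untyped < 1:
            problems.append("missing untyped (body content) placeholder")
    elif layout_number == 3:    # section header
        if "title" not in types and "ctrTitle" not in types:
            problems.append("missing placeholder type 'title' or 'ctrTitle'")
    elif layout_number == 4:    # two content
        need("title")
        if untyped < 2:
            problems.append("needs two untyped (body content) placeholders")
    return problems
```

For example, `check_layout(2, ["title", "body"])` reports the missing untyped placeholder, which is the situation with the template discussed above.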

retorquere commented 5 years ago

I'll try to do a write-up of this. It'll help me too to get it sorted.

jkr commented 5 years ago

Take a look at the data in https://github.com/jgm/pandoc/tree/master/data/pptx as reference. Also, unzip and take a look at https://github.com/jgm/pandoc/blob/master/test/pptx/reference_depth.pptx , which is an MS template that we use for testing.

retorquere commented 5 years ago

When you say

SlideLayout2.xml (title and content, ie, content slide): title, No Attribute (implies body content)

does that mean <p:ph type="title"> and no other attributes? Because it will have the type attribute.

jkr commented 5 years ago

For this layout there are two different possible shapes we need, a slide title shape AND a content shape.

From SlideLayout2.xml:

  <p:cSld name="Title and Content">
    <p:spTree>
      ...
      <p:sp>
        <p:nvSpPr>
          <p:cNvPr id="2" name="Title 1"/>
          <p:cNvSpPr>
            <a:spLocks noGrp="1"/>
          </p:cNvSpPr>
          <p:nvPr>
            <p:ph type="title"/>
          </p:nvPr>
        </p:nvSpPr>
      ...
      </p:sp>
      <p:sp>
        <p:nvSpPr>
          <p:cNvPr id="3" name="Content Placeholder 2"/>
          <p:cNvSpPr>
            <a:spLocks noGrp="1"/>
          </p:cNvSpPr>
          <p:nvPr>
            <p:ph idx="1"/>
          </p:nvPr>
        </p:nvSpPr>
        <p:spPr/>
       ....
      </p:sp>

There are numerous shapes (<p:sp>) in the shapeTree (<p:spTree>). Note how the first one has a type on its <p:ph> element, and the second does not (just an index).

retorquere commented 5 years ago

And for the 4th slide then one p:ph with type="title", one p:ph with idx="1", one p:ph with idx="2", correct?

jkr commented 5 years ago

Sort of. The idx numbers aren't reliable across different releases of PowerPoint, so we treat the first untyped ph as Left and the second untyped ph as Right (which seems to be how they do it internally).

jkr commented 5 years ago

Left and Right being somewhat abstract terms, since they could be one on top of the other, or whatever.
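In code terms, the rule is "take the untyped placeholders in document order and ignore what their idx says". A minimal sketch of that idea, in Python rather than the Haskell the writer is actually written in, with function names invented for illustration:

```python
import xml.etree.ElementTree as ET

# PresentationML namespace behind the p: prefix.
P_NS = "{http://schemas.openxmlformats.org/presentationml/2006/main}"


def untyped_placeholders(layout_xml):
    """Return the idx of each <p:ph> without a type attribute, in document order."""
    root = ET.fromstring(layout_xml)
    return [ph.get("idx") for ph in root.iter(P_NS + "ph")
            if ph.get("type") is None]


def left_and_right(layout_xml):
    """Pick the first and second untyped placeholders, whatever their idx."""
    untyped = untyped_placeholders(layout_xml)
    if len(untyped) < 2:
        raise ValueError("expected at least two untyped placeholders")
    return untyped[0], untyped[1]
```

With this approach a layout whose content placeholders are numbered 13 and 14 is handled the same way as one numbered 1 and 2.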

retorquere commented 5 years ago

Writeup + python script which reports problems at https://gist.github.com/a29591543f8d0f6162a0bc48017d6864

jkr commented 5 years ago

This is great! A couple of observations:

  1. You refer to slideLayout4.xml as slideLayout2.xml in the README
  2. To really check for correctness, it should drill down to p:ph through the xml hierarchy, since that's what pandoc does. But it probably doesn't matter in practice (if you can open the reference doc in PowerPoint, we can assume it validates).

In addition to putting something more clear in the docs for advanced users like @retorquere, I'd like to work something like this into pandoc itself, for more meaningful error messages on failure.

But I can also see the benefit of having a stand-alone script like this so you can check before failing. @jgm: is there any policy about linking to external scripts in the manual?

retorquere commented 5 years ago

Updated in https://gist.github.com/9053b3dee7b2ce62382e005c73592391. I'd be more than happy to transfer ownership. I'd also be happy to throw it up on Heroku so people can check their templates online.

The script is a little awkward here and there because I can't use xpath, but it seemed worthwhile to me to make sure the script itself can run without external dependencies.

retorquere commented 4 years ago

https://rmarkdown-office-template.herokuapp.com/

retorquere commented 4 years ago

When I look at https://github.com/jgm/pandoc/blob/master/test/pptx/reference_depth.pptx, slideLayout3.xml does not have a p:ph element without a type attribute. What should I be looking for instead?

jascott1 commented 4 years ago

For others trying to make this work, I managed to get the pptx generated (with some issues) with the following steps (YMMV). Note this doesn't fix your source PPTX; it just makes the pandoc reference and your final PPTX look similar to your original.

  1. open your intended reference PPTX and save the theme to a file.
  2. generate the reference ppt pandoc --print-default-data-file reference.pptx > ref.pptx
  3. open the pandoc reference PPTX you just created and load the theme you saved in step 1.
  4. save the pandoc reference PPTX
  5. unzip the pandoc reference PPTX into a temp directory
  6. cd temp/ppt/slideLayouts
  7. identify the titles of each slide with grep -o '.\{0,40\}p:cSld.\{0,40\}' slideLayout*.xml
  8. identify "Title Slide" and copy it to slideLayout1.xml
  9. identify "Title and Content" slide and copy it to slideLayout2.xml
  10. identify "Section Header" slide and copy it to slideLayout3.xml
  11. identify "Two Content" slide and copy it to slideLayout4.xml
  12. move up to temp dir and re-zip the content with zip -r fixed.pptx *
  13. cross fingers and/or other appendages
  14. generate your PPTX with pandoc my.md -o my.pptx --reference-doc temp/fixed.pptx
  15. open my.pptx with Powerpoint

The script posted in https://github.com/jgm/pandoc/issues/5402#issuecomment-526940748 above was helpful so I recommend running that to get started with your original reference PPTX. (thanks @retorquere !)
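If grep is not available, step 7 can be approximated in Python without unzipping anything. This is a hedged sketch with made-up function names; it assumes the layouts are stored as `ppt/slideLayouts/slideLayoutN.xml` and that each one carries its display name on the `<p:cSld>` element, as in the XML quoted earlier in this thread.

```python
import re
import xml.etree.ElementTree as ET
import zipfile

# PresentationML namespace behind the p: prefix.
P_NS = "{http://schemas.openxmlformats.org/presentationml/2006/main}"

# Assumption: layout parts are stored as ppt/slideLayouts/slideLayoutN.xml.
LAYOUT_PART = re.compile(r"ppt/slideLayouts/slideLayout(\d+)\.xml")


def layout_name(layout_xml):
    """Return the name attribute of the layout's <p:cSld> element, or None."""
    csld = ET.fromstring(layout_xml).find(P_NS + "cSld")
    return None if csld is None else csld.get("name")


def layout_names(pptx_path):
    """Return (layout number, name) pairs, sorted by layout number."""
    names = []
    with zipfile.ZipFile(pptx_path) as archive:
        for part in archive.namelist():
            match = LAYOUT_PART.fullmatch(part)
            if match:
                number = int(match.group(1))
                names.append((number, layout_name(archive.read(part))))
    return sorted(names, key=lambda pair: pair[0])
```

Running `layout_names("ref.pptx")` shows at a glance which file currently holds "Title Slide", "Title and Content", "Section Header" and "Two Content" before you start copying them into positions 1 to 4.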

LunkRat commented 1 year ago

I was successful blending the default pandoc reference .pptx with my institution's standard-issue branded .potx template. Sharing my steps here in case it's helpful to someone. The general approach is to open a new PowerPoint based on the .potx branded template, and simultaneously open the pandoc default-data-file generated by pandoc --print-default-data-file reference.pptx > default.pptx. I will refer to these as branded.pptx and default.pptx respectively. Once they are both open, you simply delete all slides and masters from branded.pptx and then drag over all slides and masters from default.pptx into branded.pptx. Then you can use branded.pptx as your reference doc that you pass in to pandoc when you build your slides.

  1. run pandoc --print-default-data-file reference.pptx > default.pptx
  2. open default.pptx in PowerPoint
  3. open your branded.pptx file provided by your institution
  4. delete all slides in branded.pptx
  5. open the master slide interface for both PowerPoint docs (View > Slide Master)
  6. rename the first master slide in branded.pptx to "[slide name]_old"
  7. drag the first master slide over from default.pptx into branded.pptx.
  8. delete "[slide name]_old" from branded.pptx
  9. repeat the process for all master slides that appear in default.pptx
  10. delete any masters from branded.pptx that do not have a corresponding name in default.pptx
  11. exit the Master interface in both documents, then drag over all non-master slides from default.pptx into branded.pptx
  12. save branded.pptx and use it as your reference-doc when you output your pandoc slideshows!

This process gave me a working branded.pptx reference doc that shows all elements that pandoc normally outputs to default.pptx

npiper commented 1 year ago

For the MacOS Powerpoint option it is worth trying saving the reference presentation in a different format of PPTX:

When I didn't do this for a customised presentation and saved it as straight PPTX, I got warnings and a prompt for Mac Powerpoint to try to repair the files, but when I saved the reference as:

'Strict Open XML Presentation (.pptx)'

and tried with this as the --reference-doc I didn't get the same warnings

https://support.microsoft.com/en-us/office/file-formats-that-are-supported-in-powerpoint-252c6fa0-a4bc-41be-ac82-b77c9773f9dc

Strict Open XML Presentation | .pptx | A presentation in the ISO strict version of the PowerPoint Presentation file format.

xjantoth commented 4 months ago

I am experiencing the same issue with pandoc on macOS. Over the past few days, I tried:

pandoc -o params-tpl.pptx  --print-default-data-file tpl.pptx
pandoc  base.md  -o pp.pptx --reference-doc params-tpl.pptx

I have tried several versions on macOS and on Ubuntu without any success. File tpl.pptx was created by me, and it has:

Any ideas?