scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License

feature: Slide.duplicate() #132

Open AlexMooney opened 9 years ago

AlexMooney commented 9 years ago

In order to create a presentation by populating template slides with dynamic data
As a developer using python-pptx
I need the ability to clone a slide

API suggestion:

cloned_slide = prs.slides.clone_slide(original_slide)

The cloned slide would be appended to the end of the presentation and would be functionally equivalent to copying and pasting a slide using the PPT GUI.

scanny commented 9 years ago

Hi Alex, can you describe your use case for us please?

AlexMooney commented 9 years ago

The goal is to automatically generate a couple dozen presentations when the data they present is updated. Every couple of weeks, we get updates and I want to save everyone from having to enter the new stuff in PPT.

I have made a template deck with a few placeholder type slides (there happens to be 3 styles of slide per presentation). My program reads text files to inject arbitrary data into the tables, text boxes, and charts on the placeholder slides. The next feature I need to figure out is that sometimes a deck will have 4 slides of type B instead of the usual 1. I'd like to e.g. duplicate the original slide B three times then inject data into each of those.

One of the design constraints is that the users of this program won't be able to maintain programmatically generated slides, so I'm modifying a template pptx file that they'll be able to make changes to as their needs evolve, rather than using things like add_slide, add_chart, and so forth from within Python. The only way I know of to do this now is to make the template have many redundant placeholder slides and have the end user manually delete the unused ones.

scanny commented 9 years ago

Can you say a little more about the design constraint you mention? I'm not clear whether you're saying your users won't be able to modify code (understandable :), or whether you don't want them to be able to modify the slides after they're generated, or perhaps something else entirely.

If you can give an idea of the motivations behind that constraint I think that will make it much clearer.

ghost commented 9 years ago

This would be very useful! I need the same option, plus one for inserting an empty slide, as I'm trying to automatically generate a pptx that takes values from a user report in a database and converts them into a presentation. I've made my own function for inserting/copying a slide, but I've been waiting for this option since python-pptx v0.3.2. Please add this feature!

AlexMooney commented 9 years ago

They won't be able to modify the Python code. They will definitely have to modify the slides by hand after they're populated with the data but before they're shown to managers. :)

The workaround I've since arrived upon is to have them manually make the pptx decks by hand and then my script goes back in and fills in all the data that it knows how to handle. Now, if they want to have a 4 page ppt with an extra chart on page 3, they make that in PowerPoint, but leave the data out, and my stuff fills it in. It would have been nicer to skip the manual portion, but my use case is mostly covered.

scanny commented 9 years ago

Ah, ok, I think I see what you're trying to do now.

It sounds like you're essentially having the users maintain a "template" presentation which your code then uses as a base into which to insert the latest content.

It's an interesting approach. The main benefit as I see it being the end-users can use PowerPoint to tweak the template slides themselves.

We've had a bunch of folks address similar challenges by just building up the slide in its entirety using python-pptx code. This has the advantage that you don't have to get rid of empty slides and so forth, but it doesn't allow end-users to tweak the slide template. They'd have to come to you for code every time some formatting or ancillary copy needed to change.

I'll leave this one open as a possible future feature. It turns out to be trickier than one might expect to implement in the general case because there are quite a number of possible connections between a slide and other parts of the presentation. But we'll give it some more noodling :)

children1987 commented 9 years ago

I have the same use case. Looking forward to your APIs. :)

robintw commented 9 years ago

I'm interested in doing the exact same thing as @AlexMooney.

@scanny: you said that it was difficult to implement in the general case, as there may be complicated linkages between slides. The presentations I'm working with won't have those complicated connections - would you be able to give any hints as to the best way to implement a duplicate() function that works in a limited case of simple presentations? At the moment I'm trying to loop through shapes and add them, but that requires a lot of logic to deal with shape types, copying subobjects etc

karlschiffmann commented 8 years ago

I am also interested in doing the exact same thing as @AlexMooney.

My use case is that the client wants to be able to supply any one from a structurally equivalent set of 2-slide templates, and have python code that will create new slides populated from dynamic test data. So we would like to duplicate from one of the template slides, and insert at the end of the deck. Upon saving the presentation, the 2 template slides would be stripped out.

Does anyone know if these features have already been implemented in python-pptx?

robintw commented 8 years ago

I don't know of it being implemented in python-pptx, but I have a pretty-awful implementation that works for my use case - but I should warn you that it may well not work for some of your situations, and is likely to be very buggy!

The code is below, and should be reasonably self-explanatory - but I warn you, it may well fail (and definitely only works for duplicating slides within presentations - for copying between presentations you open up a whole other can of worms)

def _get_blank_slide_layout(pres):
    layout_items_count = [len(layout.placeholders) for layout in pres.slide_layouts]
    min_items = min(layout_items_count)
    blank_layout_id = layout_items_count.index(min_items)
    return pres.slide_layouts[blank_layout_id]

def duplicate_slide(pres, index):
    """Duplicate the slide with the given index in pres.

    Adds slide to the end of the presentation"""
    source = pres.slides[index]

    blank_slide_layout = _get_blank_slide_layout(pres)
    dest = pres.slides.add_slide(blank_slide_layout)

    for shp in source.shapes:
        el = shp.element
        newel = copy.deepcopy(el)
        dest.shapes._spTree.insert_element_before(newel, 'p:extLst')

    for key, value in six.iteritems(source.rels):
        # Make sure we don't copy a notesSlide relation as that won't exist
        if "notesSlide" not in value.reltype:
            dest.rels.add_relationship(value.reltype, value._target, value.rId)

    return dest

karlschiffmann commented 8 years ago

I very much appreciate that! Will try it out... trying to decide whether to use python-pptx or just via MSPPT.py. Thanks again.


mtbdeano commented 8 years ago

duplicating slides within presentations would be a great feature. the above snippet works, but doesn't seem to copy the table cells correctly (it replicated the tables on the initial slide, but i can't add paragraphs to them). Maybe there is something else going on there? not deep copying the text elements?

robintw commented 8 years ago

Ah yes, I've never tried it with slides containing tables, so it probably doesn't work properly for those. I'm afraid I haven't got time to investigate the problem with tables, but if you do manage to fix it then let me know.

mtbdeano commented 8 years ago

actually, that code works fine with tables! the error was on my side, needed to make sure it was a deepcopy (i was using a weird library for that), all works great, you should add the duplicate slide method to the slide object, with the caveat it only works within a presentation.

robintw commented 8 years ago

@scanny: Would you be interested in this being added as a method to the slide object?

karlschiffmann commented 8 years ago

Yes, thank you.


scanny commented 8 years ago

@robintw: I would, of course :) Probably best if you start with an analysis document like the ones you find here so we can think through the required scope. There's not really a place for methods that only work for certain cases, so we'd need to work out what the scope of the general case would be and account for the various bits. If I recall correctly, the tricky bit on this one is to make sure relationships to external items like images and hyperlinks are properly taken care of.

Then of course you would need to provide the tests. I think that bit is the most rewarding, in the sense it makes you a better programmer, but seems to be beyond the abilities of most contributors.

Let me know if you're still keen and we can get started.

robintw commented 8 years ago

Unfortunately I don't think I'll have the time to engage with this project to that extent. I'm significantly involved in a number of other open-source projects, while also working full-time - and I just can't commit to do this work properly.

I'm more than happy for anyone else who has time to take the code that I've posted in this issue and integrate it with python-pptx, or just use it themselves.

karlschiffmann commented 8 years ago

Sorry, I am not in a position to work on this right now either...


children1987 commented 8 years ago

Thank you all for your fantastic work! I'd like to try to fix this. But I haven't contributed before, so I have no idea whether I can do it well. What's more, my mother tongue is Chinese, so if I ask some silly questions in my poor English, could you forgive me? @robintw @scanny

scanny commented 8 years ago

@children1987: You're entirely welcome to give it a try :) We'll have to see what you can come up with to get an idea how far away you'd be from getting a commit.

You'd need to be able to explain what you're doing and also write the tests. Those are the harder parts, so most people just write the code and don't bother with those bits; but they are what makes the library robust, so we can't accept a pull request without them.

Your English seems good enough so far. I'm sure we can manage to fix up the grammar and so on if your analysis is sound.

zhong2000 commented 8 years ago

@robintw thank you. With a slight modification of the code, it is able to copy a slide from a template into a new ppt. It is a great improvement for me. My project is automatic ppt report generation; previously I made many ppt templates for various requirements. Now I can consolidate identical formats into one ppt, then iterate, copying the slide and substituting content. Known bug: the background and some formatting are lost. Although it is not critical for me, I hope you can help.

    def _get_blank_slide_layout(pres):
        layout_items_count = [len(layout.placeholders) for layout in pres.slide_layouts]
        min_items = min(layout_items_count)
        blank_layout_id = layout_items_count.index(min_items)
        return pres.slide_layouts[blank_layout_id]

    def copy_slide(pres, pres1, index):
        # Copy slide `index` of pres onto a new slide at the end of pres1
        source = pres.slides[index]

        blank_slide_layout = _get_blank_slide_layout(pres)
        dest = pres1.slides.add_slide(blank_slide_layout)

        for shp in source.shapes:
            el = shp.element
            newel = copy.deepcopy(el)
            dest.shapes._spTree.insert_element_before(newel, 'p:extLst')

        for key, value in six.iteritems(source.rels):
            # Make sure we don't copy a notesSlide relation as that won't exist
            if "notesSlide" not in value.reltype:
                dest.rels.add_relationship(value.reltype, value._target, value.rId)

        return dest

hariedo commented 7 years ago

The code samples above for duplication of a slide use a variable or module called 'six' for function 'iteritems' which is not defined or clear here. Can you expand on your imports or refactor to use standard iteritems tools?

I want to copy a complex template slide 50 times, performing text substitution on each duplicate per some other data, then delete the template slide.

robintw commented 7 years ago

@hariedo six is a module that provides some compatibility helpers to make code work on Python 2 and 3. The documentation for the six.iteritems method (available at https://pythonhosted.org/six/#six.iteritems) states:

"Returns an iterator over dictionary's items. This replaces dictionary.iteritems() on Python 2 and dictionary.items() on Python 3."

So you should be able to replace it with whichever of these applies to your Python version.
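
As a small illustration of that substitution, here is a sketch in which a plain dictionary stands in for the relationships collection. The dictionary contents and the `iter_items` helper are made up for the example and are not part of six or python-pptx.

```python
# A plain dict stands in for source.rels here; the keys and values are
# invented for illustration only.
rels = {"rId1": "slideLayout", "rId2": "image"}


def iter_items(mapping):
    """Iterate over (key, value) pairs on both Python 2 and Python 3."""
    # Python 2 dictionaries have .iteritems(); Python 3 ones only have .items().
    if hasattr(mapping, "iteritems"):
        return mapping.iteritems()
    return iter(mapping.items())


for key, value in iter_items(rels):
    print(key, value)
```

On Python 3 alone, the loop header in the snippets above can simply read `for key, value in source.rels.items():`.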

hariedo commented 7 years ago

@robintw, thanks for that. My example ppt files don't seem to have any data that would cause source.rels or dest.rels to exist. I get 99% of what I want with that chunk of code removed, but wonder what I will be missing if I leave it out.

scanny commented 7 years ago

Images would be the most likely. Hyperlinks are also "relationship" objects, along with charts, smart art, and media (video, audio). There are a couple others that are more obscure.

cschrader commented 7 years ago

@robintw @zhong2000 Thanks for the code. I am trying to apply it to charts and also getting the AttributeError: 'Slide' object has no attribute 'rels' when accessing source.rels.

Removing the code doesn't copy charts correctly.

I guess you are opening the presentation differently than pres = Presentation(file_name)?

scanny commented 7 years ago

The internals changed a while back to extract a new SlidePart class from the Slide class. The .rels attribute moved over with the SlidePart object. The SlidePart object for a slide is accessed using the .part property on the Slide object.

So where you previously would have slide.rels you would now need slide.part.rels. I believe in the code above the change would be source.rels -> source.part.rels and dest.rels -> dest.part.rels.
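
For code that has to run against both older and newer releases, one possible approach is a small accessor that prefers `slide.part.rels` and falls back to `slide.rels`. This is only a sketch: `get_rels` is a hypothetical helper name, not something python-pptx provides.

```python
def get_rels(slide):
    """Return the relationships collection for `slide`.

    Newer python-pptx versions keep it on the slide part (slide.part.rels);
    older ones exposed it directly as slide.rels.
    """
    part = getattr(slide, "part", None)
    if part is not None and hasattr(part, "rels"):
        return part.rels
    return slide.rels
```

With that in place, the duplication snippets would use `get_rels(source)` and `get_rels(dest)` in place of `source.rels` and `dest.rels`.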

Quanjiang commented 7 years ago

Here is my use case: I generate a product report as a ppt every week, and the number of products differs each week. I need a way to automatically create a new page by copying an existing page for the next product.

Situation: using PowerPoint's slide master can save you time searching the python-pptx API.

  1. None of the approaches in this thread work on Python 3.5 with python-pptx 0.6.6 (see also https://github.com/scanny/python-pptx/issues/238).
  2. Set up the slide master in PowerPoint; then you can simply use pre.slides.add_slide(pre.slide_layouts[index]) to meet your needs.

Hope this saves people time.

No, the slide master approach didn't work. Charts placed in the master become part of the background and can't be edited. I'll try to find another way.

Bretto9 commented 7 years ago

Any ideas on why this duplicate method corrupts the output (Python 2.7, python-pptx 0.6.6)? I just have some tables and charts, I have tried before and after filling them, but corrupts the file just the same.

I wouldn't mind any tool to check what PowerPoint is repairing so I can give more information.

Thanks for your work!

biggihs commented 6 years ago

I hope you guys can help me out, or point me in the right direction, with my duplication problem. I've been trying to debug it for a few days but I can't seem to figure it out.

I'm trying to duplicate a sheet using the same method @robintw used but like @Bretto9 I always get a corrupted file IF the sheet has a chart. I've been examining in the output file the difference between the sheet I'm trying to copy vs the new sheet and the new sheet seems to be missing a few things during the copy method.

The only difference between the original sheet and the copy are:

I have tried (using vim) to manually add the missing elements in my output file, but still Powerpoint crashes.

This is probably because of relationship objects, because if I skip copying the chart, the file loads fine.

Could this be because the chartData relations in both the origin and the copy point to the same location? /ppt/charts/charts.xml

Thanks in advance, all clues or ideas are greatly appreciated.

scanny commented 6 years ago

What is the XML of the chart shape? Can you post it up? I believe it's a p:graphicFrame element. Better post your exact code along with it. "Same as X" code requires scrolling and usually reveals later "Oh, except for that change I made!". So in the interest of saving time :)

I'm not at all sure you can refer to the same chart from two separate slides. This implies that you would need to duplicate the chart part and probably also the Excel spreadsheet it embeds.

I think the test for this would be to manually edit a .pptx that contains two slides, each with the same chart, to have them both point to the same one, then see if that loads:

  1. Start with a presentation with a single slide containing only a chart.

  2. Duplicate the slide and save the presentation.

  3. Extract the presentation with opc-diag:

    opc extract my-test-deck.pptx my-test-deck

  4. Change the relationship of the second slide to point to the first chart.

  5. Delete the second chart (/ppt/charts/chart2.xml probably). Probably should delete the second embedded Xlsx file as well.

  6. Repackage with opc-diag:

    opc repackage my-test-deck my-hacked-deck.pptx

  7. See if it opens.
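
If opc-diag is not at hand, the extract and repackage steps can be approximated with the standard-library `zipfile` module, since a .pptx file is a zip archive. The following is a rough sketch, not a replacement for opc-diag; the function names `extract_package` and `repackage` are chosen here for illustration.

```python
import os
import zipfile


def extract_package(pptx_path, dir_path):
    """Unzip the .pptx at pptx_path into the directory dir_path."""
    with zipfile.ZipFile(pptx_path) as zf:
        zf.extractall(dir_path)


def repackage(dir_path, pptx_path):
    """Zip the contents of dir_path into a new package at pptx_path."""
    with zipfile.ZipFile(pptx_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(dir_path):
            for name in files:
                full_path = os.path.join(root, name)
                # Part names inside the package use forward slashes and are
                # relative to the package root.
                rel_path = os.path.relpath(full_path, dir_path)
                zf.write(full_path, rel_path.replace(os.sep, "/"))
```

Steps 3 and 6 would then be `extract_package("my-test-deck.pptx", "my-test-deck")` and `repackage("my-test-deck", "my-hacked-deck.pptx")`, with the manual edits made in between.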

Re the XML particulars:

biggihs commented 6 years ago

The XML of the chart shape is this :

<p:graphicFrame>
  <p:nvGraphicFramePr>
    <p:cNvPr id="4" name="Content Placeholder 3"/>
    <p:cNvGraphicFramePr>
      <a:graphicFrameLocks noGrp="1"/>
    </p:cNvGraphicFramePr>
    <p:nvPr>
      <p:ph idx="1"/>
      <p:extLst>
        <p:ext uri="{D42A27DB-BD31-4B8C-83A1-F6EECF244321}">
          <p14:modId xmlns:p14="http://schemas.microsoft.com/office/powerpoint/2010/main" val="124628090"/>
        </p:ext>
      </p:extLst>
    </p:nvPr>
  </p:nvGraphicFramePr>
  <p:xfrm>
    <a:off x="681038" y="2336800"/>
    <a:ext cx="9613900" cy="3598863"/>
  </p:xfrm>
  <a:graphic>
    <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/chart">
      <c:chart xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart" r:id="rId2"/>
    </a:graphicData>
  </a:graphic>
</p:graphicFrame>

Here is a link to a diff between slide1.xml and slide2.xml where you can see what's different between the two charts. https://www.diffchecker.com/emfbI6Im

There is no difference between slide1.xml.rels and slide2.xml.rels. I think this is the source of my error.

I have confirmed that it's not possible for two sheets to reference the same chart, using the steps that you provided. The file doesn't open. However, when I manually change the relationship I get a different error message from Powerpoint: "The file may be damaged, or it may have been created in a pre-release version of PowerPoint.". But when I use my regular method, Powerpoint offers to repair the file and then it removes the chart from the sheet.

This is the code I use to copy the slide:

import copy

import six
from pptx import Presentation


def _get_blank_slide_layout(pres):
    # Treat the layout with the fewest placeholders as the "blank" one
    layout_items_count = [len(layout.placeholders)
                          for layout in pres.slide_layouts]
    min_items = min(layout_items_count)
    blank_layout_id = layout_items_count.index(min_items)
    return pres.slide_layouts[blank_layout_id]


def duplicate_slide(pres, index):
    """Duplicate the slide with the given index in pres.

    Adds slide to the end of the presentation"""
    source = pres.slides[index]
    blank_slide_layout = _get_blank_slide_layout(pres)
    dest = pres.slides.add_slide(blank_slide_layout)

    for shape in source.shapes:
        newel = copy.deepcopy(shape.element)
        dest.shapes._spTree.insert_element_before(newel, 'p:extLst')

    for key, value in six.iteritems(source.part.rels):
        # Make sure we don't copy a notesSlide relation as that won't exist
        if "notesSlide" not in value.reltype:
            dest.part.rels.add_relationship(value.reltype,
                                            value._target,
                                            value.rId)

    return dest


prs = Presentation("input.pptx")
duplicate_slide(prs, 0)  # duplicate the first slide
prs.save('output.pptx')

scanny commented 6 years ago

Okay, a couple things for a start:

  1. In your XML, the chart is "lodged" in a content placeholder (see lines 3 and 8). This is the kind that can accept text, a table, a chart, or a picture, maybe others. I don't suppose it matters much what type it is, but the fact that it's a placeholder opens up additional failure modes. In particular, I expect the placeholder p:ph idx= value needs to be unique.

    I would take that out of the equation for debugging purposes by making the "source" chart simply a chart object "placed" on the slide, rather than "inserted" into a placeholder. That should simplify the p:graphicFrame XML a little and remove at least one possible failure mode to focus the diagnosis.

  2. I'm not getting exactly what you mean by "when I manually change the relationship...". On this sort of thing, I find it useful to recount the steps and the outcome. Like:

    • I run my code to copy the slide
    • I extract the package using opc-diag, and add the relationship ... to slide1.xml.rels
    • ... etc.
    • When I open the package I get the message: "The file may be damaged ..."

  3. I'd be strongly inclined at this point to see if duplicating the chart and Excel objects makes a working deck. For a start, I'd just copy chart1.xml to chart2.xml and embeddedExcelWhatever1 to ...2 in the extracted package.

    Then you'll need to edit the chart2.xml, I think the only bit is at the bottom of that file where it has something like external data.

    And you'll need to patch the relationship files. I think the slide relates to the chart and the chart relates to the embedded Xlsx. I don't think there are any other relationships.

    An easy way to start might be to duplicate the chart slide using PowerPoint and use that file (extracted) as a model and perhaps a place to pull items from.
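
The copy-and-patch experiment in item 3 could be scripted against an extracted package directory along these lines. This is a sketch under assumptions: `clone_part` is a hypothetical helper, it rewrites `Target` values by plain text replacement rather than XML parsing, and it does not touch [Content_Types].xml or any other part that may need updating.

```python
import os
import shutil


def clone_part(pkg_dir, src_name, dst_name, target_renames=None):
    """Copy a part file and its .rels file inside an extracted package.

    src_name and dst_name are package-relative paths such as
    "ppt/charts/chart1.xml". target_renames maps old Target values to new
    ones in the copied .rels file.
    """
    src = os.path.join(pkg_dir, *src_name.split("/"))
    dst = os.path.join(pkg_dir, *dst_name.split("/"))
    shutil.copyfile(src, dst)

    def rels_path(part_path):
        # Relationships for x/part.xml live in x/_rels/part.xml.rels
        folder, filename = os.path.split(part_path)
        return os.path.join(folder, "_rels", filename + ".rels")

    src_rels, dst_rels = rels_path(src), rels_path(dst)
    if not os.path.exists(src_rels):
        return  # the part has no outgoing relationships
    with open(src_rels, encoding="utf-8") as f:
        xml = f.read()
    for old, new in (target_renames or {}).items():
        xml = xml.replace('Target="%s"' % old, 'Target="%s"' % new)
    os.makedirs(os.path.dirname(dst_rels), exist_ok=True)
    with open(dst_rels, "w", encoding="utf-8") as f:
        f.write(xml)
```

The embedded workbook itself would still need to be copied separately (for example with `shutil.copyfile`), using whatever file name the extracted package actually contains.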

biggihs commented 6 years ago

I'm sorry for not being clear and thank you so much for helping me out. I shall try to be more explicit from now on.

  1. I have tried to make the p:ph idx= values unique. I did that by:
    • Using opc to extract the file
    • Incrementing the values in test1/ppt/slides/slide2.xml using vim
    • Using opc to repackage the file

I don't know how to make the "source" chart simply a chart object placed on the slide rather than inserted. Do you mean I should open the input .pptx file in Powerpoint and do it there? I just clicked the "navbar" in Powerpoint and chose to "insert" a chart. Is there another/better way to "place" charts in the pptx file?

  2. When I said "I manually change the relationship", these are the steps that I meant:

    • Open Powerpoint
    • Create a new Powerpoint file (test1.pptx) with a single slide, with a single chart
    • Right click and duplicate the slide (I now have two slides that look the same)
    • Close Powerpoint
    • Extract test1.pptx using opc extract test1.pptx test1
    • Open test1/ppt/slides/_rels/slide2.xml.rels using vim
    • Change Target="../charts/chart2.xml" to Target="../charts/chart1.xml"
    • Delete the test1/ppt/charts/chart2.xml.rels (tried with and without this step)
    • Repackage test1.pptx using opc repackage test1 test1.pptx
    • Try to reopen test1.pptx, but I get the message "The file may be damaged, or it may have been created in a pre-release version of PowerPoint."

  3. I'm going to try to do this, will update when finished.

biggihs commented 6 years ago

I've been trying to copy a slide by doing these steps: The file doesn't get corrupted but the slide doesn't appear in the file.

I'm going to rest my eyes for a bit. I will continue in the morning.

scanny commented 6 years ago

In presentation.xml there is a list of slides. The element is something like p:sldLst. If the slide doesn't appear there it won't show up in the deck.
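
To check which slides a package actually lists, presentation.xml can be inspected with the standard library. In the PresentationML schema the list element is named `p:sldIdLst`, with one `p:sldId` child per slide. The sketch below takes the XML text as input; the function name `list_slide_ids` is made up for this example.

```python
import xml.etree.ElementTree as ET

NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
}


def list_slide_ids(presentation_xml):
    """Return (id, rId) pairs for each p:sldId in presentation.xml content."""
    root = ET.fromstring(presentation_xml)
    r_id = "{%s}id" % NS["r"]  # the r:id attribute in Clark notation
    return [
        (sld_id.get("id"), sld_id.get(r_id))
        for sld_id in root.findall("p:sldIdLst/p:sldId", NS)
    ]
```

Each `rId` returned should match a relationship in ppt/_rels/presentation.xml.rels that targets a slide part; a slide file with no entry here will not appear in the deck.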

scanny commented 6 years ago

Regarding the p:ph idx= values, I remember now that uniqueness is not the important factor, rather that they refer to a placeholder in the slide layout that has that matching idx value (which serves as the "key" for placeholders). They of course need to be unique within a particular slide layout, and will then be unique within a particular slide, but their matching is the most important bit.

Another good reason, I think, to defer using a placeholder until you get it working with a "placed" chart.
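
One way to spot a mismatch of the kind described above is to compare the idx values on a slide with those defined by its layout. This is a sketch using python-pptx's `slide.placeholders`, `slide.slide_layout` and `placeholder_format.idx`; the helper name `unmatched_placeholder_idxs` is hypothetical.

```python
def unmatched_placeholder_idxs(slide):
    """Return idx values used on `slide` that its layout does not define."""
    layout_idxs = {
        ph.placeholder_format.idx for ph in slide.slide_layout.placeholders
    }
    return [
        ph.placeholder_format.idx
        for ph in slide.placeholders
        if ph.placeholder_format.idx not in layout_idxs
    ]
```

An empty list for a duplicated slide would suggest its copied placeholders at least refer to idx values that exist in the blank layout it was created from.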

To place a chart as a "non-placeholder" shape, simply start with a slide having no placeholder (I'd go with a blank layout for this for the moment) and choose Insert > Chart from the menu, or perhaps from the Ribbon, depending on your PowerPoint version. This puts a chart on the slide as its own shape, without reference to a placeholder.

Regarding "manually changing the relationship", if you're going to get rid of a chart, you'll want to get rid of all of it, to make sure PowerPoint isn't choking on some other anomaly. For a chart, this would include changing the slide.rels relationship, as you did, then deleting chart2.xml, chart2.xml.rels, and also EmbeddedXlsx2.bin or whatever that part is named. I don't think the Xlsx part has any relationships (.rels file), but if it did you'd want to get rid of that too.

In any case, I imagine it's safe to say referring to the same chart from two slides is either not allowed by PowerPoint, or is a bad idea even if it is allowed. I suppose I can imagine an edge case of some sort where editing a chart on one slide causes a copy on another slide to be updated (maybe), but it certainly wouldn't be the mainstream use case. I'd be strongly inclined to pursue the "clone a chart and its Xlsx when copying a slide having a chart" approach.

Btw, the research you're doing here is a valuable contribution, even if you don't end up submitting a pull request for the completed feature. This is exactly the course of experimentation anyone would need to pursue in preparation for a successful implementation of this feature :)

biggihs commented 6 years ago

Again, thank you for your help. I tried to add the "p:sldLst" tag but that corrupted my file. (It's probably because of an error on my part.)

I do not want to have two charts pointing to the same file, so it's alright that it's not possible.

I have had some success now, I was able to manually duplicate a slide in a pptx file. I decided to create a new Powerpoint file with only a single sheet with a single chart (single.pptx) then duplicating the sheet and saving that as a new file (duplicate.pptx) and look at the difference between the two files.

[using PowerPoint] Duplicating single.pptx to duplicate.pptx creates these new files:

First, I checked the difference between the new files that had been created in the duplicate.pptx

  1. In ppt/slides/slide1.xml the val attribute is val="1287465769" but in ppt/slides/slide2.xml it's val="531334335".
    (in )

  2. The difference between ppt/slides/_rels/slide1.xml.rels vs ppt/slides/_rels/slide2.xml.rels.
    In slide1.xml.rels the Target attribute is Target="../charts/chart1.xml" but in slide2.xml.rels it's Target="../charts/chart2.xml"

  3. /ppt/charts/style1.xml is identical to /ppt/charts/style2.xml

  4. /ppt/charts/colors1.xml is identical to /ppt/charts/colors2.xml

  5. The only differences between /ppt/charts/chart1.xml and /ppt/charts/chart2.xml are several axis value attributes:
    <c:barChart><c:axId val="-2124546544"/> vs <c:barChart><c:axId val="-2091599616"/>
    <c:barChart><c:axId val="-2096492960"/> vs <c:barChart><c:axId val="-2066918304"/>
    <c:catAx><c:crossAx val="-2096492960"/> vs <c:catAx><c:crossAx val="-2066918304"/>
    <c:catAx><c:axId val="-2124546544"/> vs <c:catAx><c:axId val="-2091599616"/>
    <c:valAx><c:axId val="-2096492960"/> vs <c:valAx><c:axId val="-2066918304"/>
    <c:valAx><c:axId val="-2124546544"/> vs <c:valAx><c:axId val="-2091599616"/>

  6. The difference between /ppt/charts/_rels/chart1.xml.rels and /ppt/charts/_rels/chart2.xml.rels is the Target attribute:
    Target="style1.xml" vs Target="style2.xml"
    Target="colors1.xml" vs Target="colors2.xml"
    Target="../embeddings/Microsoft_Excel_Worksheet1.xlsx" vs Target="../embeddings/Microsoft_Excel_Worksheet2.xlsx"

  7. There is no (binary) difference between /ppt/embeddings/Microsoft_Excel_Worksheet1.xlsx and /ppt/embeddings/Microsoft_Excel_Worksheet2.xlsx.
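
A comparison like this can also be done without extracting anything, by reading both packages as zip archives. The sketch below reports parts present in only one package and parts whose bytes differ; `compare_packages` is a name invented for this example, and it compares raw bytes only, so it will not explain what changed inside a part.

```python
import zipfile


def compare_packages(path_a, path_b):
    """Compare two .pptx (zip) files part by part.

    Returns (only_in_a, only_in_b, changed), each a sorted list of part names.
    """
    with zipfile.ZipFile(path_a) as za, zipfile.ZipFile(path_b) as zb:
        names_a, names_b = set(za.namelist()), set(zb.namelist())
        changed = sorted(
            name for name in names_a & names_b
            if za.read(name) != zb.read(name)
        )
    return sorted(names_a - names_b), sorted(names_b - names_a), changed
```

Calling `compare_packages("single.pptx", "duplicate.pptx")` would give the new parts in the second list and the modified shared parts in the third.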

Secondly, I checked the difference between files in the single.pptx vs duplicate.pptx

Note: If the files are not listed here, then there was no difference between them.
  1. /[Content_Types].xml

These rows had been added in duplicate/[Content_Types].xml

<Override PartName="/ppt/slides/slide2.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/>
<Override PartName="/ppt/charts/chart2.xml" ContentType="application/vnd.openxmlformats-officedocument.drawingml.chart+xml"/>
<Override PartName="/ppt/charts/style2.xml" ContentType="application/vnd.ms-office.chartstyle+xml"/>
<Override PartName="/ppt/charts/colors2.xml" ContentType="application/vnd.ms-office.chartcolorstyle+xml"/>
  1. /docProps/app.xml The Paragraph tag content was changed from <Paragraphs>1</Paragraphs> to <Paragraphs>2</Paragraphs> The Slides tag content was changed from <Slides>1</Slides> to <Slides>2</Slides> The vt:i4 tag content (in <HeadingPairs><vt:variant>) was changed from <vt:i4>1</vt:i4> to <vt:i4>2</vt:i4> The size attribute in vt:vector (in) was changed from <TitlesOfParts><vt:vector size="5" baseType="lpstr"> to <TitlesOfParts><vt:vector size="6" baseType="lpstr"> And one instance of <vt:lpstr>PowerPoint Presentation</vt:lpstr> was added into <TitlesOfParts><vt:vector ...>

  2. /docProps/core.xml The timestamp in <dcterms:modified> was updated and the <cp:revision> content was incremented by 1.

  3. /ppt/_rels/presentation.xml.rels A <Relationship Id="rId3" ... Target="slides/slide2.xml"/> has been added. slide1.xml has Id="rId2" and slide2.xml got the next id. As a result, the presProps, viewProps, theme1 and tableStyles relationships had their ids incremented (from rId3, rId4, rId5, rId6 to rId4, rId5, rId6, rId7, respectively).

  4. /ppt/presentation.xml A new <p:sldId> tag was added to <p:sldIdLst>. Its id and r:id attributes are each one higher than those of the existing slide.

    <p:sldIdLst>
        <p:sldId id="256" r:id="rId2"/>
        <p:sldId id="257" r:id="rId3"/>
    </p:sldIdLst>
  5. /ppt/viewProps.xml Attribute lastView="sldThumbnailView" was added to the <p:viewPr> tag.
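The comparison above can also be done without extracting anything by hand. The following is a minimal sketch, assuming only the standard library; `diff_packages` and `_make_package` are hypothetical helper names, and the two tiny in-memory packages stand in for single.pptx and duplicate.pptx.

```python
import io
import zipfile


def diff_packages(before, after):
    """Return (added, removed, changed) part names between two zip packages.

    `before` and `after` may be file paths or binary file-like objects.
    """
    with zipfile.ZipFile(before) as a, zipfile.ZipFile(after) as b:
        names_a, names_b = set(a.namelist()), set(b.namelist())
        added = sorted(names_b - names_a)
        removed = sorted(names_a - names_b)
        # Byte-for-byte comparison: XML that differs only in whitespace
        # is still reported as changed.
        changed = sorted(n for n in names_a & names_b if a.read(n) != b.read(n))
    return added, removed, changed


def _make_package(parts):
    """Build an in-memory zip from a {name: bytes} dict (demo helper only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, blob in parts.items():
            z.writestr(name, blob)
    buf.seek(0)
    return buf


# Stand-ins for single.pptx and duplicate.pptx; pass real paths instead.
single = _make_package({
    "ppt/presentation.xml": b"<a/>",
    "ppt/slides/slide1.xml": b"<s/>",
})
duplicate = _make_package({
    "ppt/presentation.xml": b"<a><b/></a>",
    "ppt/slides/slide1.xml": b"<s/>",
    "ppt/slides/slide2.xml": b"<s/>",
})
added, removed, changed = diff_packages(single, duplicate)
print(added, removed, changed)
```

With real files, `diff_packages("single.pptx", "duplicate.pptx")` lists the parts to inspect more closely; it does not say what changed inside each part.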

Thirdly, Manually duplicating.

I managed to duplicate a slide using these steps.

  1. Extracted the single.pptx using opc

    $ opc extract single.pptx single
    $ cd single
  2. I copied these files:

    cp ppt/slides/slide1.xml ppt/slides/slide2.xml
    cp ppt/slides/_rels/slide1.xml.rels ppt/slides/_rels/slide2.xml.rels
    cp ppt/embeddings/Microsoft_Excel_Worksheet1.xlsx ppt/embeddings/Microsoft_Excel_Worksheet2.xlsx
    cp ppt/charts/style1.xml ppt/charts/style2.xml
    cp ppt/charts/colors1.xml ppt/charts/colors2.xml
    cp ppt/charts/chart1.xml ppt/charts/chart2.xml
    cp ppt/charts/_rels/chart1.xml.rels ppt/charts/_rels/chart2.xml.rels
  3. Edited ppt/slides/_rels/slide2.xml.rels: changed Target="../charts/chart1.xml" to Target="../charts/chart2.xml".

  4. Edited ppt/charts/_rels/chart2.xml.rels:
     Changed Target="style1.xml" to Target="style2.xml"
     Changed Target="colors1.xml" to Target="colors2.xml"
     Changed Target="../embeddings/Microsoft_Excel_Worksheet1.xlsx" to Target="../embeddings/Microsoft_Excel_Worksheet2.xlsx"

  5. Added these lines in [Content_Types].xml

    <Override PartName="/ppt/slides/slide2.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/>
    <Override PartName="/ppt/charts/chart2.xml" ContentType="application/vnd.openxmlformats-officedocument.drawingml.chart+xml"/>
    <Override PartName="/ppt/charts/style2.xml" ContentType="application/vnd.ms-office.chartstyle+xml"/>
    <Override PartName="/ppt/charts/colors2.xml" ContentType="application/vnd.ms-office.chartcolorstyle+xml"/>
  6. Added the slide2 relationship to ppt/_rels/presentation.xml.rels. Note: I did not renumber the existing relationships; I only gave the new relationship the next unused id ("rId7").

    <Relationship Id="rId7" Target="slides/slide2.xml" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" />
  7. Added <p:sldId id="257" r:id="rId7"/> to <p:sldIdLst> in ppt/presentation.xml

  8. Repackaged the single/ using opc

    $ opc repackage single duplicate.pptx

Voila.
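Step 5 above (declaring the new parts in [Content_Types].xml) can also be scripted. This is a minimal sketch using only the standard library; `add_override` is a hypothetical helper name, and the input is a cut-down stand-in for the real file.

```python
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
# Serialize the content-types namespace as the default (no prefix).
ET.register_namespace("", CT_NS)

SLIDE_CT = (
    "application/vnd.openxmlformats-officedocument.presentationml.slide+xml"
)


def add_override(content_types_xml, partname, content_type):
    """Return the XML with an <Override> for `partname`, adding it if absent."""
    root = ET.fromstring(content_types_xml)
    tag = "{%s}Override" % CT_NS
    for el in root.findall(tag):
        if el.get("PartName") == partname:
            return content_types_xml  # already declared, nothing to do
    ET.SubElement(root, tag, {"PartName": partname, "ContentType": content_type})
    return ET.tostring(root, encoding="unicode")


# Cut-down stand-in for the [Content_Types].xml of single.pptx.
original = (
    '<Types xmlns="%s">'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/ppt/slides/slide1.xml" ContentType="%s"/>'
    "</Types>" % (CT_NS, SLIDE_CT)
)
updated = add_override(original, "/ppt/slides/slide2.xml", SLIDE_CT)
print(updated)
```

The same call would be repeated for chart2.xml, style2.xml and colors2.xml with the content types listed in step 5.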

scanny commented 6 years ago

@biggihs

I tried to add the "p:sldLst" tag but that corrupted my file. (It's probably because of a error on my part.)

The element (turns out it is p:sldIdLst http://python-pptx.readthedocs.io/en/latest/dev/analysis/prs-properties.html#xml-specimens) will already be present in the presentation.xml part. You just need to make sure there's a p:sldId element in it for your slide.
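As an illustration of that last point, here is a minimal standard-library sketch that makes sure a p:sldId exists for a given relationship id. `ensure_slide_id` is a hypothetical helper, not part of python-pptx, and it assumes slide ids start at 256 as in the specimen quoted earlier in this thread.

```python
import xml.etree.ElementTree as ET

P = "http://schemas.openxmlformats.org/presentationml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
# Keep the conventional p: and r: prefixes when serializing.
ET.register_namespace("p", P)
ET.register_namespace("r", R)


def ensure_slide_id(presentation_xml, rId):
    """Return (xml, slide_id), adding a p:sldId for `rId` if none exists."""
    root = ET.fromstring(presentation_xml)
    sld_id_lst = root.find("{%s}sldIdLst" % P)
    if sld_id_lst is None:
        raise ValueError("presentation.xml has no p:sldIdLst element")
    ids = []
    for sld_id in sld_id_lst:
        if sld_id.get("{%s}id" % R) == rId:
            return presentation_xml, int(sld_id.get("id"))  # already listed
        ids.append(int(sld_id.get("id")))
    # Assumption: ids start at 256; use one more than the current maximum.
    new_id = max(ids, default=255) + 1
    ET.SubElement(
        sld_id_lst, "{%s}sldId" % P, {"id": str(new_id), "{%s}id" % R: rId}
    )
    return ET.tostring(root, encoding="unicode"), new_id


# Cut-down stand-in for ppt/presentation.xml.
original = (
    '<p:presentation xmlns:p="%s" xmlns:r="%s">'
    '<p:sldIdLst><p:sldId id="256" r:id="rId2"/></p:sldIdLst>'
    "</p:presentation>" % (P, R)
)
updated, slide_id = ensure_slide_id(original, "rId7")
print(slide_id)
```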

[using PowerPoint] Duplicating single.pptx to duplicate.pptx creates these new files:

  • duplicate/ppt/slides/slide2.xml
  • duplicate/ppt/slides/_rels/slide2.xml.rels
  • duplicate/ppt/embeddings/Microsoft_Excel_Worksheet2.xlsx
  • duplicate/ppt/charts/style2.xml
  • duplicate/ppt/charts/colors2.xml
  • duplicate/ppt/charts/chart2.xml
  • duplicate/ppt/charts/_rels/chart2.xml.rels

Hmm, this is interesting. The existing code doesn't create styleN.xml and colorsN.xml parts when adding a chart, so those must be optional. For duplication purposes, the safest bet would be to copy them as well, but that might be a finer point worth looking into for anyone developing a production version.

First, I checked the difference between the new files that had been created in the duplicate.pptx

  1. In ppt/slides/slide1.xml the val attribute is val="1287465769" but in ppt/slides/slide2.xml it's val="531334335".
    (in )

Hmm, I don't know what this value represents, and I expect it's poorly documented. It might make sense to look it up by the GUID of the extension elsewhere in that element. Sometimes that sheds some light. I would have guessed it was a timestamp, but the second value is less than the original value, so that wouldn't make a lot of sense. The other thing I can think of is a "Creator" ID, possibly different versions of PowerPoint or something.

If it mattered, I'd be inclined just to copy it over unchanged or leave out that p:ext element, whichever is easier.

  1. The only differences between /ppt/charts/chart1.xml and /ppt/charts/chart2.xml are several axis id values (chart1 vs chart2):
     <c:barChart><c:axId val="-2124546544"/> vs <c:barChart><c:axId val="-2091599616"/>
     <c:barChart><c:axId val="-2096492960"/> vs <c:barChart><c:axId val="-2066918304"/>
     <c:catAx><c:crossAx val="-2096492960"/> vs <c:catAx><c:crossAx val="-2066918304"/>
     <c:catAx><c:axId val="-2124546544"/> vs <c:catAx><c:axId val="-2091599616"/>
     <c:valAx><c:axId val="-2096492960"/> vs <c:valAx><c:axId val="-2066918304"/>
     <c:valAx><c:axId val="-2124546544"/> vs <c:valAx><c:axId val="-2091599616"/>

I don't know why PowerPoint makes Axis IDs unique across all charts in a presentation. Individual axes in charts created by python-pptx are unique within the chart, but the same for each chart of that type (they're part of the boilerplate XML used to create a specific chart type).

I expect you can copy them without change and not experience any problems, since that's what you'd get if you created both charts using python-pptx.

  1. /docProps/app.xml
     The Paragraphs tag content was changed from <Paragraphs>1</Paragraphs> to <Paragraphs>2</Paragraphs>.
     The Slides tag content was changed from <Slides>1</Slides> to <Slides>2</Slides>.
     The vt:i4 tag content (in <HeadingPairs><vt:variant>) was changed from <vt:i4>1</vt:i4> to <vt:i4>2</vt:i4>.
     The size attribute of vt:vector (in <TitlesOfParts>) was changed from <vt:vector size="5" baseType="lpstr"> to <vt:vector size="6" baseType="lpstr">.
     One instance of <vt:lpstr>PowerPoint Presentation</vt:lpstr> was added to <TitlesOfParts><vt:vector ...>.

python-pptx doesn't modify the app.xml part, which hasn't caused a problem yet. So I expect all these changes could be skipped. The only thing I think it would affect is properties seen in a file browser, like the Finder on a Mac and Windows Explorer on a PC.

  1. /docProps/core.xml The timestamp in <dcterms:modified> was updated and the <cp:revision> content was incremented by 1.

python-pptx doesn't update this item either.

  1. /ppt/_rels/presentation.xml.rels A <Relationship Id="rId3" ... Target="slides/slide2.xml"/> has been added. slide1.xml has Id="rId2" and slide2.xml got the next id. As a result, the presProps, viewProps, theme1 and tableStyles relationships had their ids incremented (from rId3, rId4, rId5, rId6 to rId4, rId5, rId6, rId7, respectively).

The rIdN values chosen have no significance, as long as they are unique and consistent. They are arbitrary keys. In a python-pptx context, you'd just choose the next available key for a new slide and be done with it. There's a method for that somewhere in the packaging API.
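To make "choose the next available key" concrete, here is a minimal standard-library sketch. It is not the python-pptx packaging method mentioned above; `next_rId` is a hypothetical helper that picks the lowest unused rIdN in a .rels part, and any other unused value would work equally well.

```python
import re
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def next_rId(rels_xml):
    """Return the lowest 'rIdN' key not yet used in the given .rels XML."""
    root = ET.fromstring(rels_xml)
    used = set()
    for rel in root.findall("{%s}Relationship" % RELS_NS):
        match = re.fullmatch(r"rId(\d+)", rel.get("Id", ""))
        if match:
            used.add(int(match.group(1)))
    n = 1
    while n in used:
        n += 1
    return "rId%d" % n


# Cut-down stand-in for ppt/_rels/presentation.xml.rels; Type values are
# shortened placeholders because only the Id attribute matters here.
rels = (
    '<Relationships xmlns="%s">'
    '<Relationship Id="rId1" Type="t" Target="slideMasters/slideMaster1.xml"/>'
    '<Relationship Id="rId2" Type="t" Target="slides/slide1.xml"/>'
    '<Relationship Id="rId4" Type="t" Target="viewProps.xml"/>'
    "</Relationships>" % RELS_NS
)
print(next_rId(rels))
```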

  1. /ppt/viewProps.xml Attribute lastView="sldThumbnailView" was added to the <p:viewPr> tag.

python-pptx doesn't update viewProps.xml either. This one is safely skipped.

... and Voila.

Congratulations! Looks like you've discovered the recipe :) I wish I could tell you it usually isn't this hard, but in fact it often is :) There are a lot of details that aren't documented and have to be worked out by extended searching and experimentation.

biggihs commented 6 years ago

I needed this feature asap, so I created a method to duplicate a slide in a PowerPoint file. It does it by brute force: extracting the file, copying and editing the relevant files, and then repackaging it.

I would like to implement this feature correctly into python-pptx using the information in this issue and the correct methodologies. I however don't fully understand how all the pieces fit together in the python-pptx library so I would need help/guidance to finish the job sufficiently. Would you be willing to point me in the right direction and tell me what requirements you have towards the feature?

Here is my code, until the feature gets implemented in python-pptx. https://gist.github.com/biggihs/b8e1374a9f282b117d171f020fe6be45

scanny commented 6 years ago

@biggihs I'm happy to help you as time allows. I've just started a new gig so will have somewhat less time for that sort of thing than I do "between" gigs :)

The first step is getting the research/analysis documented in an analysis page (docs/dev/analysis/...). That's a good place to start and is separately committable. It is also a valuable contribution even if you don't go on to implement the feature.

I expect a good part of that would be just organizing the information that has come out on this thread and adding in some extracts from the XML schema files and so on. Let me know if you need help getting started with that.

biggihs commented 6 years ago

Thank you. I won't be able to add the documentation/analysis until after New Year's, but that is where I will start. That will also probably help me understand how to put it together. Thanks again.

Fideldue commented 6 years ago

First of all: great work! @biggihs I tried to work with your 'brute-force' solution, but it creates a corrupted file for me (the only thing I changed was unicode() -> str() because I'm using Python 3.6). So my main question is: did your final code work flawlessly for you?

Stefan2142 commented 6 years ago

Yes, using robintw's solution to duplicate slides can sometimes make PowerPoint show a 'Repair this file' message on startup. The good thing is that it doesn't always happen.

biggihs commented 6 years ago

@Fideldue sorry for my late reply. I just saw this message. I'm using it programmatically and so far it does seem to work without fault.

Did you try with a new "clean" powerpoint file?

The most common reason I got corrupted files when debugging this feature was that the "ref" ids didn't match the files that were generated.

You can try to use the brute-force method and then "unpack" the results with opc to see if there are any duplicate "ref" ids or if there are some elements that are missing.
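That check can also be automated. The sketch below is a rough standard-library approach, not part of opc or python-pptx: `check_relationships` is a hypothetical helper that scans every .rels part in a package for duplicate Id values and for internal targets that don't exist in the package. The in-memory package is a deliberately broken stand-in.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def check_relationships(pptx):
    """Return a list of (rels_part, problem) tuples found in the package."""
    problems = []
    with zipfile.ZipFile(pptx) as z:
        names = set(z.namelist())
        for rels_name in sorted(n for n in names if n.endswith(".rels")):
            # ppt/slides/_rels/slide1.xml.rels resolves targets
            # relative to ppt/slides.
            base = posixpath.dirname(posixpath.dirname(rels_name))
            seen = set()
            root = ET.fromstring(z.read(rels_name))
            for rel in root.findall("{%s}Relationship" % RELS_NS):
                rid = rel.get("Id")
                if rid in seen:
                    problems.append((rels_name, "duplicate Id %s" % rid))
                seen.add(rid)
                if rel.get("TargetMode") == "External":
                    continue  # hyperlinks etc. point outside the package
                target = rel.get("Target", "")
                if target.startswith("/"):
                    path = target.lstrip("/")
                else:
                    path = posixpath.normpath(posixpath.join(base, target))
                if path not in names:
                    problems.append((rels_name, "missing target %s" % path))
    return problems


# Deliberately broken stand-in package: rId1 is used twice and
# chart2.xml was never copied.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("ppt/slides/slide2.xml", "<s/>")
    z.writestr("ppt/slideLayouts/slideLayout1.xml", "<l/>")
    z.writestr(
        "ppt/slides/_rels/slide2.xml.rels",
        '<Relationships xmlns="%s">'
        '<Relationship Id="rId1" Type="t" Target="../charts/chart2.xml"/>'
        '<Relationship Id="rId1" Type="t" '
        'Target="../slideLayouts/slideLayout1.xml"/>'
        "</Relationships>" % RELS_NS,
    )
buf.seek(0)
problems = check_relationships(buf)
for part, problem in problems:
    print(part, "->", problem)
```

An empty result does not prove the file is valid; it only rules out these two kinds of mismatch.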

alexdriedger commented 6 years ago

I am also looking for this feature. My use case is merging a bunch of PowerPoint files into one.

nshgraph commented 6 years ago

FWIW, this is how I added chart support:

import copy

from pptx.parts.chart import ChartPart
from pptx.parts.embeddedpackage import EmbeddedXlsxPart

def _get_blank_slide_layout(pres):
    layout_items_count = [len(layout.placeholders)
                          for layout in pres.slide_layouts]
    min_items = min(layout_items_count)
    blank_layout_id = layout_items_count.index(min_items)
    return pres.slide_layouts[blank_layout_id]

def duplicate_slide(pres, index):
    """Duplicate the slide with the given index in pres.

    Adds slide to the end of the presentation"""
    source = pres.slides[index]
    blank_slide_layout = _get_blank_slide_layout(pres)
    dest = pres.slides.add_slide(blank_slide_layout)

    for shape in source.shapes:
        newel = copy.deepcopy(shape.element)
        dest.shapes._spTree.insert_element_before(newel, 'p:extLst')

    for key, value in source.part.rels.items():
        # Make sure we don't copy a notesSlide relation as that won't exist
        if "notesSlide" not in value.reltype:
            target = value._target
            # if the relationship was a chart, we need to duplicate the embedded chart part and xlsx
            if "chart" in value.reltype:
                partname = target.package.next_partname(
                    ChartPart.partname_template)
                xlsx_blob = target.chart_workbook.xlsx_part.blob
                target = ChartPart(partname, target.content_type,
                                   copy.deepcopy(target._element), package=target.package)

                target.chart_workbook.xlsx_part = EmbeddedXlsxPart.new(
                    xlsx_blob, target.package)

            dest.part.rels.add_relationship(value.reltype,
                                            target,
                                            value.rId)

    return dest

jsolack commented 6 years ago

I am trying to use the duplicate-slide code posted by nshgraph.

ChartPart is not defined at this point: ChartPart.partname_template). Am I missing a definition?

salehciq commented 5 years ago

@jsolack did you import it? from pptx.parts.chart import ChartPart