scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License

feature: delete picture #41

Open jhludwig opened 11 years ago

jhludwig commented 11 years ago

Request for method to delete an existing picture in a presentation. An example use is to allow an existing slide deck to be repurposed by simply changing the customer logo in the deck.

scanny commented 11 years ago

@jhludwig: This should get it done as a temporary workaround. I'll leave this case open as a feature request to get the general case shapes.remove(shape) capability built.

Let me know how you go :)

from pptx import Presentation
from pptx.oxml import qn

def image_part_rId(picture):
    """
    Return relationship id, e.g. 'rId9' corresponding to the image part
    referenced by the picture shape *picture*.
    """
    blip_elm = picture._element.blipFill[qn('a:blip')]
    return blip_elm.get(qn('r:embed'))

def remove_picture(slide, picture):
    """
    Remove *picture* shape from *slide*.
    """
    # remove relationship to image part
    rId = image_part_rId(picture)
    remove_rel_with_rId(slide, rId)
    # remove <pic> element
    pic_elm = picture._element
    spTree = pic_elm.getparent()
    spTree.remove(pic_elm)
    # remove picture from shapes
    slide.shapes._values.remove(picture)

def remove_rel_with_rId(slide, rId):
    """
    Remove the relationship identified by *rId* from *slide*.
    """
    # get relationships list
    rels = slide._relationships._values
    # delete relationship with matching rId
    for rel in rels:
        if rel._rId == rId:
            rels.remove(rel)
            break

prs = Presentation('contains_picture.pptx')
slide = prs.slides[0]
picture = slide.shapes[0]

remove_picture(slide, picture)

prs.save('out.pptx')

jhludwig commented 11 years ago

This is awesome! Just tried it out and it was nearly perfect. One speed bump I hit -- for whatever reason, in my slide deck, the slide._relationships value was null, and so in the function remove_rel_with_rId that you specified, execution would blow up on the rels = slide._relationships._values line (pptxparse.py is my program):

  File "pptxparse.py", line 119, in <module>
    remove_picture(shapes, logopicture)
  File "pptxparse.py", line 31, in remove_picture
    remove_rel_with_rId(slide, rId)
  File "pptxparse.py", line 44, in remove_rel_with_rId
    rels = slide._relationships._values
AttributeError: '_ShapeCollection' object has no attribute '_relationships'

so i wrapped it with a "try" and all works ok for now:

    try:
        rels = slide._relationships._values
    except AttributeError:
        return

I coupled this with your other code to query the existing picture and strung it all together: get existing measurements, delete old picture, add a new picture with the old measurements. it works amazingly well!

thanks much. if there is anything i can do to help (on this issue or any other) let me know...

scanny commented 11 years ago

Oops, should be remove_picture(*slide*, picture), not remove_picture(shapes,picture) :)

I've updated above.

scanny commented 11 years ago

Note to self: implement shape delete as a method on each _BaseShape subclass, allowing distinct methods for each shape type and incremental addition of capability, with NotImplementedError on _BaseShape.delete(). E.g. _Picture.delete() executes code roughly equivalent to above. So once implemented, deleting a picture shape would be coded:

prs = Presentation('contains_picture.pptx')
picture = prs.slides[0].shapes[0]  # or suitable to specific case
picture.delete()
prs.save('out.pptx')

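The per-subclass design described in the note can be sketched in isolation. This is only an illustration of the pattern, not library code: `BaseShape`, `Picture`, and `Table` are hypothetical stand-ins, and a plain list stands in for the slide's shape collection.

```python
class BaseShape:
    """Hypothetical stand-in for the library's base shape class."""

    def __init__(self, parent_shapes):
        # `parent_shapes` is a plain list standing in for the shape collection.
        self._parent_shapes = parent_shapes
        parent_shapes.append(self)

    def delete(self):
        # Shape types without their own delete() fail loudly.
        raise NotImplementedError(
            "delete() is not implemented for %s" % type(self).__name__
        )


class Picture(BaseShape):
    """Hypothetical picture shape that knows how to remove itself."""

    def delete(self):
        # A real implementation would also remove the <pic> element and,
        # where appropriate, the relationship to the image part.
        self._parent_shapes.remove(self)


class Table(BaseShape):
    """Hypothetical shape type whose delete() has not been written yet."""


shapes = []
picture = Picture(shapes)
table = Table(shapes)

picture.delete()
print(len(shapes))  # 1

try:
    table.delete()
except NotImplementedError as exc:
    print(exc)  # delete() is not implemented for Table
```

With this layout, support for deleting each shape type can be added one subclass at a time, while unsupported types raise instead of silently doing nothing.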
osamak commented 8 years ago

The workaround mentioned above is no longer functional. A new workaround would be appreciated!

nikayou commented 7 years ago

Still no news? Deleting shapes is a basic feature but slide.shapes._values.remove(picture) cannot be performed (SlideShapes object has no attribute _values)

rhofst commented 7 years ago

I am also looking for a solution to this problem.

nikayou commented 7 years ago

As far as I know, deleting a picture consists in this list of steps:

HOWEVER, if two shapes A and B have a reference to the same image, deleting it in A will result in an invalid image for B. So either do not delete the image file (that's what Apache POI does), or remove it only if it is unused anywhere else (like keeping a dict with image reference mapped to a counter??).
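A minimal sketch of the shared-image concern at the XML level, using only the standard library rather than python-pptx. The slide XML is a trimmed, made-up fragment and `remove_pic` is a hypothetical helper: it removes one `<p:pic>` element and returns the relationship id only when no other `<a:blip>` on the slide still embeds it. It checks a single slide; other slides carry their own relationships to the same image part.

```python
import xml.etree.ElementTree as ET

NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
}
R_EMBED = "{%s}embed" % NS["r"]
A_BLIP = "{%s}blip" % NS["a"]

# Trimmed, made-up slide: pictures A and B embed the same image (rId2).
SLIDE_XML = """\
<p:sld xmlns:p="%(p)s" xmlns:a="%(a)s" xmlns:r="%(r)s">
  <p:cSld>
    <p:spTree>
      <p:pic><p:blipFill><a:blip r:embed="rId2"/></p:blipFill></p:pic>
      <p:pic><p:blipFill><a:blip r:embed="rId2"/></p:blipFill></p:pic>
      <p:pic><p:blipFill><a:blip r:embed="rId3"/></p:blipFill></p:pic>
    </p:spTree>
  </p:cSld>
</p:sld>
""" % NS


def remove_pic(slide_root, pic):
    """Remove `pic`; return its rId if no remaining blip uses it, else None."""
    # ElementTree has no getparent(), so build a child -> parent map first.
    parents = {child: parent for parent in slide_root.iter() for child in parent}
    rId = pic.find("p:blipFill/a:blip", NS).get(R_EMBED)
    parents[pic].remove(pic)
    still_used = any(
        blip.get(R_EMBED) == rId for blip in slide_root.iter(A_BLIP)
    )
    return None if still_used else rId


root = ET.fromstring(SLIDE_XML)
pic_a, pic_b, pic_c = root.findall(".//p:pic", NS)

freed_a = remove_pic(root, pic_a)  # rId2 is still used by picture B
freed_c = remove_pic(root, pic_c)  # rId3 has no other user on this slide
print(freed_a, freed_c)  # None rId3
```

Only when the helper returns an rId would it be safe to drop that relationship from the slide; whether the image file itself can go depends on the rest of the package.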

scanny commented 7 years ago

@Lasconik Yes, you see the complexity involved.

The details of relationships are abstracted out into the pptx.opc sub-package; so things like looking up an image part by the r:Id used in the pic element is accomplished with calls like:

image_part = shape.part.related_parts[rId]

Also in the opc subpackage is the OpcPackage class, which represents all the parts and all their relationships. Basically it's the .pptx file once opened.

A reference to the package can be gotten from a part:

package = shape.part.package

And all the relationships in the package can be accessed from there:

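usage_count = 0
for rel in package.iter_rels():
    # external relationships (e.g. hyperlinks) have no target part
    if not rel.is_external and rel.target_part is image_part:
        usage_count += 1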

So I'd be inclined to have it just traverse all the rels when needed rather than try to maintain a synchronized dictionary or something that you have to remember to update.
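The same traverse-when-needed idea can be tried directly on the .pptx zip with only the standard library. This is a rough sketch, not python-pptx code: `count_image_refs` is a hypothetical helper, and the in-memory package built for the demo contains only `.rels` files, so it is not a valid presentation.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
IMAGE_RELTYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
)


def count_image_refs(pptx_file, image_partname):
    """Count internal relationships in the package targeting `image_partname`."""
    count = 0
    with zipfile.ZipFile(pptx_file) as zf:
        for name in zf.namelist():
            if not name.endswith(".rels"):
                continue
            # 'ppt/slides/_rels/slide1.xml.rels' belongs to a part in 'ppt/slides'.
            base_dir = posixpath.dirname(posixpath.dirname(name))
            root = ET.fromstring(zf.read(name))
            for rel in root.iter("{%s}Relationship" % RELS_NS):
                if rel.get("TargetMode") == "External":
                    continue  # hyperlinks and the like point outside the package
                target = rel.get("Target")
                if target.startswith("/"):
                    resolved = target.lstrip("/")
                else:
                    resolved = posixpath.normpath(posixpath.join(base_dir, target))
                if resolved == image_partname:
                    count += 1
    return count


def _rels_xml(*targets):
    """Build a minimal .rels document with one image relationship per target."""
    rels = "".join(
        '<Relationship Id="rId%d" Type="%s" Target="%s"/>' % (i, IMAGE_RELTYPE, t)
        for i, t in enumerate(targets, start=1)
    )
    return '<Relationships xmlns="%s">%s</Relationships>' % (RELS_NS, rels)


# Demo package: two slides share image1.png, only slide 2 uses image2.png.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ppt/slides/_rels/slide1.xml.rels", _rels_xml("../media/image1.png"))
    zf.writestr(
        "ppt/slides/_rels/slide2.xml.rels",
        _rels_xml("../media/image1.png", "../media/image2.png"),
    )

print(count_image_refs(buf, "ppt/media/image1.png"))  # 2
print(count_image_refs(buf, "ppt/media/image2.png"))  # 1
```

A count of 1 before deletion would mean the picture being removed is the only user of the image, so the image part could be dropped along with it. Nothing has to be kept in sync between calls.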

rhofst commented 7 years ago

Has there been any progress on this? I got distracted because of x-mas and haven't accomplished anything myself. I just want to make sure I don't reinvent the wheel when I start working on this again.

scanny commented 7 years ago

I haven't seen anything come through on it. I expect you're clear to go :)

vivek1383 commented 4 years ago

Hello! Has this feature been updated?

NataliaAssange commented 2 years ago

Is that feature shapes.remove() available now?