scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License
2.38k stars 514 forks source link

Iterating through `slide.part.rels.items()` raises a KeyError from 0.6.20 onwards #754

Closed sanand0 closed 1 year ago

sanand0 commented 2 years ago

I use this code to copy a slide:

import copy
from pptx import Presentation

def copy_slide(prs, source, target_index):
    '''
    Copy ``source`` slide from presentatation ``prs`` to appear at ``target_index``.

    :arg Presentation prs: presentation to copy the slide in
    :arg Slide source: slide to copy
    :arg target_index: location to copy into. 0 makes it the first slide
    '''
    # Append slide with source's layout. Then delete shapes to get a blank slide
    dest = prs.slides.add_slide(source.slide_layout)
    for shp in dest.shapes:
        shp.element.getparent().remove(shp.element)
    # Copy shapes from source, in order
    for shape in source.shapes:
        new_shape = copy.deepcopy(shape.element)
        dest.shapes._spTree.insert_element_before(new_shape, 'p:extLst')
    # Copy rels from source
    for key, val in source.part.rels.items():
        target = val._target
        dest.part.rels.add_relationship(val.reltype, target, val.rId, val.is_external)
    # Move appended slide into target_index
    prs.slides.element.insert(target_index, prs.slides.element[-1])
    return dest

prs = Presentation('example.pptx')
source = prs.slides[0]
copy_slide(prs, source, 1)

This works fine in python-pptx 0.6.19 with example.pptx.

But in 0.6.20, it raises a KeyError

Traceback (most recent call last):
  File "D:\anaconda\3.7\lib\site-packages\pptx\opc\package.py", line 505, in __getitem__
    return self._rels[rId]
KeyError: <pptx.opc.package._Relationship object at 0x00000242D5189A20>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "killme.py", line 34, in <module>
    copy_slide(prs, source, 1)
  File "killme.py", line 24, in copy_slide
    for key, val in source.part.rels.items():
  File "D:\anaconda\3.7\lib\_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "D:\anaconda\3.7\lib\site-packages\pptx\opc\package.py", line 507, in __getitem__
    raise KeyError("no relationship with key '%s'" % rId)
KeyError: "no relationship with key '<pptx.opc.package._Relationship object at 0x00000242D5189A20>'"

Could you please guide me on what's changed, @scanny? Thanks!

scanny commented 2 years ago

Ah, interesting. So it looks like my (re)implementation of pptx.opc.package._Relationships as a subclass of the Mapping ABC is not quite correct. It appears that it expects _Relationships.__iter__() to generate the keys (rIds) of the relationship collection and I have it generating the relationship instances. That makes sense now that I see that error but escaped my attention during development.

So that's probably something for me to fix, but in the meantime, this replacement should do the trick:

    # --- change:
    for key, val in source.part.rels.items():
        target = val._target
        dest.part.rels.add_relationship(val.reltype, target, val.rId, val.is_external)

    # --- to:
    for rel in source.part.rels:
        dest.part.rels.add_relationship(rel.reltype, rel._target, rel.rId, rel.is_external)
sanand0 commented 2 years ago

Got it. Thanks @scanny.

Would the replacement you suggested work in earlier versions and future versions too? If so, I'll make this change permanent.

scanny commented 2 years ago

Not so sure about that. In the original version, _Relationships inherited from dict I believe, and I'm not sure one could produce code that worked with both versions without branching on pptx.__version__.

The _Relationships object is not really an interface object so we don't feel compelled to maintain backward compatibility on it. That said, once the .__iter__() change is made it should work the same (like .items() works and other dict/mapping methods).

You could do something like:

from pptx import __version__

rels = (
    source.part.rels
    if __version__.split(".") < [0, 6, 21]
    else source.part.rels._rels
)

And then base your code on rels. _Relationships._rels is a dict holding the actual rId: _Relationship mappings in the latest version.

sanand0 commented 2 years ago

I'll plan accordingly. Thanks @scanny

MartinPacker commented 2 years ago

As a matter of interest, are you copying within a presentation? Or between presentations?

sanand0 commented 2 years ago

Within a presentation. This page gives you an idea of what we're doing -- and I think you'll enjoy what your work's helped us create 🙂

scanny commented 2 years ago

Very cool @sanand0 ! :)

phyng commented 2 years ago

Thanks @sanand0 for open this issue, I install old version 0.6.18 to fix this error.

BTW, copy_slide is the key feature of our product, we use python-pptx to generate pptx from the template, In our scenario, the template is controllable, so although copy_slide has some compatibility issues, it can still achieve most of the functions by modifying the template. Thanks @scanny for your amazing works!

alainpannetier commented 2 years ago

Is this fixed in the source, or any new version? I tried all versions from 0.6.18 to current 0.6.21 and still running into this.

eltropico commented 2 years ago

Hi @scanny,

the following code works for versions 0.6.19 and earlier:

for _, value in six.iteritems(template.part.rels):
 if "notesSlide" not in value.reltype:
   copied_slide.part.rels.add_relationship(value.reltype, value._target, value.rId)

Per your comment, I changed it to:

for rel in template.part.rels:
  copied_slide.part.rels.add_relationship(rel.reltype, rel._target, rel.rId, rel.is_external)

but I get the following error:

AttributeError: '_Relationships' object has no attribute 'add_relationship'

Please help!

lthamm commented 2 years ago

I've encoutered the same issue. It seems like the signature has changed: https://github.com/scanny/python-pptx/blob/62f7f6211c29150245fd8b383c5bde69534f48dd/pptx/opc/package.py#L484

You can still access copied_slide.part.rels._add_relationship(...) but it does not allow you to pass the relationship id. The id now comes from adding a relationship with get_or_add_ext_rel or get_or_add_rel which uses get_or_add_ext_rel to generate the id or uses _get_matching to find an existing id.

I tried using it like this:

    for rel in source.part.rels:
        if rel.is_external:
            dest.part.rels.get_or_add_ext_rel(rel.reltype, rel._target)
        else:
            dest.part.rels.get_or_add(rel.reltype, rel._target)

But it leaves me with a corrupted output file. So far I could not find a solution other than to downgrade to 0.6.19 and it seems like even @sanand0 is still using the older version (correct me if I am wrong, but I figured it from your source code).

Update: The file corruption comes from copying charts. This happens with 0.6.19 too - the solution suggest above might very well work without charts. Will test that now.

Update 2: The solution above works (but not for charts, as doesnt the old solution).

Update 3: I've posted my current progress with a solution for 0.6.21 here: https://github.com/scanny/python-pptx/issues/132#issuecomment-1098398796 Until then I'll be using 0.6.19

scanny commented 1 year ago

Fixed in version 0.6.22 circa Aug 20, 2023.

_Relationships.items() now works as expected (generating rId, relationship pairs).

Note that iterating _Relationships now generates the rIds. Basically _Relationships.__iter__() is now equivalent to _Relationships.keys() rather than _Relationships.values() as it was.

scanny commented 1 year ago

Note that _Relationships is still not an interface class so any use of it is off-label and its behavior is not guaranteed. Wouldn't stop me from using it, but be aware of the risks.