scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License

Error when extracting 'jpeg' image from pptx: "AttributeError: 'Part' object has no attribute 'image'" #929

Open wzp123123 opened 7 months ago

wzp123123 commented 7 months ago

I use the below code to extract images from pptx:

    if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
        image_bytes = shape.image.blob

For some images, it raises:

    File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/pptx/shapes/picture.py:195, in Picture.image(self)
        193 if rId is None:
        194     raise ValueError("no embedded image")
    --> 195 return slide_part.get_image(rId)

    File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/pptx/parts/slide.py:30, in BaseSlidePart.get_image(self, rId)
         24 def get_image(self, rId):
         25     """
         26     Return an |Image| object containing the image related to this slide
         27     by rId. Raises |KeyError| if no image is related by that id, which
         28     would generally indicate a corrupted .pptx file.
         29     """
    ---> 30     return self.related_part(rId).image

    AttributeError: 'Part' object has no attribute 'image'


I find that all 'png' images extract successfully, but all 'jpeg' images fail.
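
The traceback shows that the related part was loaded as a generic `Part` rather than as an image part. One thing that may be worth checking (this is an assumption, not something confirmed in this thread) is which content type the package declares for the failing files, since as far as I understand python-pptx picks the part class from the declared content type. The sketch below uses only the standard library; `media_content_types` is a hypothetical helper name, not part of python-pptx.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by [Content_Types].xml in Open Packaging Convention files.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def media_content_types(pptx_path):
    """Return {part name: declared content type} for files under ppt/media/."""
    with zipfile.ZipFile(pptx_path) as zf:
        root = ET.fromstring(zf.read("[Content_Types].xml"))
        names = [
            n for n in zf.namelist()
            if n.startswith("ppt/media/") and not n.endswith("/")
        ]
    # Default elements declare a content type per file extension.
    defaults = {
        el.get("Extension", "").lower(): el.get("ContentType")
        for el in root.iter(CT_NS + "Default")
    }
    # Override elements declare a content type for one specific part
    # and take precedence over the extension default.
    overrides = {
        el.get("PartName"): el.get("ContentType")
        for el in root.iter(CT_NS + "Override")
    }
    result = {}
    for name in names:
        ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        result[name] = overrides.get("/" + name, defaults.get(ext))
    return result
```

If the 'jpeg' files turn out to be declared with something other than `image/jpeg`, that would at least be consistent with the png/jpeg split described above.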

wzp123123 commented 7 months ago

As a workaround, I can extract the images with zipfile from "ppt/media". Ref: https://github.com/madyel/extract_media_ppt
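
For anyone landing here, a minimal sketch of that workaround using only the standard library. It treats the .pptx as a zip archive and copies out everything under "ppt/media/". The function name `extract_media` and the output layout are my own choices for illustration, not taken from the linked repository.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_media(pptx_path, out_dir):
    """Copy every file under ppt/media/ in the package into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(pptx_path) as zf:
        for name in zf.namelist():
            # Skip entries outside ppt/media/ and directory entries.
            if not name.startswith("ppt/media/") or name.endswith("/"):
                continue
            # Keep only the file name, e.g. "image1.jpeg".
            target = out_dir / PurePosixPath(name).name
            target.write_bytes(zf.read(name))
            written.append(target)
    return written
```

Calling `extract_media("deck.pptx", "out")` would write the media files into `out/` and return their paths. This never touches `shape.image`, so it does not go through the code path that raises the AttributeError.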

MartinPacker commented 7 months ago

That, in fact, might be a faster way of doing it. But you lose the context of which slide they came from - which you might not care about.
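
If the slide context does matter, it can probably be recovered from the zip as well: each slide has a relationships file at "ppt/slides/_rels/slideN.xml.rels" that lists the media it references. A rough sketch, assuming the usual package layout; `images_by_slide` is a hypothetical helper name.

```python
import posixpath
import re
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
IMAGE_REL = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
)
SLIDE_RELS = re.compile(r"ppt/slides/_rels/(slide\d+\.xml)\.rels$")


def images_by_slide(pptx_path):
    """Map each slide part name to the media files its relationships reference."""
    result = {}
    with zipfile.ZipFile(pptx_path) as zf:
        for name in zf.namelist():
            match = SLIDE_RELS.match(name)
            if match is None:
                continue
            root = ET.fromstring(zf.read(name))
            targets = []
            for rel in root.iter(REL_NS + "Relationship"):
                # Keep only embedded image relationships.
                if rel.get("Type") != IMAGE_REL:
                    continue
                if rel.get("TargetMode") == "External":
                    continue
                # Targets are relative to the slide's folder,
                # e.g. "../media/image1.jpeg".
                joined = posixpath.join("ppt/slides", rel.get("Target"))
                targets.append(posixpath.normpath(joined).lstrip("/"))
            result[match.group(1)] = targets
    return result
```

The keys are part names such as "slide3.xml", which do not necessarily match the order in which the slides appear in the presentation.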

wzp123123 commented 7 months ago

> That, in fact, might be a faster way of doing it. But you lose the context of which slide they came from - which you might not care about.

I ran into the error above when extracting a 'jpeg' image directly 😂.