Open shoang22 opened 4 months ago
There's probably rather more to deleting a movie than removing a chunk of XML.
@shoang22 you're going to want to remove the relationship from the slide (package) part to the part containing the movie (Media part maybe?). Otherwise I expect PowerPoint isn't going to like seeing the orphaned movie. Not sure if that's the whole problem; unfortunately, the repair error doesn't give us any idea of what it considers a "problem with content".
And what would be the strategy for doing this - in Python code? I ask, @scanny, because this logic is probably common to other removals.
Basically dig out the relationship and delete it.
The relationship(s) would be identified by an embed or link element with rId="rId{N}", I believe. Dumping the XML for the movie shape would give you an idea.
Then you need to get to the slide part because that's the source side of the relationship, so something like:
    slide_part = slide.part
    # drop_rel() lives on the part, not on its rels collection
    slide_part.drop_rel("rIdN")
Somebody can dig through and refine that with actual code if they have a mind to :)
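As a rough starting point for the "dig out the relationship" step, here is a minimal sketch that collects every relationship id referenced inside a shape's XML. It uses only the standard library and works on an XML string; the `referenced_rids` helper and the `SAMPLE` fragment are hypothetical, made up for illustration, and a real movie shape will have more children than shown here.

```python
import xml.etree.ElementTree as ET

# Namespace of the r:embed / r:link / r:id attributes in PresentationML.
REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def referenced_rids(shape_xml: str) -> set[str]:
    """Return every non-empty relationship id referenced inside `shape_xml`."""
    root = ET.fromstring(shape_xml)
    rids = set()
    for element in root.iter():
        for name, value in element.attrib.items():
            # ElementTree spells namespaced attributes as "{namespace}local".
            if name.startswith("{" + REL_NS + "}") and value:
                rids.add(value)
    return rids


# Hypothetical, trimmed-down movie shape; real shapes carry more children.
SAMPLE = """
<p:pic xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
       xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
       xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <p:nvPicPr>
    <p:nvPr><a:videoFile r:link="rId2"/></p:nvPr>
  </p:nvPicPr>
  <p:blipFill><a:blip r:embed="rId5"/></p:blipFill>
</p:pic>
"""

print(sorted(referenced_rids(SAMPLE)))  # prints ['rId2', 'rId5']
```

The ids found this way are candidates for removal from the slide part, once the shape element itself is gone. Some of them (a poster image, say) may still be referenced by other shapes, so check before dropping each one.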
Something like this?
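    import os

    import pptx
    from lxml import etree
    from pptx.shapes.picture import Movie


    def remove_movie(file_path: str) -> None:
        slides_folder = os.path.dirname(file_path) + "/slides"
        os.makedirs(slides_folder, exist_ok=True)
        prs = pptx.Presentation(file_path)
        for idx, slide in enumerate(prs.slides):
            # Iterate over a copy, since shapes are removed inside the loop.
            for shape in list(slide.shapes):
                if isinstance(shape, Movie):
                    p = slide.part
                    # Dump the slide's relationships before the change.
                    x = etree.fromstring(p.rels.xml)
                    before = etree.tostring(x, pretty_print=True)
                    print(before.decode())
                    # Remove the movie's element from the shape tree.
                    vid = shape.element
                    vid.getparent().remove(vid)
                    # NOTE: "rId2" is hard-coded for this particular file.
                    p.rels.pop("rId2")
                    # Dump the relationships again to confirm the removal.
                    y = etree.fromstring(p.rels.xml)
                    after = etree.tostring(y, pretty_print=True)
                    print(after.decode())
        prs.save(file_path.rpartition(".")[0] + "_no_movies.pptx")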
Prints:
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2007/relationships/media" Target="../media/media1.mp4"/>
<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/video" Target="../media/media1.mp4"/>
<Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout" Target="../slideLayouts/slideLayout1.xml"/>
<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/notesSlide" Target="../notesSlides/notesSlide1.xml"/>
<Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image1.png"/>
<Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image2.jpeg"/>
</Relationships>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2007/relationships/media" Target="../media/media1.mp4"/>
<Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout" Target="../slideLayouts/slideLayout1.xml"/>
<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/notesSlide" Target="../notesSlides/notesSlide1.xml"/>
<Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image1.png"/>
<Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image2.jpeg"/>
</Relationships>
But I'm still getting the same error when I attempt to open the file.
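One thing the printed output does show: after rId2 is removed, rId1 still targets ../media/media1.mp4. Whether that leftover relationship is what PowerPoint objects to is not established here, but it is easy to check for. The sketch below lists the relationships that still point at a given target; `rels_for_target` is a hypothetical helper, and `AFTER` is the second listing above, trimmed to three entries.

```python
import xml.etree.ElementTree as ET

PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def rels_for_target(rels_xml: str, target: str) -> list[str]:
    """Return the Ids of all relationships whose Target equals `target`."""
    root = ET.fromstring(rels_xml)
    tag = "{" + PKG_REL_NS + "}Relationship"
    return [rel.get("Id") for rel in root.iter(tag) if rel.get("Target") == target]


# The "after" listing from the post above, trimmed to three entries.
AFTER = """\
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2007/relationships/media" Target="../media/media1.mp4"/>
<Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout" Target="../slideLayouts/slideLayout1.xml"/>
<Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image1.png"/>
</Relationships>
"""

print(rels_for_target(AFTER, "../media/media1.mp4"))  # prints ['rId1']
```

If the list is non-empty after the movie shape has been deleted, the slide part still has a relationship to the media file.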
Okay, so a couple possible approaches:
Unzip the presentation ($ unzip original.pptx). Then make the changes by hand, re-zip the presentation into a PPTX file, and keep trying things until it works.
The opc-diag tool was built for this kind of exploration:
    pip install -U git+https://github.com/python-openxml/opc-diag.git@develop
The diff, extract, and repackage subcommands are most useful for this work. In particular, just unzipping a PPTX leaves all the content in any of the XML files on a single line, which of course is hard to edit. opc-diag automatically reformats that nicely for you.
You might want to do a mix of these two approaches. The diff approach is good when you have no clue what changes are required. The edit->repackage->try cycle is best when you have a pretty good idea what changes to try.
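If opc-diag is not at hand, the diff idea can be approximated with the standard library. The sketch below is not opc-diag and does not use its API. It assumes the diff approach means comparing two packages, for example one saved with the movie and one saved after deleting the movie in PowerPoint itself. The `pretty_parts`, `member_names`, and `diff_packages` helpers and the file names in the usage comment are all hypothetical.

```python
import difflib
import zipfile
from xml.dom import minidom


def pretty_parts(pptx_path: str) -> dict[str, str]:
    """Map each .xml/.rels member of the package to pretty-printed text."""
    parts = {}
    with zipfile.ZipFile(pptx_path) as package:
        for name in package.namelist():
            if name.endswith((".xml", ".rels")):
                dom = minidom.parseString(package.read(name))
                parts[name] = dom.toprettyxml(indent="  ")
    return parts


def member_names(pptx_path: str) -> set[str]:
    """Return the names of all members (XML and binary) in the package."""
    with zipfile.ZipFile(pptx_path) as package:
        return set(package.namelist())


def diff_packages(path_a: str, path_b: str) -> str:
    """Report members present in only one package, then diff the XML parts."""
    names_a, names_b = member_names(path_a), member_names(path_b)
    lines = [f"only in first: {n}" for n in sorted(names_a - names_b)]
    lines += [f"only in second: {n}" for n in sorted(names_b - names_a)]
    parts_a, parts_b = pretty_parts(path_a), pretty_parts(path_b)
    for name in sorted(set(parts_a) | set(parts_b)):
        old = parts_a.get(name, "").splitlines()
        new = parts_b.get(name, "").splitlines()
        lines.extend(
            difflib.unified_diff(
                old, new, fromfile="a/" + name, tofile="b/" + name, lineterm=""
            )
        )
    return "\n".join(lines)


# Hypothetical usage:
# print(diff_packages("with_movie.pptx", "movie_deleted_in_powerpoint.pptx"))
```

The pretty-printing here is only for reading the diff. Writing reformatted XML back into a package can change whitespace inside text runs, so treat any repackaging step with care.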
Hello,
I'm trying to remove all movies on each slide with the following:
The code executes successfully, but when I try to open the output file, I get the following error:
Is there something that I'm doing wrong?