scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License

Support PPTX files protected by Azure Rights Management Service #930

Open tkarabela opened 7 months ago

tkarabela commented 7 months ago

Currently (as of version 0.6.23), the library cannot open PPTX files that are protected by Azure Rights Management (RMS). Attempting to open such a file raises pptx.exc.PackageNotFoundError: Package not found at 'my_file.pptx' in _PhysPkgReader.factory(), because an RMS-protected PPTX file is not a ZIP file.

The file starts with the byte sequence d0cf11e0a1b11ae1..., which suggests it is a Compound File Binary Format (CFBF) file. I was able to open it using version 0.3 of the compoundfiles library, with the following result:

import compoundfiles
doc = compoundfiles.CompoundFileReader("my_file.pptx")
print(doc.root)
# ["<CompoundFileEntity dir='\x06DataSpaces'>",
#  "<CompoundFileEntity name='EncryptedSIHash'>",
#  "<CompoundFileEntity name='EncryptedDSIHash'>",
#  "<CompoundFileEntity name='EncryptedPackage'>",
#  "<CompoundFileEntity name='\x05SummaryInformation'>",
#  "<CompoundFileEntity name='\x05DocumentSummaryInformation'>"]
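
For illustration, here is a minimal sketch of how a caller might tell the two container types apart before handing a file to the library. It uses only the standard library and checks for the CFBF signature quoted above. The names sniff_container and CFBF_MAGIC are hypothetical and are not part of python-pptx. Finding the signature only shows that the file is a compound file, not that it is RMS-protected specifically.

```python
import io
import zipfile

# Signature of a Compound File Binary Format (OLE2) container,
# i.e. the byte sequence d0cf11e0a1b11ae1 seen at the start of the file.
CFBF_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def sniff_container(stream):
    """Classify a seekable binary stream as 'cfbf', 'zip', or 'unknown'.

    Hypothetical helper: the stream position is restored before returning.
    """
    start = stream.tell()
    try:
        header = stream.read(len(CFBF_MAGIC))
        if header == CFBF_MAGIC:
            return "cfbf"
        stream.seek(start)
        return "zip" if zipfile.is_zipfile(stream) else "unknown"
    finally:
        stream.seek(start)


# Demo on in-memory data: a small real ZIP archive and a stub that
# carries only the CFBF signature (not a valid compound file).
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
zip_buf.seek(0)
cfbf_stub = io.BytesIO(CFBF_MAGIC + b"\x00" * 504)

print(sniff_container(zip_buf))
print(sniff_container(cfbf_stub))
```

A check like this could be used to report a clearer error than "Package not found" when the input is a compound file rather than a ZIP package. It does not decrypt anything.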

As far as I can tell, in an IT environment that uses RMS, PowerPoint from Office 365 saves either standard PPTX files or these encrypted PPTX files, depending on the confidentiality level selected for the document. From Office's point of view this is handled transparently; I only ran into the difference when trying to edit the file using this library or open it in OpenOffice/LibreOffice (which also doesn't work).

Microsoft apparently provides an Information Protection SDK that may be able to help.