scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License
2.35k stars 510 forks source link

Some Presentations does not open at all #818

Open Enne31 opened 2 years ago

Enne31 commented 2 years ago

Failure to open some Presentations, not all

from pptx import Presentation
prs = Presentation("C:\\Users\\ISAT\\Desktop\\Teste PPT\\43733.pptx")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\ISAT\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pptx\api.py", line 28, in Presentation
    presentation_part = Package.open(pptx).main_document_part
  File "C:\Users\ISAT\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pptx\opc\package.py", line 73, in open
    return cls(pkg_file)._load()
  File "C:\Users\ISAT\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pptx\opc\package.py", line 157, in _load
    pkg_xml_rels, parts = _PackageLoader.load(self._pkg_file, self)
  File "C:\Users\ISAT\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pptx\opc\package.py", line 186, in load
    return cls(pkg_file, package)._load()
  File "C:\Users\ISAT\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pptx\opc\package.py", line 190, in _load
    parts, xml_rels = self._parts, self._xml_rels
  File "C:\Users\ISAT\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pptx\util.py", line 215, in __get__
    value = self._fget(obj)
  File "C:\Users\ISAT\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pptx\opc\package.py", line 219, in _parts
    content_types = self._content_types
  File "C:\Users\ISAT\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pptx\util.py", line 215, in __get__
    value = self._fget(obj)
  File "C:\Users\ISAT\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pptx\opc\package.py", line 203, in _content_types
    return _ContentTypeMap.from_xml(self._package_reader[CONTENT_TYPES_URI])
  File "C:\Users\ISAT\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pptx\opc\serialized.py", line 35, in __getitem__
    return self._blob_reader[pack_uri]
  File "C:\Users\ISAT\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pptx\opc\serialized.py", line 176, in __getitem__
    if pack_uri not in self._blobs:
  File "C:\Users\ISAT\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pptx\util.py", line 215, in __get__
    value = self._fget(obj)
  File "C:\Users\ISAT\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pptx\opc\serialized.py", line 184, in _blobs
    return {PackURI("/%s" % name): z.read(name) for name in z.namelist()}
  File "C:\Users\ISAT\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pptx\opc\serialized.py", line 184, in <dictcomp>
    return {PackURI("/%s" % name): z.read(name) for name in z.namelist()}
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\zipfile.py", line 1473, in read
    return fp.read()
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\zipfile.py", line 910, in read
    buf += self._read1(self.MAX_N)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\zipfile.py", line 1014, in _read1
    self._update_crc(data)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\zipfile.py", line 942, in _update_crc
    raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file 'ppt/media/media13.m4a' 

String for location is not the problem, already tested it. This only occours on some pptx, can't figure why

Enne31 commented 2 years ago

For some reason it appears to only happen when there's an audio file m4a into the presentation

mszbot commented 2 years ago

Upload an example pptx....

Rebecasarai commented 2 years ago

Hi, is there any solution for this? It's happening to me too. When there is any .mov file inserted in the presentation, it returns an error: BadZipFile: Bad CRC-32 for file 'ppt/media/media2.mov'

Rebecasarai commented 2 years ago

I'm literally just trying to open the presentation with the constructor:

Presentation(path)

And it returns:


---------------------------------------------------------------------------
BadZipFile                                Traceback (most recent call last)
Input In [65], in <cell line: 1>()
----> 1 Presentation(path)

File ~\Anaconda3\envs\base_\lib\site-packages\pptx\api.py:28, in Presentation(pptx)
     25 if pptx is None:
     26     pptx = _default_pptx_path()
---> 28 presentation_part = Package.open(pptx).main_document_part
     30 if not _is_pptx_package(presentation_part):
     31     tmpl = "file '%s' is not a PowerPoint file, content type is '%s'"

File ~\Anaconda3\envs\base_\lib\site-packages\pptx\opc\package.py:73, in OpcPackage.open(cls, pkg_file)
     70 @classmethod
     71 def open(cls, pkg_file):
     72     """Return an |OpcPackage| instance loaded with the contents of `pkg_file`."""
---> 73     return cls(pkg_file)._load()

File ~\Anaconda3\envs\base_\lib\site-packages\pptx\opc\package.py:157, in OpcPackage._load(self)
    155 def _load(self):
    156     """Return the package after loading all parts and relationships."""
--> 157     pkg_xml_rels, parts = _PackageLoader.load(self._pkg_file, self)
    158     self._rels.load_from_xml(PACKAGE_URI, pkg_xml_rels, parts)
    159     return self

File ~\Anaconda3\envs\base_\lib\site-packages\pptx\opc\package.py:186, in _PackageLoader.load(cls, pkg_file, package)
    174 @classmethod
    175 def load(cls, pkg_file, package):
    176     """Return (pkg_xml_rels, parts) pair resulting from loading `pkg_file`.
    177 
    178     The returned `parts` value is a {partname: part} mapping with each part in the
   (...)
    184     object) to load those relationships into its |_Relationships| object.
    185     """
--> 186     return cls(pkg_file, package)._load()

File ~\Anaconda3\envs\base_\lib\site-packages\pptx\opc\package.py:190, in _PackageLoader._load(self)
    188 def _load(self):
    189     """Return (pkg_xml_rels, parts) pair resulting from loading pkg_file."""
--> 190     parts, xml_rels = self._parts, self._xml_rels
    192     for partname, part in parts.items():
    193         part.load_rels_from_xml(xml_rels[partname], parts)

File ~\Anaconda3\envs\base_\lib\site-packages\pptx\util.py:215, in lazyproperty.__get__(self, obj, type)
    210 value = obj.__dict__.get(self.__name__)
    211 if value is None:
    212     # ---on first access, __dict__ item will absent. Evaluate fget()
    213     # ---and store that value in the (otherwise unused) host-object
    214     # ---__dict__ value of same name ('fget' nominally)
--> 215     value = self._fget(obj)
    216     obj.__dict__[self.__name__] = value
    217 return value

File ~\Anaconda3\envs\base_\lib\site-packages\pptx\opc\package.py:219, in _PackageLoader._parts(self)
    210 @lazyproperty
    211 def _parts(self):
    212     """dict {partname: Part} populated with parts loading from package.
    213 
    214     Among other duties, this collection is passed to each relationships collection
   (...)
    217     loaded.
    218     """
--> 219     content_types = self._content_types
    220     package = self._package
    221     package_reader = self._package_reader

File ~\Anaconda3\envs\base_\lib\site-packages\pptx\util.py:215, in lazyproperty.__get__(self, obj, type)
    210 value = obj.__dict__.get(self.__name__)
    211 if value is None:
    212     # ---on first access, __dict__ item will absent. Evaluate fget()
    213     # ---and store that value in the (otherwise unused) host-object
    214     # ---__dict__ value of same name ('fget' nominally)
--> 215     value = self._fget(obj)
    216     obj.__dict__[self.__name__] = value
    217 return value

File ~\Anaconda3\envs\base_\lib\site-packages\pptx\opc\package.py:203, in _PackageLoader._content_types(self)
    197 @lazyproperty
    198 def _content_types(self):
    199     """|_ContentTypeMap| object providing content-types for items of this package.
    200 
    201     Provides a content-type (MIME-type) for any given partname.
    202     """
--> 203     return _ContentTypeMap.from_xml(self._package_reader[CONTENT_TYPES_URI])

File ~\Anaconda3\envs\base_\lib\site-packages\pptx\opc\serialized.py:35, in PackageReader.__getitem__(self, pack_uri)
     33 def __getitem__(self, pack_uri):
     34     """Return bytes for part corresponding to `pack_uri`."""
---> 35     return self._blob_reader[pack_uri]

File ~\Anaconda3\envs\base_\lib\site-packages\pptx\opc\serialized.py:176, in _ZipPkgReader.__getitem__(self, pack_uri)
    171 def __getitem__(self, pack_uri):
    172     """Return bytes for part corresponding to `pack_uri`.
    173 
    174     Raises |KeyError| if no matching member is present in zip archive.
    175     """
--> 176     if pack_uri not in self._blobs:
    177         raise KeyError("no member '%s' in package" % pack_uri)
    178     return self._blobs[pack_uri]

File ~\Anaconda3\envs\base_\lib\site-packages\pptx\util.py:215, in lazyproperty.__get__(self, obj, type)
    210 value = obj.__dict__.get(self.__name__)
    211 if value is None:
    212     # ---on first access, __dict__ item will absent. Evaluate fget()
    213     # ---and store that value in the (otherwise unused) host-object
    214     # ---__dict__ value of same name ('fget' nominally)
--> 215     value = self._fget(obj)
    216     obj.__dict__[self.__name__] = value
    217 return value

File ~\Anaconda3\envs\base_\lib\site-packages\pptx\opc\serialized.py:184, in _ZipPkgReader._blobs(self)
    182 """dict mapping partname to package part binaries."""
    183 with zipfile.ZipFile(self._pkg_file, "r") as z:
--> 184     return {PackURI("/%s" % name): z.read(name) for name in z.namelist()}

File ~\Anaconda3\envs\base_\lib\site-packages\pptx\opc\serialized.py:184, in <dictcomp>(.0)
    182 """dict mapping partname to package part binaries."""
    183 with zipfile.ZipFile(self._pkg_file, "r") as z:
--> 184     return {PackURI("/%s" % name): z.read(name) for name in z.namelist()}

File ~\Anaconda3\envs\base_\lib\zipfile.py:1474, in ZipFile.read(self, name, pwd)
   1472 """Return file bytes for name."""
   1473 with self.open(name, "r", pwd) as fp:
-> 1474     return fp.read()

File ~\Anaconda3\envs\base_\lib\zipfile.py:911, in ZipExtFile.read(self, n)
    909     self._offset = 0
    910     while not self._eof:
--> 911         buf += self._read1(self.MAX_N)
    912     return buf
    914 end = n + self._offset

File ~\Anaconda3\envs\base_\lib\zipfile.py:1015, in ZipExtFile._read1(self, n)
   1013 if self._left <= 0:
   1014     self._eof = True
-> 1015 self._update_crc(data)
   1016 return data

File ~\Anaconda3\envs\base_\lib\zipfile.py:943, in ZipExtFile._update_crc(self, newdata)
    941 # Check the CRC if we're at the end of the file
    942 if self._eof and self._running_crc != self._expected_crc:
--> 943     raise BadZipFile("Bad CRC-32 for file %r" % self.name)

BadZipFile: Bad CRC-32 for file 'ppt/media/media2.mov'```
Enne31 commented 2 years ago

No solution using this library for me, but since what I was trying to do was get the length of a presentation, i could do it using the win32com library, kinda like this

import win32com.client

#prs = Presentation(path_to_pptx.replace("'",""))
#slides_max = len(list(prs.slides))
#console.log(slides_max)

Application = win32com.client.Dispatch("PowerPoint.Application")
Presentation = Application.Presentations.Open(path_to_pptx.replace("'",""))
slides_max = len(Presentation.Slides)
Presentation.Close()

The commented out part is what you'ld with pptx library, the non-commented part is with the win32com lib.

Maybe it can help, but the win32com lib is not friendly and doc is old by this time.

Enne31 commented 2 years ago

Upload an example pptx....

Sorry, i wish i could, but i've got no authority to do so... Hope one can simulate it with an empty presentations and a media-file, since this seems to happen with .mov files aswell