Error while loading the docx(KeyError: "There is no item named 'word/#_top' in the archive")

akash97715 commented 7 months ago

Hello team, we are using the code below to load the document:

from docx import Document

# Path to your DOCX file
docx_file_path = 'myfile.docx'

# Load the DOCX file
document = Document(docx_file_path)

# Example: Print all the text in the document
for para in document.paragraphs:
    print(para.text)

We get the error below:

KeyError                                  Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_13540\1987488543.py in <module>
      5
      6 # Load the DOCX file
----> 7 document = Document(docx_file_path)
      8
      9 # Example: Print all the text in the document

~\Anaconda3\lib\site-packages\docx\api.py in Document(docx)
     21     """
     22     docx = _default_docx_path() if docx is None else docx
---> 23     document_part = Package.open(docx).main_document_part
     24     if document_part.content_type != CT.WML_DOCUMENT_MAIN:
     25         tmpl = "file '%s' is not a Word file, content type is '%s'"

~\Anaconda3\lib\site-packages\docx\opc\package.py in open(cls, pkg_file)
    114     def open(cls, pkg_file):
    115         """Return an |OpcPackage| instance loaded with the contents of `pkg_file`."""
--> 116         pkg_reader = PackageReader.from_file(pkg_file)
    117         package = cls()
    118         Unmarshaller.unmarshal(pkg_reader, package, PartFactory)

~\Anaconda3\lib\site-packages\docx\opc\pkgreader.py in from_file(pkg_file)
     23         content_types = _ContentTypeMap.from_xml(phys_reader.content_types_xml)
     24         pkg_srels = PackageReader._srels_for(phys_reader, PACKAGE_URI)
---> 25         sparts = PackageReader._load_serialized_parts(
     26             phys_reader, pkg_srels, content_types
     27         )

~\Anaconda3\lib\site-packages\docx\opc\pkgreader.py in _load_serialized_parts(phys_reader, pkg_srels, content_types)
     51         sparts = []
     52         part_walker = PackageReader._walk_phys_parts(phys_reader, pkg_srels)
---> 53         for partname, blob, reltype, srels in part_walker:
     54             content_type = content_types[partname]
     55             spart = _SerializedPart(partname, content_type, reltype, blob, srels)

~\Anaconda3\lib\site-packages\docx\opc\pkgreader.py in _walk_phys_parts(phys_reader, srels, visited_partnames)
     84                 phys_reader, part_srels, visited_partnames
     85             )
---> 86             for partname, blob, reltype, srels in next_walker:
     87                 yield (partname, blob, reltype, srels)
     88

~\Anaconda3\lib\site-packages\docx\opc\pkgreader.py in _walk_phys_parts(phys_reader, srels, visited_partnames)
     84                 phys_reader, part_srels, visited_partnames
     85             )
---> 86             for partname, blob, reltype, srels in next_walker:
     87                 yield (partname, blob, reltype, srels)
     88

~\Anaconda3\lib\site-packages\docx\opc\pkgreader.py in _walk_phys_parts(phys_reader, srels, visited_partnames)
     79             reltype = srel.reltype
     80             part_srels = PackageReader._srels_for(phys_reader, partname)
---> 81             blob = phys_reader.blob_for(partname)
     82             yield (partname, blob, reltype, part_srels)
     83             next_walker = PackageReader._walk_phys_parts(

~\Anaconda3\lib\site-packages\docx\opc\phys_pkg.py in blob_for(self, pack_uri)
     81         Raises |ValueError| if no matching member is present in zip archive.
     82         """
---> 83         return self._zipf.read(pack_uri.membername)
     84
     85     def close(self):

~\Anaconda3\lib\zipfile.py in read(self, name, pwd)
   1470     def read(self, name, pwd=None):
   1471         """Return file bytes for name."""
-> 1472         with self.open(name, "r", pwd) as fp:
   1473             return fp.read()
   1474

~\Anaconda3\lib\zipfile.py in open(self, name, mode, pwd, force_zip64)
   1509         else:
   1510             # Get info object for name
-> 1511             zinfo = self.getinfo(name)
   1512
   1513         if mode == 'w':

~\Anaconda3\lib\zipfile.py in getinfo(self, name)
   1436         info = self.NameToInfo.get(name)
   1437         if info is None:
-> 1438             raise KeyError(
   1439                 'There is no item named %r in the archive' % name)
   1440

KeyError: "There is no item named 'word/#_top' in the archive"

Please let me know if I am doing anything wrong. It would also be helpful if you could suggest a way to resolve this issue.

scanny commented 7 months ago

Sounds like a corrupted docx file. Maybe open it with Word or LibreOffice and save as a new name so it rewrites the file.

akash97715 commented 7 months ago

Hello, thanks for your response. I tried saving the file under a new name but still got the same error. I revalidated the document and it is not corrupted.

scanny commented 7 months ago

@akash97715 if you can send the file I'll take a look at it. Otherwise I just don't have enough to go on. I've never seen this error before and I've been at it for over a decade, so this is something of an edge case.

Do you know the provenance of the document? Was it generated by some package rather than being authored using Word or LibreOffice?
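
If sharing the file is not possible, one way to gather more detail might be to inspect the package directly. The traceback shows python-docx following relationship entries (`_walk_phys_parts`) and then asking the zip archive for a member named `word/#_top`, so a reasonable guess is that some `.rels` file contains an internal relationship whose target does not exist in the archive. The sketch below uses only the standard library to list such entries. The function name `find_dangling_relationships` is hypothetical and not part of python-docx, and the idea that a dangling relationship is the cause is an assumption, not something confirmed in this thread.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by OPC relationship parts (*.rels files)
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def find_dangling_relationships(docx_path):
    """Hypothetical helper: list internal relationships whose target is missing.

    Returns a list of (rels_member, relationship_id, target) tuples.
    """
    dangling = []
    with zipfile.ZipFile(docx_path) as zf:
        members = set(zf.namelist())
        for name in sorted(members):
            if not name.endswith(".rels"):
                continue
            # "word/_rels/document.xml.rels" describes "word/document.xml",
            # so relative targets are resolved against "word/".
            base_dir = posixpath.dirname(posixpath.dirname(name))
            root = ET.fromstring(zf.read(name))
            for rel in root.iter(RELS_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    continue  # external targets (e.g. URLs) are not zip members
                target = rel.get("Target", "")
                if target.startswith("/"):
                    resolved = posixpath.normpath(target.lstrip("/"))
                else:
                    resolved = posixpath.normpath(posixpath.join(base_dir, target))
                if resolved not in members:
                    dangling.append((name, rel.get("Id"), target))
    return dangling
```

Calling `find_dangling_relationships('myfile.docx')` and posting the result here would show which relationship, if any, points at a missing part. If the guess is right, one of the entries should have `#_top` as its target.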

msr22 commented 6 months ago

@scanny I am facing the same issue. I tried renaming the file and saving it again, but I still get the error. I am able to open the file correctly in MS Word.

python-openxml / python-docx

Error while loading the docx(KeyError: "There is no item named 'word/#_top' in the archive") #1351