mwilliamson / python-mammoth

Convert Word documents (.docx files) to HTML
BSD 2-Clause "Simplified" License
811 stars, 121 forks

BadZipFile error when converting an empty .docx created through context menu or 'New item' menu #88

Closed: ylsun136 closed this issue 4 years ago

ylsun136 commented 4 years ago

Conversion raises a BadZipFile error if the .docx file was created through the context menu or the 'New item' menu option in File Explorer and has not been edited.

Environment

Windows 10, Python 3.8.3 (Anaconda)

To reproduce

  1. Create a new file, empty.docx, in File Explorer using either the right-click context menu -> New -> Microsoft Word Document, or the New item -> Microsoft Word Document option on the 'Home' ribbon.

  2. On the CLI, run mammoth <path to file>\empty.docx.

Additional info

The same error is raised by both mammoth.convert_to_html() and mammoth.extract_raw_text().

If the .docx is created as a new document in Microsoft Word and saved without any edits, conversion works without error.

If the .docx has been edited in any way and saved, conversion works without error.

Traceback

  File "C:\ProgramData\Anaconda3\Scripts\mammoth.exe\__main__.py", line 9, in <module>
  File "c:\programdata\anaconda3\lib\site-packages\mammoth\cli.py", line 33, in main
    output_format=args.output_format,
  File "c:\programdata\anaconda3\lib\site-packages\mammoth\__init__.py", line 25, in convert
    kwargs["embedded_style_map"] = read_style_map(fileobj)
  File "c:\programdata\anaconda3\lib\site-packages\mammoth\docx\style_map.py", line 66, in read_style_map
    with open_zip(fileobj, "r") as zip_file:
  File "c:\programdata\anaconda3\lib\site-packages\mammoth\zips.py", line 9, in open_zip
    return _Zip(ZipFile(fileobj, mode))
  File "c:\programdata\anaconda3\lib\zipfile.py", line 1225, in __init__
    self._RealGetContents()
  File "c:\programdata\anaconda3\lib\zipfile.py", line 1292, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
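
The last frames show that the error comes from the standard library's zipfile module when Mammoth first opens the file in read_style_map: a .docx is a zip archive, and ZipFile could not find one in the input. Below is a minimal sketch that reproduces the same exception using only the standard library. It assumes the Explorer-created file is zero bytes, which the replies below go on to suggest. `open_as_zip_error` is a hypothetical helper, not part of Mammoth.

```python
import io
import zipfile


def open_as_zip_error(data):
    """Hypothetical helper: try to open `data` (bytes) as a zip archive.

    Returns None if it opens, otherwise the BadZipFile message.
    """
    try:
        with zipfile.ZipFile(io.BytesIO(data), "r"):
            return None
    except zipfile.BadZipFile as error:
        return str(error)


# A zero-byte payload stands in for the untouched File Explorer document.
print(open_as_zip_error(b""))  # File is not a zip file
```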

mwilliamson commented 4 years ago

Could you provide an example file?

ylsun136 commented 4 years ago

Uhh... this interface doesn't let me attach an empty file :(

I can reproduce the error on a new empty .docx file created from the right-click menu in File Explorer: mammoth_error_source

mwilliamson commented 4 years ago

If it's an empty, zero-byte file, I'm not sure there's much Mammoth can do with that?
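Here is why there is little to work with. A .docx is an Office Open XML package: a zip archive whose root contains a `[Content_Types].xml` part. A zero-byte file has no zip structure at all, so there are no parts for Mammoth to read. The sketch below is a hypothetical pre-check using only the standard library. `is_plausible_docx` is an illustrative name, not part of Mammoth, and the check is a heuristic rather than full validation.

```python
import io
import zipfile


def is_plausible_docx(data):
    """Heuristic (hypothetical helper): is `data` a zip package with a
    [Content_Types].xml part at its root?"""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as package:
            return "[Content_Types].xml" in package.namelist()
    except zipfile.BadZipFile:
        # Zero-byte or otherwise non-zip input: no package to read.
        return False


# Build a tiny in-memory package that has the required part name.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("[Content_Types].xml", "<Types/>")

print(is_plausible_docx(b""))                # False
print(is_plausible_docx(buffer.getvalue()))  # True
```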

ylsun136 commented 4 years ago

Fair, I'll add a check for zero file size then. Thanks :))
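Below is a minimal sketch of the zero-size guard described above. It assumes the caller is happy to treat an empty file as an empty document. `convert_docx_to_html` is a hypothetical wrapper. The only Mammoth API it uses is `mammoth.convert_to_html(fileobj)`, whose result exposes the HTML as `.value`.

```python
import io


def convert_docx_to_html(docx_file):
    """Hypothetical wrapper: return "" for a zero-byte input instead of letting
    Mammoth raise zipfile.BadZipFile; otherwise convert with Mammoth."""
    # Find the stream's total size, then restore the original position.
    start = docx_file.tell()
    size = docx_file.seek(0, io.SEEK_END)
    docx_file.seek(start)
    if size == 0:
        return ""

    # Deferred import: Mammoth is only needed when there is something to convert.
    import mammoth

    return mammoth.convert_to_html(docx_file).value


# Usage (the path is an assumption):
# with open("empty.docx", "rb") as docx_file:
#     html = convert_docx_to_html(docx_file)
```

Catching zipfile.BadZipFile around the Mammoth call would also cover non-empty files that aren't zips. However, it would treat genuinely corrupt documents the same way as empty ones, so the explicit size check is the narrower fix.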