zlib-ng / minizip-ng

Fork of the popular zip manipulation library found in the zlib distribution.
Other
1.24k stars 431 forks source link

LibreOffice zip file validation changes #817

Open rombust opened 1 day ago

rombust commented 1 day ago

I use the minizip-ng library to modify "content.xml" in a LibreOffice Draw document, in my C++ application

Since LibreOffice "LibreOffice_24.8.2_Win_x86-64", when opening the document, I get "The file "example.odg" is corrupt and therefore cannot be opened. LibreOffice can try to repair the file." . When the document is "repaired", the file opens as expected.

This is also in the daily development build v24.8 - 2024-11-06_09.10.55. It works in LibreOffice 7.6.7.2

To simplify the debugging, I can recreate the issue as follows (with setup and error checking removed)

err = mz_zip_reader_goto_first_entry(zip_reader);
do
   {
     err = mz_zip_reader_entry_get_info(zip_reader, &file_info);
     err = mz_zip_writer_copy_from_reader(zip_writer, zip_reader);
    err = mz_zip_reader_goto_next_entry(zip_reader);
  while(err != MZ_END_OF_LIST)

The example.odg was created by LibreOffice. If I use "minizip.exe" to extract the files, then "minizip.exe" to compress the files into "example_minizip_fixed.odg", my code works.

Looking at the list of these files, they differ. Notice the Libreoffice version (example.odg), "Attribs" are set to 0


C:\tmp>minizip -l  example.odg
     Packed     Unpacked Ratio Method   Attribs Date     Time  CRC-32     Name
      ------     -------- ----- ------   ------- ----     ----  ------     ----
          43           43  100% stored         0 11-06-24 10:08 c42e039f   mimetype
           0            0    0% stored         0 11-06-24 10:08 00000000   Configurations2/menubar/
           0            0    0% stored         0 11-06-24 10:08 00000000   Configurations2/progressbar/
           0            0    0% stored         0 11-06-24 10:08 00000000   Configurations2/popupmenu/
           0            0    0% stored         0 11-06-24 10:08 00000000   Configurations2/toolbar/
           0            0    0% stored         0 11-06-24 10:08 00000000   Configurations2/statusbar/
           0            0    0% stored         0 11-06-24 10:08 00000000   Configurations2/images/Bitmaps/
           0            0    0% stored         0 11-06-24 10:08 00000000   Configurations2/floater/
           0            0    0% stored         0 11-06-24 10:08 00000000   Configurations2/toolpanel/
           0            0    0% stored         0 11-06-24 10:08 00000000   Configurations2/accelerator/
        3000        19644   15% deflate         0 11-06-24 10:08 55f98da4   styles.xml
        1358         7360   18% deflate         0 11-06-24 10:08 af4b7d1c   content.xml
        1068         1068  100% stored         0 11-06-24 10:08 fcd7adaa   Thumbnails/thumbnail.png
        2552        50225    5% deflate         0 11-06-24 10:08 0283cbd5   settings.xml
         456         1094   41% deflate         0 11-06-24 10:08 9d6f981a   meta.xml
         292          965   30% deflate         0 11-06-24 10:08 73935a56   META-INF/manifest.xml

C:\tmp>minizip -l  example_minizip_fixed.odg
     Packed     Unpacked Ratio Method   Attribs Date     Time  CRC-32     Name
      ------     -------- ----- ------   ------- ----     ----  ------     ----
           0            0    0% stored        10 11-06-24 10:27 00000000   Configurations2/
           0            0    0% stored        10 11-06-24 10:27 00000000   Configurations2/accelerator/
           0            0    0% stored        10 11-06-24 10:27 00000000   Configurations2/floater/
           0            0    0% stored        10 11-06-24 10:27 00000000   Configurations2/images/
           0            0    0% stored        10 11-06-24 10:27 00000000   Configurations2/images/Bitmaps/
           0            0    0% stored        10 11-06-24 10:27 00000000   Configurations2/menubar/
           0            0    0% stored        10 11-06-24 10:27 00000000   Configurations2/popupmenu/
           0            0    0% stored        10 11-06-24 10:27 00000000   Configurations2/progressbar/
           0            0    0% stored        10 11-06-24 10:27 00000000   Configurations2/statusbar/
           0            0    0% stored        10 11-06-24 10:27 00000000   Configurations2/toolbar/
           0            0    0% stored        10 11-06-24 10:27 00000000   Configurations2/toolpanel/
        1354         7360   18% deflate        80 11-06-24 10:08 af4b7d1c   content.xml
           0            0    0% stored        10 11-06-24 10:27 00000000   META-INF/
         290          965   30% deflate        80 11-06-24 10:08 73935a56   META-INF/manifest.xml
         459         1094   41% deflate        80 11-06-24 10:08 9d6f981a   meta.xml
          43           43  100% deflate        80 11-06-24 10:08 c42e039f   mimetype
        2520        50225    5% deflate        80 11-06-24 10:08 0283cbd5   settings.xml
        3002        19644   15% deflate        80 11-06-24 10:08 55f98da4   styles.xml
           0            0    0% stored        10 11-06-24 10:27 00000000   Thumbnails/
         659         1068   61% deflate        80 11-06-24 10:08 fcd7adaa   Thumbnails/thumbnail.png

Looking into


int32_t mz_zip_writer_entry_open(void *handle, mz_zip_file *file_info) {
. . .
   if (mz_zip_attrib_is_dir(writer->file_info.external_fa, writer->file_info.version_madeby) != MZ_OK) 
. . . 

The external_fa is set to 0, therefore "mz_zip_attrib_is_dir" will never return MZ_OK. Thus the hash is always calculated. Note, mz_zip_writer_copy_from_reader also uses mz_zip_attrib_is_dir .

I tried forcing the correct Attribs, thinking this was the source of the problem. However LibreOffice still reported the file is corrupt.

Has anyone else seen this?

Observation 1) minizip.exe stores "Configurations2/" name in the above example. I am unsure if that's an issue. Observation 2) The output.odg file that my code (using mz_zip_writer_copy_from_reader) creates. Using minizip.exe, If I extract it, then recompress it. LibreOffice will open the file without issues.

pmqs commented 18 hours ago

It's been a while, but my recollection of the equivalent issue with Office files, which also use zip behind the scenes, is that the exact layout of the zipfile needs to be maintained. So things like the order the files are stored in the zip will matter. Whether they are compressed or stored and whether the file is stored streamed or not streamed.

In this case the two things that stand out are the order of the files and the presence of the directory entries (Configurations2/ and Thumbnails/.

Whether one of these is the root-case here is difficult to tell definitively without knowing ho Libre handles zip files -- I just know that the equivalent Office file were very picky about the structure of the zipfile.

You could also try running zipdetails on the before & after zipfiles to see what other internal metadata is different.

rombust commented 17 hours ago

Thank you. I will have a look at that. "zipdetails" will be helpful.

I have tracked down to when LibreOffice changed (downloading and debugging source)


package/source/zippackage/ZipPackage.cxx 
32cad89 package: ZipPackage: add additional check for entries STORED with by Michael Stahl · 3 months ago

In function they added below, the "entry STORED with data descriptor but not encrypted" exception is thrown.


void ZipPackage::checkZipEntriesWithDD()
{
    if (!m_bForceRecovery)
    {
        ZipEnumeration entries{m_pZipFile->entries()};
        while (entries.hasMoreElements())
        {
            ZipEntry const& rEntry{*entries.nextElement()};
            if ((rEntry.nFlag & 0x08) != 0 && rEntry.nMethod == STORED)
            {
                uno::Reference<XPropertySet> xStream;
                getByHierarchicalName(rEntry.sPath) >>= xStream;
                if (!xStream->getPropertyValue("WasEncrypted").get<bool>())
                {
                    SAL_INFO("package", "entry STORED with data descriptor but not encrypted: \"" << rEntry.sPath << "\"");
                    throw ZipIOException(
                        THROW_WHERE
                        "entry STORED with data descriptor but not encrypted");
                }
            }
        }
    }
}

I will investigate. Thank you