maennchen / ZipStream-PHP

:floppy_disk: PHP ZIP Streaming Library
https://maennchen.dev/ZipStream-PHP/
MIT License
1.74k stars 104 forks source link

Document default options for Zip64 & Zero Header #267

Closed mkalkbrenner closed 1 year ago

mkalkbrenner commented 1 year ago

ZipStream-PHP version

3.0.2

PHP version

8.1.16

Constraints for Bug Report

Summary

Drupal's Search API Solr module uses zipstream to upload config-set zip files to Solr (or ZooKeeper). It perfectly worked with 1.x and 2.x versions for years. But it doesn't work anymore with 3.0.2 or the main branch. zipstream is not used to do HTTP, but to write a zip file to disk. This zip file then gets uploaded using https://github.com/solariumphp/solarium The zip file itself seems to be ok and could be opened and extracted using other zip clients. But the client built into Java seems to have an issue with files created with zipstream 3.x.

Current behavior

The issue seems to be about the file size or the size reported within the zip file: java.util.zip.ZipException: invalid entry size (expected 0 but got 24553 bytes)

If I set defaultEnableZeroHeader: false, the error message changes, but still exists: java.util.zip.ZipException: invalid entry size (expected 4294967295 but got 24553 bytes)

How to reproduce

drush search-api-solr:upload-configset -v --numShards=1

Expected behavior

The zip files produced with 2.4.0 still work. 3.0.x should be able to produce zip files that could be parsed by java.util.zip

NicolasCARPi commented 1 year ago

Did you set sendHttpHeaders: false, in the constructor? See https://maennchen.dev/ZipStream-PHP/guide/Options.html

mkalkbrenner commented 1 year ago

Setting sendHttpHeaders to true or false doesn't change anything. As mentioned above, zipstream just generates the zip file:

$file = fopen($file_name, 'w+b');
$zip = new ZipStream(outputStream: $file, defaultEnableZeroHeader: FALSE);

...

foreach ($files as $name => $content) {
      $zip->addFile($name, $content);
}

$zip->finish();

fclose($file);

This zip file then gets uploaded embedded into a Solr command using curl.

maennchen commented 1 year ago

@mkalkbrenner Can you open the archive with another unarchiving tool or is it corrupted completely?

mkalkbrenner commented 1 year ago

@maennchen as metioned above, the archive could be opened using other tools. And the files in it are readable and not corrupted in any way.

maennchen commented 1 year ago

@mkalkbrenner Sorry, you already wrote that.

Would you be able to share both a v2 and a v3 file with the same contents so we can compare?

mkalkbrenner commented 1 year ago

I'll prepare some files ...

mkalkbrenner commented 1 year ago

configset_v3.0.2_zeroHeader.zip configset_v3.0.2.zip configset_v2.4.0.zip

mkalkbrenner commented 1 year ago

I can reproduce it using curl on the command line:

curl -X POST --header "Content-Type:application/octet-stream" --data-binary @configset_v3.0.2_zeroHeader.zip "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=myConfigSet1"
{
  "responseHeader":{
    "status":500,
    "QTime":13},
  "error":{
    "msg":"invalid entry size (expected 0 but got 24553 bytes)",
    ...
curl -X POST --header "Content-Type:application/octet-stream" --data-binary @configset_v3.0.2.zip "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=myConfigSet2"
{
  "responseHeader":{
    "status":500,
    "QTime":11},
  "error":{
    "msg":"invalid entry size (expected 4294967295 but got 24553 bytes)",
    ...
curl -X POST --header "Content-Type:application/octet-stream" --data-binary @configset_v2.4.0.zip "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=myConfigSet3"
{
  "responseHeader":{
    "status":0,
    "QTime":83}}
mkalkbrenner commented 1 year ago

I tried the main branch now and the new function addFileFromCallback() and provided a self-calculated exactSizefor every file. The resulting zip file could be read by different zip clients. But Java (Solr) still doesn't accept it. But the error message is different:

curl -X POST --header "Content-Type:application/x-zip" --data-binary @configset_v3.1.0beta_callback_exactSize.zip "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=myConfigSet7"
{
  "responseHeader":{
    "status":500,
    "QTime":12},
  "error":{
    "msg":"invalid stored block lengths",

configset_v3.1.0beta_callback_exactSize.zip

mkalkbrenner commented 1 year ago

In order to unblock a Drupal 10.1.x compatible release of https://www.drupal.org/project/search_api_solr , can we do a 2.4.1 release including https://github.com/maennchen/ZipStream-PHP/pull/268 ? That would also solve dependency issues for other modules at the moment, for example https://github.com/PHPOffice/PhpSpreadsheet doesn't have a v3.0.2 compatible release yet.

maennchen commented 1 year ago

There is for sure something fishy going on:

v2.4.0 zip

       |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19|0123456789abcdef0123456789|.{}: <stdin> (zip)
0x00000|50 4b 03 04 14 00 00 00 08 00 5c 7b ce 56 63 ba 00 70 53 05 00 00 e9 5f 00 00|PK........\{.Vc..pS...._..|  local_files[0:27]:
*      |until 0x21858.7 (137305)                                                     |                          |
0x21840|                                                                           50|                         P|  central_directories[0:27]:
0x2185a|4b 01 02 03 06 14 00 00 00 08 00 5c 7b ce 56 63 ba 00 70 53 05 00 00 e9 5f 00|K..........\{.Vc..pS...._.|
*      |until 0x21eee.7 (1686)                                                       |                          |
0x21eda|                                                               50 4b 05 06 00|                     PK...|  end_of_central_directory_record{}:
0x21ef4|00 00 00 1b 00 1b 00 96 06 00 00 59 18 02 00 00 00|                          |...........Y.....|        |

v3.0.2 zip

       |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19|0123456789abcdef0123456789|.{}: <stdin> (zip)
       |                                                                             |                          |  error: zip: RawLen(compressed): failed at position 56 (read size 0 seek pos 0): outside buffer
0x00000|50 4b 03 04 2d 00 00 08 08 00 0a 7c ce 56 63 ba 00 70 ff ff ff ff ff ff ff ff|PK..-......|.Vc..p........|  local_files[0:1]:
*      |until 0x37.7 (56)                                                            |                          |
0x00034|            ed 5c 6d 4f e3 38 10 fe ce af c8 e5 3b 50 56 b7 77 5f e8 4a 7b 85|    .\mO.8......;PV.w_.J{.|  gap0: raw bits
0x0004e|22 74 c0 56 c0 8a 93 8e d3 ca 24 d3 d6 5a c7 ce 39 0e d0 fb f5 37 76 5e 80 b4|"t.V......$..Z..9....7v^..|
*      |until 0x2190d.7 (137430)                                                     |                          |
0x218f6|                                                                        50 4b|                        PK|  central_directories[0:27]:
0x21910|01 02 03 06 2d 00 00 08 08 00 0a 7c ce 56 63 ba 00 70 53 05 00 00 e9 5f 00 00|....-......|.Vc..pS...._..|
*      |until 0x21fa3.7 (1686)                                                       |                          |
0x21f90|                                                            50 4b 05 06 00 00|                    PK....|  end_of_central_directory_record{}:
0x21faa|00 00 1b 00 1b 00 96 06 00 00 0e 19 02 00 00 00|                             |................|   

I'm still investigating what's going on.

maennchen commented 1 year ago

Ok, the v3.0.2 zip contains a 0x5653 header (ExtendedInformationExtraField in ZipStream), maybe solr chokes on that.

That field should only be enabled if either a UTF-8 Name, UTF-8 Comment or a Zero Header is active. Neither makes sense for your file.

       |                                                                             |                          |      extra_fields[0:1]:
       |                                                                             |                          |        [0]{}: extra_field
0x00034|53 56                                                                        |SV                        |          header_id: 0x5653
0x00034|      00 00                                                                  |  ..                      |          data_size: 0
       |                                                                             |                          |          data: raw bits

Can you try commenting out lines 183-186 in File.php?

maennchen commented 1 year ago

(for anyone trying to replicate my investigation: I'm using fq which is extremely helpful to debug binary files.)

maennchen commented 1 year ago

Ah, found it, should've seen that directly.

Try setting enableZip64: false when creating the ZipStream instance.

mkalkbrenner commented 1 year ago

enableZip64: false, defaultEnableZeroHeader: false

This combination made it work. Thanks for your support!

But I must admit, that this API is less intuitive as before. And it might make sense to add some hints in the migration section of the README.

maennchen commented 1 year ago

@mkalkbrenner One of the issues of this lib is that error handling is quite hard since 99% of the users are sending a browser reaponse and the error is just written into the zip file.

Therefore it made sense to enable Zip64 by default since it can’t be done halfway through and would result in an overflow.

Similarly, a lot of users try to stream out file contents and the zero option is what makes this efficient.

This choice should be documented properly however, both in the Migration information as well as in the API docs.

A PR for that would be very welcome.

mkalkbrenner commented 1 year ago

search_api_solr uses the stream functionality as well when users are downloading a Solr config-set via the user interface instead of using the drush command line tool. Some hosters or Solr service providers offer a possibility to upload and deploy these zip files manually to their service and they deploy them. I assume that this will fail then, too. Therefor I added enableZip64: false, defaultEnableZeroHeader: false to the streaming part as well.

maennchen commented 1 year ago

@mkalkbrenner FYI: PhpSpreadsheet just released a version compatible with v3.

maennchen commented 1 year ago

@mkalkbrenner On further inspection, Zip64 was already enabled in v2. The difference is that the extra headers were only used if zeroHeader && zip64 while v3 was doing zeroHeader || zip64 and therefore added the header even if it was not needed.

Can you verify if it now works as well with #269?

Given that, I think the migration instructions (https://github.com/maennchen/ZipStream-PHP#archive-options) are clear enough and this ticket can be closed.