maennchen / ZipStream-PHP

:floppy_disk: PHP ZIP Streaming Library
https://maennchen.dev/ZipStream-PHP/
MIT License

S3 streaming results in broken ZIP file #284

Closed SimonThornett closed 10 months ago

SimonThornett commented 1 year ago

ZipStream-PHP version

2.4.0

PHP version

8.0

Constraints for Bug Report

Summary

As discussed in https://tracker.moodle.org/browse/MDL-72935, downloading files from S3 can result in a broken ZIP file, with the CLI reporting (for example):

root@81e427baf6de:/var/www/site# unzip submissions.zip
warning [STAT0023_22-23.zip]: 5268 extra bytes at beginning or within zipfile
(attempting to process anyway)
file #1: bad zipfile offset (local header sig): 5268
(attempting to re-compensate)

A pull request was created (https://github.com/moodle/moodle/compare/00f0613f99d2ca31f0b8e5b20267c0c3ba6873ea...SimonThornett:MDL-72935) that resolves the issue by reducing CHUNKED_READ_BLOCK_SIZE; however, it was rejected on the basis that the issue had already been reported here in #112 along with a fix.

The fix there, however, was to set the fseek option for S3, and that is already in place here:

https://github.com/catalyst/moodle-tool_objectfs/blob/MOODLE_310_STABLE/classes/local/store/s3/client.php#L227

with the context used here:

https://github.com/catalyst/moodle-tool_objectfs/blob/MOODLE_310_STABLE/classes/local/store/object_file_system.php#L551
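
For context, a minimal sketch of what that fseek fix amounts to with the AWS SDK for PHP: the S3 stream wrapper is registered and the object is opened with a seekable stream context, so the stream handed to ZipStream can be rewound. The bucket and key below are placeholders, not the plugin's actual values.

<?php
require 'vendor/autoload.php';

// Sketch only: register the AWS SDK's S3 stream wrapper.
$s3 = new Aws\S3\S3Client([
    'version' => 'latest',
    'region'  => 'eu-west-1',
]);
$s3->registerStreamWrapper();

// 'seekable' => true buffers the remote read stream locally so fseek() works.
$context = stream_context_create([
    's3' => ['seekable' => true],
]);

// Placeholder bucket/key.
$stream = fopen('s3://my-bucket/submissions/STAT0023_22-23.zip', 'r', false, $context);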

Current behavior

Streamed ZIP files from S3 can sometimes end up with bad file headers.

How to reproduce

Unfortunately, we have only found one Moodle course on which we can consistently replicate the issue. I have attempted to replicate it locally with new files, but was unable to do so.

Expected behavior

Streamed file opens without issue.
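
For reference, this is roughly the shape of the code under discussion with the ZipStream-PHP 2.x API. It is a minimal sketch, not the actual Moodle code; $stream would be the seekable S3 stream from the snippet above, and the option values are only illustrative.

use ZipStream\Option\Archive;
use ZipStream\ZipStream;

// Illustrative options for streaming straight to the HTTP response.
$options = new Archive();
$options->setSendHttpHeaders(true);
$options->setZeroHeader(true); // sizes/CRC go into a trailing data descriptor

$zip = new ZipStream('submissions.zip', $options);

// $stream is the (seekable) S3 read stream opened earlier.
$zip->addFileFromStream('STAT0023_22-23.zip', $stream);

$zip->finish();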

maennchen commented 1 year ago

@SimonThornett Can you share a corrupt file to debug the issue?

Eddy84 commented 11 months ago

We have the same issue: ZipStreams from S3 contain one file that is too large. task_100089_images_26.09.2023 13_46.zip

I can reliably reproduce the issue like so:

  1. Upload 4 images to S3
  2. Load each image once in the browser (they load as expected)
  3. Create a ZipStream with the 4 files; the ZIP file opens fine and the file sizes look correct
  4. Create another ZipStream with the same 4 files; now one file is too large
  5. That image can no longer be opened in the browser

We use Laravel and the Spatie Media Library; under the hood it uses league/flysystem.
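
To make the setup concrete, here is a minimal sketch of what this pipeline typically looks like, assuming Laravel's Storage facade and the ZipStream 2.x API; the disk name, paths and archive name are placeholders rather than our actual code.

use Illuminate\Support\Facades\Storage;
use ZipStream\Option\Archive;
use ZipStream\ZipStream;

// Placeholder paths; in our case they come from the Spatie media library.
$paths = [
    'images/img-1.jpg',
    'images/img-2.jpg',
    'images/img-3.jpg',
    'images/img-4.jpg',
];

$options = new Archive();
$options->setSendHttpHeaders(true);

$zip = new ZipStream('task_images.zip', $options);

foreach ($paths as $path) {
    // Flysystem's S3 adapter returns a plain read stream (often non-seekable).
    $stream = Storage::disk('s3')->readStream($path);
    $zip->addFileFromStream(basename($path), $stream);

    if (is_resource($stream)) {
        fclose($stream);
    }
}

$zip->finish();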

maennchen commented 11 months ago

@Eddy84 Do I understand correctly that you create two archives containing the same files in the same request / program execution?

Can you send me the working & the broken archives? Then I can compare them.

The code that generates the archives would also be useful.

Additionally, the files themselves would be helpful so that I can compare their sizes and content with the ZIP headers.
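
For that comparison, PHP's built-in ZipArchive can dump what the central directory records for each entry. A minimal sketch, with the filename referring to the attachment above:

<?php
$zip = new ZipArchive();

if ($zip->open('task_100089_images_26.09.2023 13_46.zip') !== true) {
    exit("unable to open archive\n");
}

// Print the name, recorded sizes and CRC of every central-directory entry.
for ($i = 0; $i < $zip->numFiles; $i++) {
    $stat = $zip->statIndex($i);
    printf(
        "%s: size=%d comp_size=%d crc=%08x\n",
        $stat['name'],
        $stat['size'],
        $stat['comp_size'],
        $stat['crc']
    );
}

$zip->close();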

maennchen commented 10 months ago

Closed because of inactivity. Feel free to re-open if this is still relevant.