maennchen / ZipStream-PHP

PHP ZIP Streaming Library
https://maennchen.dev/ZipStream-PHP/
MIT License

Add file to existing archive #262

Status: Open. maennchen opened this issue 1 year ago.

maennchen commented 1 year ago

Description

Discussed in https://github.com/maennchen/ZipStream-PHP/discussions/261

Originally posted by **jhammer**, May 19, 2023:

Hi. Love ZipStream. We have a use case where we’d like to read a ZIP archive that is already on disk and stream it out to the user, adding file(s) to the stream as it goes out. The original ZIP archive on disk should remain untouched. We do not need to read or modify any of the files that are already in the archive. The `ZipStream` constructor could be modified to take an optional `inputPath` or `inputStream` parameter; no other public API changes would be necessary. We would be happy to sponsor development of this feature if you think it is feasible.
Posted by **maennchen**, May 19, 2023:

Hmm. I think it should be possible. Appending files is a lot easier, since the existing file headers do not have to be recomputed. To do so, we'd have to be able to parse the header entries and skip over the existing files in the `inputStream`. As soon as we reach the central directory header, we'd have to parse it, record its contents, write our own files, and then emit a new, merged end of central directory record.

An alternative to an `inputStream` would be an `addFilesFromZip[Stream|Path]` function. That would allow adding the contents of multiple ZIP files.

The thing I'm most worried about with this feature is that we'd potentially read ZIP files that use extensions or features this library does not support, which might cause problems. We'd probably have to be quite strict in what we parse initially.

One more option would be to use PHP's internal `ZipArchive` class and just loop over its contents, but I don't think that is what you were looking for.

I'll convert this to an issue to discuss the details of a possible implementation.
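To make the last step of that plan concrete: every ZIP archive ends with an end of central directory (EOCD) record that stores the entry count plus the size and offset of the central directory. The PHP sketch below, assuming a single-disk archive without ZIP64 extensions, locates that record in a seekable stream and builds a replacement record for the merged archive. The function names `readEndOfCentralDirectory` and `buildEndOfCentralDirectory` are hypothetical and not part of ZipStream-PHP; the field layout follows PKWARE's APPNOTE.TXT.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers (not part of ZipStream-PHP). Assumes a single-disk,
// non-ZIP64 archive; field layout per PKWARE APPNOTE.TXT section 4.3.16.

/**
 * Locate and parse the end of central directory (EOCD) record.
 *
 * @param resource $stream seekable stream containing a complete ZIP archive
 * @return array{entries: int, cdSize: int, cdOffset: int, eocdOffset: int}
 */
function readEndOfCentralDirectory($stream): array
{
    // The EOCD is 22 bytes plus a comment of at most 65535 bytes,
    // so it must start within the last 65557 bytes of the archive.
    fseek($stream, 0, SEEK_END);
    $size = ftell($stream);
    $tailLength = min($size, 22 + 0xFFFF);
    fseek($stream, $size - $tailLength);
    $tail = (string) stream_get_contents($stream);

    // Scan backwards for the signature "PK\x05\x06" (0x06054b50) and accept
    // the first candidate whose comment length ends exactly at end of file.
    for ($pos = strlen($tail) - 22; $pos >= 0; $pos--) {
        if (substr($tail, $pos, 4) !== "PK\x05\x06") {
            continue;
        }
        $eocd = unpack(
            'Vsignature/vdisk/vcdDisk/ventriesOnDisk/ventries/VcdSize/VcdOffset/vcommentLength',
            $tail,
            $pos
        );
        if ($pos + 22 + $eocd['commentLength'] !== strlen($tail)) {
            continue;
        }
        // Be strict: reject multi-disk archives and ZIP64 placeholder values.
        if ($eocd['disk'] !== 0 || $eocd['cdDisk'] !== 0
            || $eocd['entries'] === 0xFFFF || $eocd['cdOffset'] === 0xFFFFFFFF) {
            throw new RuntimeException('Multi-disk or ZIP64 archives are not supported');
        }
        return [
            'entries' => $eocd['entries'],
            'cdSize' => $eocd['cdSize'],
            'cdOffset' => $eocd['cdOffset'],
            'eocdOffset' => $size - $tailLength + $pos,
        ];
    }
    throw new RuntimeException('No end of central directory record found');
}

/** Build a 22-byte EOCD record (without an archive comment) for the merged archive. */
function buildEndOfCentralDirectory(int $entries, int $cdSize, int $cdOffset): string
{
    return pack('VvvvvVVv', 0x06054b50, 0, 0, $entries, $entries, $cdSize, $cdOffset, 0);
}
```

When appending M new files to an archive with N entries, the merged record would carry N + M entries, a central directory size equal to the original one plus the size of the new central directory headers, and an offset equal to the original `cdOffset` plus the bytes written for the new local headers and file data.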
jhammer commented 1 year ago

Thanks for the reply! Two thoughts:

  1. I agree that your proposed addFilesFromZip… method is a better design.
  2. I had not considered looping over ZipArchive because I didn’t think there was a way to read the raw stream and write it to ZipStream without the overhead of decompressing and recompressing. However, your suggestion prompted me to look again, and it seems like ZipArchive can provide the raw stream. If ZipStream provided an API to add a file from a raw stream (i.e. to avoid decompressing/recompressing), I think that would suffice.
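Skipping the decompress/recompress step is possible in principle because a ZIP archive stores each entry's compressed bytes together with its CRC-32 and sizes, so an entry can be copied verbatim. The PHP sketch below does not use `ZipArchive`; assuming a seekable stream and a non-ZIP64 archive, it reads that metadata from the central directory and slices out an entry's raw compressed data by skipping its local file header. `centralDirectoryEntries` and `rawEntryData` are hypothetical helper names, and `$cdOffset` / `$entries` are assumed to come from the end of central directory record.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers (not part of ZipStream-PHP). Assumes a seekable stream
// and a non-ZIP64 archive; layouts per PKWARE APPNOTE.TXT 4.3.7 and 4.3.12.

/**
 * Read per-entry metadata from the central directory.
 *
 * @param resource $stream
 * @return list<array{name: string, method: int, crc: int, compSize: int, size: int, localOffset: int}>
 */
function centralDirectoryEntries($stream, int $cdOffset, int $entries): array
{
    fseek($stream, $cdOffset);
    $result = [];
    for ($i = 0; $i < $entries; $i++) {
        $fixed = fread($stream, 46);
        if ($fixed === false || strlen($fixed) !== 46) {
            throw new RuntimeException('Truncated central directory');
        }
        $h = unpack(
            'Vsig/vmadeBy/vneeded/vflags/vmethod/vtime/vdate/Vcrc/VcompSize/Vsize/'
            . 'vnameLen/vextraLen/vcommentLen/vdiskStart/vinternal/Vexternal/VlocalOffset',
            $fixed
        );
        if ($h['sig'] !== 0x02014b50) {
            throw new RuntimeException('Bad central directory header signature');
        }
        $name = $h['nameLen'] > 0 ? (string) fread($stream, $h['nameLen']) : '';
        // Skip the extra field and file comment; a stricter parser would inspect them.
        fseek($stream, $h['extraLen'] + $h['commentLen'], SEEK_CUR);
        $result[] = [
            'name' => $name,
            'method' => $h['method'],
            'crc' => $h['crc'],
            'compSize' => $h['compSize'],
            'size' => $h['size'],
            'localOffset' => $h['localOffset'],
        ];
    }
    return $result;
}

/**
 * Return an entry's compressed bytes exactly as stored (no inflate/deflate).
 * The size comes from the central directory because the local header may hold
 * zeros when a data descriptor (general purpose flag bit 3) is used.
 *
 * @param resource $stream
 */
function rawEntryData($stream, array $entry): string
{
    fseek($stream, $entry['localOffset']);
    $local = unpack(
        'Vsig/vneeded/vflags/vmethod/vtime/vdate/Vcrc/VcompSize/Vsize/vnameLen/vextraLen',
        (string) fread($stream, 30)
    );
    if ($local === false || $local['sig'] !== 0x04034b50) {
        throw new RuntimeException('Bad local file header');
    }
    fseek($stream, $local['nameLen'] + $local['extraLen'], SEEK_CUR);
    return $entry['compSize'] > 0
        ? (string) stream_get_contents($stream, $entry['compSize'])
        : '';
}
```

The returned bytes, together with the entry's `method`, `crc`, `compSize` and `size`, are everything a writer would need to emit the entry again without recompressing it.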
maennchen commented 1 year ago

@jhammer

Good point about accessing the raw stream. Where did you see that it supports that? If it doesn't, this approach would mean that each compressed file is decompressed and recompressed when it is added. I don't think that would fulfill your need, right?

A better approach is probably to read the ZIP file from a stream, forward all local file headers and their contents unchanged, and record each entry's position. As soon as the central directory is reached, its entries need to be parsed and their locations rewritten; then we continue as normal.
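The "locations rewritten" step can be sketched as follows: each central directory record stores the relative offset of its entry's local file header at byte 42, so if the forwarded entries land at a different position in the output stream (for example because other files were written first), every offset must be shifted by the same amount. This PHP sketch assumes non-ZIP64 records and raw central directory bytes already read from the input; `shiftCentralDirectoryOffsets` is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of ZipStream-PHP): add $shift to the
 * "relative offset of local header" field (byte 42) of each of the $entries
 * records in a raw, non-ZIP64 central directory.
 */
function shiftCentralDirectoryOffsets(string $cd, int $entries, int $shift): string
{
    $pos = 0;
    for ($i = 0; $i < $entries; $i++) {
        // Each record starts with signature 0x02014b50 ("PK\x01\x02").
        if ($pos + 46 > strlen($cd) || substr($cd, $pos, 4) !== "PK\x01\x02") {
            throw new RuntimeException("Bad central directory record at byte $pos");
        }
        // Lengths of the variable-size fields that follow the 46-byte fixed part.
        $lengths = unpack('vnameLen/vextraLen/vcommentLen', $cd, $pos + 28);
        $offset = unpack('V', $cd, $pos + 42)[1];
        if ($offset === 0xFFFFFFFF) {
            throw new RuntimeException('ZIP64 offsets are not supported');
        }
        $newOffset = $offset + $shift;
        if ($newOffset < 0 || $newOffset >= 0xFFFFFFFF) {
            throw new RuntimeException('Shifted offset does not fit without ZIP64');
        }
        // Overwrite only the 4-byte offset field; the record length is unchanged.
        $cd = substr_replace($cd, pack('V', $newOffset), $pos + 42, 4);
        $pos += 46 + $lengths['nameLen'] + $lengths['extraLen'] + $lengths['commentLen'];
    }
    return $cd;
}
```

For instance, if 1,000 bytes had already been written to the output before the first forwarded local header, `$shift` would be 1000. Because only fixed-width fields are rewritten, the central directory size stays the same, and only the EOCD offset needs to be recomputed.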

I would be open either to supporting someone who develops this feature and accepting their PR, or to implementing it myself if somebody sponsors my time. If you're interested in the latter, send me an email for the details (my email address is public in my profile).