rikyoz / bit7z

A C++ static library offering a clean and simple interface to the 7-zip shared libraries.
https://rikyoz.github.io/bit7z
Mozilla Public License 2.0

[Feature Request]: Recursive unpacking? #92

Open jindroush opened 2 years ago

jindroush commented 2 years ago

bit7z version

3.1.x

7-zip version

v19.00

7-zip DLL used

7z.dll

MSVC version

2019

Architecture

x86_64

Which version of Windows are you using?

Windows 10

Bug description

Is it possible to recursively unpack several layers of archives without writing the intermediate files to disk? And a second question: is it possible to extract files with extractMatching, but store the output in a flat directory rather than several levels deep?

Situation: I have VirtualBox running Linux on a Windows host. After turning off the VM, I need a few files from the guest:

So, there is a VDI file (layer 0), which contains an MBR (layer 1), which contains an EXT4 filesystem (layer 2). In the EXT4 filesystem, I need to extract files from five levels of nested directories. I have written a test app that extracts them, and it works, but the VDI file is huge, and so are the MBR file and the EXT4 file, unnecessarily tripling the disk space needed to extract a few megabytes of files, which I want to avoid.

Is my "hunch" right that the only answer to both of my questions is 'call 7z.dll directly'?

Steps to reproduce

No response

Expected behavior

No response

Relevant compilation output

No response


rikyoz commented 2 years ago

Hi!

> Is it possible to recursively unpack several layers of archives without writing the intermediate files to disk?

Unfortunately, no. Or rather, not in your use case. In the case of a small archive, you could extract it to a standard stream or a buffer and then extract again from the latter. But that's obviously not a feasible approach for big archives, since the whole intermediate layer would have to fit in memory.

> And a second question: is it possible to extract files with extractMatching, but store the output in a flat directory rather than several levels deep?

This will actually be possible from the next version of the library, in which you can use the setRetainDirectories(false) method to disable the re-creation of the directory structure inside the archive when extracting it. But unfortunately, it's not possible in bit7z v3.1.x.
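As a rough illustration of what disabling directory retention means (a sketch of the idea only, not bit7z's implementation): the hypothetical helper `outputPathFor` below maps an item's path inside the archive to its destination on disk, keeping either the full relative path or only the file name. The helper name and its parameters are assumptions made for this example.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (NOT part of bit7z): computes where an archive item
// would be written, given the output directory and the item's path inside
// the archive.
fs::path outputPathFor(const fs::path& outDir,
                       const std::string& itemPath,
                       bool retainDirectories) {
    const fs::path inArchive{itemPath};
    // Either keep the item's full relative path, or keep only its last
    // component so that every file lands directly in outDir.
    return retainDirectories ? outDir / inArchive
                             : outDir / inArchive.filename();
}
```

For instance, `outputPathFor("out", "a/b/c/d/e/file.txt", false)` yields `out/file.txt`, whereas passing `true` yields `out/a/b/c/d/e/file.txt`. Note that a flat layout can produce name collisions when two items in different directories share a file name.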

> Situation: I have VirtualBox running Linux on a Windows host. After turning off the VM, I need a few files from the guest:
>
> So, there is a VDI file (layer 0), which contains an MBR (layer 1), which contains an EXT4 filesystem (layer 2). In the EXT4 filesystem, I need to extract files from five levels of nested directories. I have written a test app that extracts them, and it works, but the VDI file is huge, and so are the MBR file and the EXT4 file, unnecessarily tripling the disk space needed to extract a few megabytes of files, which I want to avoid.

This seems to be a problem similar to #90, only on a much bigger scale.

> Is my "hunch" right that the only answer to both of my questions is 'call 7z.dll directly'?

I'm not entirely sure that the 7-zip DLLs provide any direct API for this kind of operation, but yes, it's probably achievable only by calling 7z.dll functions directly. I've been studying the 7-zip source code and the really poor documentation, but I still haven't found anything. Moreover, I'm still not entirely sure how bit7z could provide an API for this kind of task that is both flexible and easy to use. But it's a feature that I definitely want to implement, just probably not in the short term. Or at least, it all depends on how easily it can be implemented.

jindroush commented 2 years ago

I think it should work like this: instead of calling 'extract', a function deferredExtract would be called, returning a stream object with partial functionality: it would implement read and forward-only seek. Such a stream could then be the input to another deferredExtract call. At the deepest level, the stream from deferredExtract would be passed to a function dropDeferredToDisk (which would only copy from the input stream to a file on disk).

rikyoz commented 2 years ago

> I think it should work like this: instead of calling 'extract', a function deferredExtract would be called, returning a stream object with partial functionality: it would implement read and forward-only seek. Such a stream could then be the input to another deferredExtract call. At the deepest level, the stream from deferredExtract would be passed to a function dropDeferredToDisk (which would only copy from the input stream to a file on disk).

Uhm yeah, I think this might be a good API! Thank you for the suggestion!

kenkit commented 2 years ago

I tried this with libarchive, and it's not as easy as it looks; also, some archive formats won't be supported, as they don't provide seekable streams. It's an interesting feature, especially considering that I implemented it on top of some curl-supported protocols. Unfortunately, extracting the second layer requires retrieving the whole second file within the archive. You should look into my app, qtapp, under the remote archive tab. You just input the URL and can list up to two layers, I think. You don't have to install the whole thing, just qtapp. It's not perfect, but it works to some extent. http://github.com/kenkit/neon_service/releases/latest
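One concrete case of the seekability problem is ZIP, which stores its end-of-central-directory record at the tail of the file, so a reader generally has to reach the end before it can list entries. The sketch below is a simplified illustration, not how libarchive or 7-zip do it: the function name is hypothetical, and a real reader would also validate the record's fields. It locates that record by scanning a buffer backwards for its signature bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: returns the offset of the ZIP end-of-central-directory
// record (signature bytes 50 4B 05 06) in `bytes`, or nullopt if none is found.
std::optional<std::size_t> findEndOfCentralDirectory(const std::vector<std::uint8_t>& bytes) {
    constexpr std::size_t kMinRecordSize = 22;  // fixed part of the record, without a comment
    if (bytes.size() < kMinRecordSize) {
        return std::nullopt;
    }
    // Scan backwards from the last position where a full record could start:
    // the record sits at the tail, optionally followed by a comment.
    for (std::size_t i = bytes.size() - kMinRecordSize + 1; i-- > 0;) {
        if (bytes[i] == 0x50 && bytes[i + 1] == 0x4B &&
            bytes[i + 2] == 0x05 && bytes[i + 3] == 0x06) {
            return i;
        }
    }
    return std::nullopt;
}
```

On a forward-only stream, the only way to run a search like this is to consume (or buffer) everything that precedes the record, which is why a nested archive in such a format tends to require retrieving the whole inner file first.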

kenkit commented 2 years ago

[image] This should be the page of interest. Just putting a raw zip file on HTTP is a good example. Google Drive links require auth, which is not working correctly for now, but e.g. MediaFire zip archive links will work: just copy the link from the download button.

rikyoz commented 2 years ago

> I tried this with libarchive, and it's not as easy as it looks; also, some archive formats won't be supported, as they don't provide seekable streams. It's an interesting feature, especially considering that I implemented it on top of some curl-supported protocols. Unfortunately, extracting the second layer requires retrieving the whole second file within the archive. You should look into my app, qtapp, under the remote archive tab. You just input the URL and can list up to two layers, I think. You don't have to install the whole thing, just qtapp. It's not perfect, but it works to some extent. http://github.com/kenkit/neon_service/releases/latest

Interesting, I'll take a look into it for sure! Thank you!