rikyoz / bit7z

A C++ static library offering a clean and simple interface to the 7-zip shared libraries.
https://rikyoz.github.io/bit7z
Mozilla Public License 2.0

How to extract entire content of the existing 7z file to memory? #18

Closed jincongcong closed 5 years ago

jincongcong commented 5 years ago

I found code that extracts a single file by its index:

bit7z::Bit7zLibrary lib( L"7z.dll" );
bit7z::BitExtractor extractor( lib, bit7z::BitFormat::SevenZip );
std::vector< bit7z::byte_t > out_buffer;
unsigned index = 0;
extractor.extract( L"test.7z", out_buffer, index );

So how can I extract all the files in the archive, together with their buffers, into a std::map< std::wstring, std::vector< byte_t > >?

jincongcong commented 5 years ago

In std::map< std::wstring, std::vector< byte_t > >, the key is the file's path in the archive and the value is the file's buffer.

rikyoz commented 5 years ago

Hi! You also need the BitArchiveInfo class alongside BitExtractor. With it you can iterate over the items inside the archive, retrieve the paths of the files (i.e. the keys of the map), and then use BitExtractor to extract each file, as in the following code example:

std::map< std::wstring, std::vector< byte_t > > result;
{
    std::wstring input_file = L"test.7z";
    BitArchiveInfo info( lib, input_file, BitFormat::SevenZip );
    BitExtractor extractor( lib, BitFormat::SevenZip );
    uint32_t items_count = info.itemsCount(); //number of items (folders + files) in the archive
    for ( uint32_t i = 0; i < items_count; ++i ) {
        bool isDir = info.getItemProperty( i, BitProperty::IsDir ).getBool();
        if ( isDir ) { continue; }  //ignoring folder items

        //getting the path of the file at index i
        std::wstring path = info.getItemProperty( i, BitProperty::Path ).getString();
        //extracting the file at index i to the corresponding buffer in the map
        extractor.extract( input_file, result[ path ], i );
    }
}

I hope to implement this as an easier-to-use function in the next stable version!
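Once the result map from the snippet above is filled, it behaves like any other std::map, so looking up a file by its in-archive path or summing the buffer sizes needs nothing from bit7z. The following is only a minimal sketch of that: the byte_t alias is an assumed stand-in so that it compiles without the library, and ExtractedFiles, total_size and find_entry are hypothetical names that do not exist in bit7z.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Assumption: stand-in for bit7z's byte_t so the sketch builds without the library.
using byte_t = std::uint8_t;
// Hypothetical alias for the map type discussed in this thread.
using ExtractedFiles = std::map< std::wstring, std::vector< byte_t > >;

// Hypothetical helper: total number of extracted bytes across all entries.
std::size_t total_size( const ExtractedFiles& files ) {
    std::size_t total = 0;
    for ( const auto& entry : files ) {
        total += entry.second.size();
    }
    return total;
}

// Hypothetical helper: returns the buffer stored under the given in-archive
// path, or nullptr when the map holds no such file.
const std::vector< byte_t >* find_entry( const ExtractedFiles& files,
                                         const std::wstring& path ) {
    auto it = files.find( path );
    return it == files.end() ? nullptr : &it->second;
}
```

Using find rather than operator[] for lookups matters here, because operator[] would silently insert an empty buffer for a path that was never in the archive.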

jincongcong commented 5 years ago

Thank you! But I tried this method and it took a long time: about ten minutes! test.7z contains 10000+ files and is 3.39 MB in size.

rikyoz commented 5 years ago

The performance problem is probably due to the extract method of BitExtractor being called for each file in the archive: every call reopens the archive, calls the 7-zip DLL extraction function, and closes the archive again. Repeating this for more than 10k files is far from optimal, but that piece of code was only meant to implement the functionality with the current version of bit7z, without modifying its code. Implementing this kind of extraction as a new function inside BitExtractor makes it possible to optimize the accesses to the archive, as in the following code example, in which the archive is opened only once:

void BitExtractor::extract( const wstring& in_file, map< wstring, vector< byte_t > >& out_map ) {
    CMyComPtr< IInArchive > in_archive = openArchive( *this, mFormat, in_file );

    uint32_t number_items;
    in_archive->GetNumberOfItems( &number_items );

    uint32_t indices[] = { 0 };

    for ( uint32_t i = 0; i < number_items; ++i ) {
        BitPropVariant propvar;
        in_archive->GetProperty( i, kpidIsDir, &propvar );
        if ( propvar.getBool() ) { continue; } //ignore directories

        in_archive->GetProperty( i, kpidPath, &propvar ); //getting file path in the archive

        auto* extract_callback_spec = new MemExtractCallback( *this, in_archive, out_map[ propvar.getString() ] );

        indices[ 0 ] = i;

        CMyComPtr< IArchiveExtractCallback > extract_callback( extract_callback_spec );
        if ( in_archive->Extract( indices, 1, NExtract::NAskMode::kExtract, extract_callback ) != S_OK ) {
            throw BitException( extract_callback_spec->getErrorMessage() );
        }
    }
}

If you are using a custom build of bit7z, you could add this code to BitExtractor and see whether performance improves! This code can also be optimized further: it calls the Extract method of the 7-zip DLL once for every file in the archive, instead of calling it a single time with an array of the indices of all the files to be extracted. However, that approach would require a substantial rewrite of the MemExtractCallback class, since at the moment its constructor takes only one vector buffer. Anyway, as I said, I hope to implement the function this way in the next version!
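To make the difference between the three approaches in this thread concrete, here is a rough, self-contained model that only counts how often an archive is opened and how often its extraction function is called. It is a sketch under stated assumptions, not bit7z or 7-zip code: MockItem, MockArchive and the three extract_* functions are hypothetical, and the per-index buffer map merely illustrates what a callback handling several buffers might need to track.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using byte_t = std::uint8_t; // assumption: stand-in for bit7z's byte_t
using ExtractedFiles = std::map< std::wstring, std::vector< byte_t > >;
using BufferMap = std::map< std::size_t, std::vector< byte_t >* >;

// Hypothetical in-memory model of an archive item; not a bit7z or 7-zip type.
struct MockItem {
    std::wstring path;
    bool is_dir;
    std::vector< byte_t > data;
};

// Hypothetical archive model that counts opens and extraction calls.
class MockArchive {
    public:
        explicit MockArchive( std::vector< MockItem > items ) : items_( std::move( items ) ) {}

        void open() { ++open_count; }
        std::size_t item_count() const { return items_.size(); }
        const MockItem& item( std::size_t index ) const {
            assert( index < items_.size() );
            return items_[ index ];
        }

        // Models ONE extraction call: copies every requested item into the
        // buffer registered for its index.
        void extract( const std::vector< std::size_t >& indices, const BufferMap& buffers ) {
            ++extract_calls;
            for ( std::size_t index : indices ) {
                *buffers.at( index ) = item( index ).data;
            }
        }

        std::size_t open_count = 0;
        std::size_t extract_calls = 0;

    private:
        std::vector< MockItem > items_;
};

// Approach 1: one open to list the items, then a reopen for every single file.
ExtractedFiles extract_reopening( MockArchive& archive ) {
    ExtractedFiles out;
    archive.open();
    for ( std::size_t i = 0; i < archive.item_count(); ++i ) {
        const MockItem& entry = archive.item( i );
        if ( entry.is_dir ) { continue; }
        archive.open();
        archive.extract( { i }, { { i, &out[ entry.path ] } } );
    }
    return out;
}

// Approach 2: a single open, but still one extraction call per file.
ExtractedFiles extract_open_once( MockArchive& archive ) {
    ExtractedFiles out;
    archive.open();
    for ( std::size_t i = 0; i < archive.item_count(); ++i ) {
        const MockItem& entry = archive.item( i );
        if ( entry.is_dir ) { continue; }
        archive.extract( { i }, { { i, &out[ entry.path ] } } );
    }
    return out;
}

// Approach 3: a single open and a single extraction call for all file indices.
ExtractedFiles extract_batched( MockArchive& archive ) {
    ExtractedFiles out;
    std::vector< std::size_t > indices;
    BufferMap buffers;
    archive.open();
    for ( std::size_t i = 0; i < archive.item_count(); ++i ) {
        const MockItem& entry = archive.item( i );
        if ( entry.is_dir ) { continue; }
        indices.push_back( i );
        // std::map never relocates its elements, so this pointer stays valid
        // while further entries are inserted into out.
        buffers[ i ] = &out[ entry.path ];
    }
    archive.extract( indices, buffers );
    return out;
}
```

For an archive with N files, the model performs N + 1 opens and N extraction calls in the first approach, 1 open and N calls in the second, and 1 open and 1 call in the third, while all three produce the same map. How much time each step costs in a real archive is not something this model measures.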

jincongcong commented 5 years ago

Thank you very much!