kiwilan / php-archive

PHP package to handle archives (.zip, .rar, .tar, .7z, .pdf) with a unified API and a hybrid solution (native/p7zip), designed to work with EPUB and CBA (.cbz, .cbr, .cb7, .cbt).
MIT License

Read Archive from string? #42

Open ghost opened 7 months ago

ghost commented 7 months ago

What happened?

Would it be possible to pass data to this package as a string? I am reading the first X bytes of a partially downloaded file, which I hold as a string in memory and do not save to disk. Currently I only see the read function, which only takes a path on disk.

I am trying to list the files in the archive and read their CRC32 value.
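For ZIP specifically, file names and CRC32 values can be read straight from the local file headers of an in-memory string, even when the download is truncated. The following is a minimal sketch, not part of the package: listZipEntriesFromString() is a hypothetical helper, it ignores ZIP64, and it stops at the first entry that stores its CRC in a trailing data descriptor (flag bit 3), because the header then carries no usable CRC or size.

```php
<?php

/**
 * Hypothetical helper: list the entries found in the local file headers of a
 * (possibly truncated) ZIP held in memory.
 *
 * @return array<int, array{name: string, crc32: ?string}>
 */
function listZipEntriesFromString(string $data): array
{
    $entries = [];
    $offset = 0;
    $length = strlen($data);

    // A local file header is 30 fixed bytes followed by the name and the extra field.
    while ($offset + 30 <= $length && substr($data, $offset, 4) === "PK\x03\x04") {
        $header = unpack(
            'vversion/vflags/vmethod/vtime/vdate/Vcrc/VcompressedSize/Vsize/vnameLength/vextraLength',
            $data,
            $offset + 4
        );

        $nameStart = $offset + 30;
        if ($nameStart + $header['nameLength'] > $length) {
            break; // The file name itself is cut off by the truncation.
        }

        // Flag bit 3: CRC and sizes are stored after the data, not in this header.
        $hasDescriptor = ($header['flags'] & 0x08) !== 0;

        $entries[] = [
            'name' => substr($data, $nameStart, $header['nameLength']),
            'crc32' => $hasDescriptor ? null : sprintf('%08x', $header['crc']),
        ];

        if ($hasDescriptor) {
            break; // Without a compressed size the next header cannot be located.
        }

        // Jump over the name, the extra field, and the compressed data.
        $offset = $nameStart + $header['nameLength'] + $header['extraLength'] + $header['compressedSize'];
    }

    return $entries;
}
```

RAR and 7z use different layouts (7z keeps its file list at the end of the archive), so this approach only covers ZIP.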

Another thing that would be nice is the ability to set the path to the 7-Zip binary. Since p7zip is unmaintained, there is now the official 7zip package, which ships the 7zz binary.

Also, is there password support on the horizon some time? Like $archive->setPassword('password'); before trying to list/extract.

How to reproduce the bug

-

Package Version

Latest

PHP Version

8.3.2

Which operating systems does this happen with?

Linux

Notes

No response

ewilan-riviere commented 7 months ago

Thanks for these ideas, I will work on it!

ghost commented 7 months ago

Awesome!

I am trying to migrate from this fork https://github.com/DariusIII/rarinfo, which does not support passwords either. So I am calling unrar/unzip/7zz manually now which is kind of a pain in the ass.

The one thing that package does properly (most of the time) is being able to read archive contents without extracting (if there is no password) with the getArchiveFileList() function.

ewilan-riviere commented 7 months ago

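To summarize, I will try to add the features requested above: reading an archive from a string, setting a custom 7-Zip binary path, and password support.

Is that OK? I can't guarantee all of these features, but I will try to implement them.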

ghost commented 7 months ago

Sounds great!

ewilan-riviere commented 7 months ago

To read an archive from a string, one solution is to write the contents to a temporary file and read that file.

<?php

$contents = file_get_contents($path); // simulate the zip file content
$path = tempnam(sys_get_temp_dir(), 'zip'); // create a temporary file
file_put_contents($path, $contents); // write the content to the temporary file

$archive = Archive::read($path); // now we can read the zip file

We can imagine another method, like fromString().

<?php

$contents = file_get_contents($path);
$archive = Archive::fromString($contents);

Inside this method, $contents will be written to a temporary file and then read.
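The temporary-file approach could be sketched as below. This is only an illustration under assumptions: withTemporaryFile() and its $directory parameter are hypothetical names, not the package's API. The file is removed once the callback returns, so all work on the archive has to happen inside the callback.

```php
<?php

/**
 * Hypothetical helper: write $contents to a temporary file, pass its path to
 * $reader, and delete the file afterwards.
 */
function withTemporaryFile(string $contents, callable $reader, ?string $directory = null): mixed
{
    // $directory is an assumption: a RAM-backed location can be passed to avoid disk writes.
    $path = tempnam($directory ?? sys_get_temp_dir(), 'archive_');
    if ($path === false) {
        throw new RuntimeException('Unable to create a temporary file.');
    }

    try {
        if (file_put_contents($path, $contents) === false) {
            throw new RuntimeException("Unable to write to {$path}.");
        }

        return $reader($path);
    } finally {
        // Always clean up, even if the reader throws.
        if (is_file($path)) {
            unlink($path);
        }
    }
}
```

A call would look like withTemporaryFile($contents, fn (string $path) => Archive::read($path)), with the caveat above about doing the actual reading inside the callback. On many Linux systems /dev/shm is a RAM-backed tmpfs, so passing it as $directory keeps the data off the SSD.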

ghost commented 7 months ago

I was trying to keep it off disk and in memory. This will save my SSDs in the long run :-)

fromString() would be perfect.

ewilan-riviere commented 7 months ago

I published a beta version; you can test it and report any issues.

{
  "require": {
    "kiwilan/php-archive": "dev-main"
  }
}

Now you can use readFromString() to read an archive from a string.

$archive = Archive::readFromString($contents);

This method tries to detect the archive type from the string. If detection fails, it may throw an exception. You can set the archive type manually with the third parameter.

$archive = Archive::readFromString($contents, extension: 'zip');

You can set a password for the archive using the second parameter.

This does not work on Windows for RAR and 7z yet (WIP).

$archive = Archive::read($path, 'password');
$archive = Archive::readFromString($contents, 'password');

You can also set the 7z binary path manually.

This does not work on Windows yet (WIP).

$archive = Archive::read($path)->overrideBinaryPath($binary_path);

ewilan-riviere commented 7 months ago

Password and binary override now work on Windows.

ghost commented 7 months ago

Looks great. Will test this weekend or Monday.

ghost commented 7 months ago

Okay so I tried it on a partial RAR file (Size: 3,5 MB)

$test = Archive::readFromString($this->_tmpExtractPath, $this->_release->password, 'rar')->overrideBinaryPath($this->_7zipPath);

Archive: Error detecting extension from mime type, please add manually archive extension as third parameter of readFromString().

Running unrar -l on this file shows the contents of the RAR, although it (obviously) throws an Unexpected end of archive error since I am not reading the full file. All good.

7Zip however can't read the file: Cannot open the file as archive. I was running 7zip 21.07, so I upgraded to the latest beta 24.00 (https://www.7-zip.org/download.html) and that could read this file. But readFromString() keeps throwing that same error.

It looks like it will always set $extension to null if the match() fails, therefore ignoring the $extension passed into the function as a parameter.
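The behaviour one would expect here is that an explicitly passed extension always wins and MIME detection is only a fallback. A minimal sketch of that order, assuming a hypothetical resolveExtension() helper and an illustrative subset of MIME types (this is not the package's actual code):

```php
<?php

/**
 * Hypothetical helper, not the package's real implementation.
 * An explicit extension takes priority; the MIME type is only a fallback.
 */
function resolveExtension(?string $explicit, ?string $mimeType): string
{
    if ($explicit !== null && $explicit !== '') {
        return strtolower(ltrim($explicit, '.'));
    }

    // Illustrative subset; a real mapping would need more entries.
    $detected = match ($mimeType) {
        'application/zip' => 'zip',
        'application/x-rar', 'application/vnd.rar', 'application/x-rar-compressed' => 'rar',
        'application/x-7z-compressed' => '7z',
        'application/x-tar' => 'tar',
        default => null,
    };

    if ($detected === null) {
        throw new InvalidArgumentException('Unable to detect the archive type; pass the extension explicitly.');
    }

    return $detected;
}
```

With this order, a partial file that reports an unhelpful MIME type such as application/octet-stream still resolves to rar when 'rar' is passed by the caller.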

Also, relying on the MIME type might cause issues. For a while now I have been using my own function in another bit of code to detect the archive type from the first few bytes of the file, which seems to work pretty well. It could be used instead of the MIME type, or perhaps as a fallback:

public function detectArchiveType(string $filePath): string|false
{
    $handle = fopen($filePath, 'rb');
    if (! $handle) {
        return false; // Cannot open file
    }

    // Most signatures sit at offset 0, but the TAR "ustar" magic sits at
    // offset 257, so read a full 512-byte block rather than 16 bytes.
    $bytes = (string) fread($handle, 512);
    fclose($handle);

    // Hex of the first 16 bytes, used for the offset-0 signatures below
    $hexBytes = bin2hex(substr($bytes, 0, 16));

    // Check for PAR2
    if (strpos($bytes, "PAR2\0PKT") === 0) {
        return 'PAR2';
    }

    // Combined regex for ZIP formats, anchored at the start of the file
    if (preg_match('/^(?:504b0304|504b0708)/', $hexBytes)) {
        return 'ZIP';
    }

    // Check for RAR, including version 5
    if (preg_match('/^526172211a07(0100)?/', $hexBytes, $matches)) {
        return isset($matches[1]) ? 'RAR5' : 'RAR';
    }

    // Check for TAR ("ustar" magic at offset 257)
    if (substr($bytes, 257, 5) === 'ustar') {
        return 'TAR';
    }

    // Check for 7z
    if (strpos($hexBytes, '377abcaf271c') === 0) {
        return '7z';
    }

    // Check for gzip
    if (substr($hexBytes, 0, 4) === '1f8b') {
        return 'GZIP';
    }

    // Check for bzip2
    if (substr($hexBytes, 0, 6) === '425a68') {
        return 'BZIP2';
    }

    // Check for SFV (simple heuristic). This is not a reliable method to detect SFV files.
    if (preg_match('/^;.*\r?\n;.*\r?\n[\w.-]+\s+[A-Fa-f0-9]{8}\r?\n/', $bytes)) {
        return 'SFV';
    }

    return false;
}
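Since the data in this thread is already held in memory, the same signature check can also run directly on a string instead of a file path. A table-driven sketch, assuming a hypothetical detectArchiveTypeFromString() function that is not part of the package:

```php
<?php

/**
 * Hypothetical helper: detect an archive type from bytes already in memory.
 * Returns a lowercase type name, or null when no known signature matches.
 */
function detectArchiveTypeFromString(string $bytes): ?string
{
    // Signatures expected at offset 0. RAR5 comes before RAR so the longer prefix is tried first.
    $signatures = [
        'rar5' => "Rar!\x1a\x07\x01\x00",
        'rar' => "Rar!\x1a\x07\x00",
        'zip' => "PK\x03\x04",
        '7z' => "7z\xbc\xaf\x27\x1c",
        'gzip' => "\x1f\x8b",
        'bzip2' => 'BZh',
    ];

    foreach ($signatures as $type => $signature) {
        if (str_starts_with($bytes, $signature)) {
            return $type;
        }
    }

    // TAR keeps its "ustar" magic at offset 257, so at least 262 bytes are needed.
    if (substr($bytes, 257, 5) === 'ustar') {
        return 'tar';
    }

    return null;
}
```

Only the first few hundred bytes are needed, so this works on a partial download as long as the start of the file is present.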

This might work more reliably. However, there are more MIME types for RAR:

ghost commented 7 months ago

Edited my above comment with why it fails to check mime type etc.

By the way, p7zip is old and no longer updated; it would be best to use the official 7zz binaries from 7-zip.org.

I removed the MIME type check to see whether it would work then, but it looks like the binary override does not work either:

sh: 1: 7z: not found
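The shell message suggests the command was still built with the default 7z name rather than the overridden path. A minimal sketch of how a binary could be resolved on Linux or macOS, assuming a hypothetical resolveSevenZipBinary() helper (not the package's code): an explicit override is used as-is, otherwise PATH is searched for the official 7zz first and then the p7zip names.

```php
<?php

/**
 * Hypothetical helper: pick the 7-Zip binary to run.
 *
 * @param  string[]  $candidates  Binary names to look for on PATH, in order of preference.
 */
function resolveSevenZipBinary(?string $override = null, array $candidates = ['7zz', '7z', '7za']): string
{
    if ($override !== null) {
        if (! is_executable($override)) {
            throw new RuntimeException("Binary override is not executable: {$override}");
        }

        return $override;
    }

    $directories = explode(PATH_SEPARATOR, (string) getenv('PATH'));

    foreach ($candidates as $name) {
        foreach ($directories as $directory) {
            $path = rtrim($directory, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $name;
            if (is_file($path) && is_executable($path)) {
                return $path;
            }
        }
    }

    throw new RuntimeException('No 7-Zip binary found; tried: ' . implode(', ', $candidates));
}
```

The resolved path would then need to be used for every command the package builds, wrapped in escapeshellarg(), so that no call falls back to a bare 7z.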