mholt / archiver

DEPRECATED. Please use mholt/archives instead.
https://github.com/mholt/archives
MIT License

`archiver.Identify` fails on non-archive file: zlib: invalid header #406

Closed: rgmz closed this issue 5 months ago.

rgmz commented 5 months ago

What version of the package or command are you using?

The latest release of v4, v4.0.0-alpha.8

What are you trying to do?

Recursively extract a .tar.gz file.

What steps did you take?

  1. Download https://github.com/kubernetes/git-sync/blob/b161f3f0c78b56f27188b4e4aabf672ba0b03706/vendor/github.com/google/licenseclassifier/licenses/licenses.db
  2. Run the following reproducer

    
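    package archiver_test

    import (
        "context"
        "errors"
        "fmt"
        "os"
        "testing"
    
        "github.com/mholt/archiver/v4"
    )
    
    // TestTarGz extracts licenses.db (a .tar.gz despite its extension) and
    // tries to identify the format of every file inside it.
    func TestTarGz(t *testing.T) {
        f, err := os.Open("/tmp/licenses.db")
        if err != nil {
            t.Fatal(err)
        }
        defer f.Close()
    
        format := archiver.CompressedArchive{
            Compression: archiver.Gz{},
            Archival:    archiver.Tar{},
        }
        // A nil pathsInArchive walks every entry in the archive.
        err = format.Extract(context.Background(), f, nil, handler(t))
        if err != nil {
            t.Fatal(err)
        }
    }
    
    // handler returns a callback that runs archiver.Identify on each
    // extracted file. Plain (non-archive) files should yield archiver.ErrNoMatch.
    func handler(t *testing.T) func(ctx context.Context, file archiver.File) error {
        return func(ctx context.Context, file archiver.File) error {
            if file.IsDir() {
                return nil // directories have no contents to identify
            }
            f, err := file.Open()
            if err != nil {
                return err // propagated to Extract, which fails the test
            }
            defer f.Close()
    
            format, _, err := archiver.Identify(file.Name(), f)
            if err == nil {
                fmt.Printf("File '%s' is format '%s'\n", file.Name(), format.Name())
            } else if errors.Is(err, archiver.ErrNoMatch) {
                // Expected for plain files: not an archive, nothing to do.
            } else {
                t.Errorf("Error identifying '%s' format: %v", file.Name(), err)
            }
    
            return nil
        }
    }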

What did you expect to happen, and what actually happened instead?

I expected `archiver.Identify` to return `archiver.ErrNoMatch`, since `X11.txt` is a plain-text file, not an archive. Instead, it returned a different error:

Error identifying 'X11.txt' format: matching zip: zlib: invalid header

How do you think this should be fixed?

I'm not sure; it depends on the cause.

Please link to any related issues, pull requests, and/or discussion

https://github.com/trufflesecurity/trufflehog/issues/2928

Bonus: What do you use archiver for, and do you find it useful?

I use archiver via TruffleHog. It is quite useful in that regard. :)

rgmz commented 5 months ago

I tested the reproducer against HEAD and it doesn't return that error. It seems this was fixed in https://github.com/mholt/archiver/commit/24fa33e9b6a0b17e8418ffc90a94a06ab79bd5c2 (#386), which isn't included in the latest release.