richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0

Scanning ZIPs (`-z`) fails if the file is piped into stdin #244

Closed: max-moser closed this issue 4 months ago

max-moser commented 9 months ago

Scanning archives with the `-z` flag works when the archive is supplied by filename, but fails when the file is piped into stdin (using `-` as the filename).

Example

I created a ZIP file with three random files that I've had lying around.

Error when piping the ZIP file into stdin

```
mmoser@mx ~ $ ~/go/bin/sf -z -name othername.zip - < random.zip
[FILE] /home/mmoser/othername.zip
[ERROR] failed to decompress, got: zip: not a valid zip file
---
siegfried   : 1.10.1
scandate    : 2024-02-27T13:35:36+01:00
signature   : default.sig
created     : 2023-12-17T15:38:39+01:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'othername.zip'
filesize : 0
modified : 0001-01-01T00:00:00Z
errors   : 'failed to decompress, got: zip: not a valid zip file'
matches  :
  - ns      : 'pronom'
    id      : 'x-fmt/263'
    format  : 'ZIP Format'
    version : 
    mime    : 'application/zip'
    class   : 'Aggregate'
    basis   : 'extension match zip; container match with trigger and default extension'
    warning : 
```

Works if the file name is supplied directly

```
mmoser@mx ~ $ ~/go/bin/sf -z -name othername.zip random.zip
---
siegfried   : 1.10.1
scandate    : 2024-02-27T13:35:31+01:00
signature   : default.sig
created     : 2023-12-17T15:38:39+01:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'random.zip'
filesize : 15288
modified : 2024-02-27T13:35:18+01:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'x-fmt/263'
    format  : 'ZIP Format'
    version : 
    mime    : 'application/zip'
    class   : 'Aggregate'
    basis   : 'extension match zip; container match with trigger and default extension'
    warning : 
---
filename : 'random.zip#ck-style.css'
filesize : 71204
modified : 2021-09-30T17:21:06Z
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'x-fmt/224'
    format  : 'Cascading Style Sheet'
    version : 
    mime    : 'text/css'
    class   : 'Text (Structured)'
    basis   : 'extension match css'
    warning : 'match on extension only'
---
filename : 'random.zip#draft.json'
filesize : 5247
modified : 2024-01-29T10:43:24Z
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/817'
    format  : 'JSON Data Interchange Format'
    version : 
    mime    : 'application/json'
    class   : 
    basis   : 'extension match json'
    warning : 'match on extension only'
---
filename : 'random.zip#lol.py'
filesize : 293
modified : 2023-09-23T13:46:28Z
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/938'
    format  : 'Python Source Code File'
    version : 
    mime    : 
    class   : 'Text (Structured)'
    basis   : 'extension match py'
    warning : 'match on extension only'
```

richardlehane commented 8 months ago

Thanks Max, I'll look into this. It may not be possible to get unzipping working with streams, because the unzip routines may need to seek. If that's the case, I'll probably need to fail early with a message saying these flags can't be used together.

richardlehane commented 5 months ago

This bug should also be fixed in the next release!