richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0

Scanning list of directories? #130

harryjmoss closed this issue 5 years ago

harryjmoss commented 5 years ago

I'm trying to pass a newline-separated list of directories to siegfried with the -f flag, but obviously run into the error:

file is of type directory; only regular files can be scanned

I can work around this for input lists containing only a few directories by piping the contents of the file to stdout, but this becomes unwieldy as the number of directories in the list approaches a few hundred. Eventually I would like to profile the contents of around 10,000 directories. Is there some functionality I'm missing that makes passing a list of directories possible in this way, or could this be added? Thanks!

richardlehane commented 5 years ago

Thanks for posting this, Harry. Let me have a think about this one and whether it would be desirable to add as a new feature. Rather than piping the file contents to stdout, which has the undesirable effect of re-running the sf executable repeatedly, you could pipe the list of file names instead. I.e. you can do find . | sf -f - (now sf runs only once, reading a stream of file names from the find command). So perhaps a command like find $(cat my_list.txt) -type f | sf -f - would do the trick?
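One caveat with the $(cat my_list.txt) substitution: the shell splits its output on whitespace and expands glob characters, so a directory path containing spaces gets broken into several bogus arguments. A minimal sketch of a workaround, assuming a POSIX shell and a list with one directory per line; the function name list_files is hypothetical, not part of siegfried:

```shell
# Hypothetical helper: print every regular file under each directory listed
# (one path per line) in the file given as $1. Reading the list line by line
# avoids word splitting and glob expansion of paths containing spaces or '*'.
list_files() {
  # "|| [ -n "$dir" ]" also handles a final line with no trailing newline
  while IFS= read -r dir || [ -n "$dir" ]; do
    # Skip blank lines in the list
    [ -n "$dir" ] || continue
    find "$dir" -type f
  done < "$1"
}

# Usage: sf still runs only once and reads the stream of file names on stdin.
# list_files my_list.txt | sf -f -
```

Because find is started once per directory, this trades a little speed for robustness; sf itself is still invoked a single time.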

harryjmoss commented 5 years ago

Thanks for the quick reply! find $(cat my_list.txt) -type f | sf -f - definitely seems to work for this purpose, and I'll use it going forward. My only hesitation is whether this approach will still hold once the number of directories in the input file grows beyond a few hundred, or whether I'll hit an 'argument list too long' error (or similar). I also had another thought: passing a list of directories might be useful for producing a DROID-style report that shows parent directories, etc.
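On the 'argument list too long' concern: $(cat my_list.txt) does place every directory on a single find command line, so a long enough list can exceed the system's argument-length limit (ARG_MAX). One way to sidestep this is to let xargs split the list into batches that fit the limit and run find once per batch. A sketch, assuming GNU or BSD xargs (for the -0 option) and a newline-separated list; list_files_batched is a hypothetical name:

```shell
# Hypothetical helper: print every regular file under the directories listed
# (one per line) in the file given as $1, without ever putting the whole list
# on a single command line.
list_files_batched() {
  # Drop blank lines, convert newlines to NUL separators so spaces and quotes
  # in paths survive, then let xargs group the paths into batches that fit
  # the argument-length limit. Each batch arrives in sh as "$@"; find needs
  # the paths before -type f, so xargs cannot simply append them to find.
  # The $# check stops GNU xargs, which runs the command once even on empty
  # input, from calling find with no paths (which would search ".").
  grep -v '^$' "$1" | tr '\n' '\0' |
    xargs -0 sh -c '[ "$#" -eq 0 ] || find "$@" -type f' sh
}

# Usage: sf still runs once, reading the combined stream of file names.
# list_files_batched my_list.txt | sf -f -
```

A plain while-read loop calling find once per directory would also avoid the limit, at the cost of starting one find process per directory.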