openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

jhove returns 0 even when errors occur #826

Open def-fun opened 1 year ago

def-fun commented 1 year ago

Thanks for jhove, it has saved me a lot of time checking the integrity of files stored on my backup hard disk.

I think jhove could be more powerful if it were more Unix-like and scriptable.

jhove should return a non-zero value when it finds a problem, so that we can make full use of it in shell scripts and on the command line. For example, we could first move bad files to tmp_dir, then check them manually:

find ./backup/ -iname '*.pdf' -type f -exec bash -c '
    # $1 is the file found; passing it as an argument keeps odd filenames safe
    if jhove -m PDF-hul "$1" > /dev/null; then
        echo "[ok] $1"
    else
        echo "[bad] $1"
        mv -- "$1" tmp_dir/
    fi' _ {} \;

But at present jhove returns 0 even when errors occur:

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.5 LTS
Release:    20.04
Codename:   focal
# echo anything > 1.pdf
# cat 1.pdf 
anything
# jhove -m PDF-hul 1.pdf 
Jhove (Rel. 1.26.1, 2022-07-14)
 Date: 2023-01-25 17:17:50 UTC
 RepresentationInformation: 1.pdf
  ReportingModule: PDF-hul, Rel. 1.12.3 (2022-04-22)
  LastModified: 2023-01-25 17:17:41 UTC
  Size: 9
  Format: PDF
  Status: Not well-formed
  ErrorMessage: No PDF header
   ID: PDF-HUL-137
   Offset: 0
  MIMEtype: application/pdf
# echo $?
0
# jhove -m GZIP-kb 1.pdf 
Jhove (Rel. 1.26.1, 2022-07-14)
 Date: 2023-01-25 17:18:20 UTC
 RepresentationInformation: 1.pdf
  ReportingModule: GZIP-kb, Rel. 0.2 (2022-04-22)
  LastModified: 2023-01-25 17:17:41 UTC
  Size: 0
  Format: GZIP
  Version: 4.3
  Status: Not well-formed
  ErrorMessage: ERROR_EXPECTED: Entity: GZip file, One or more records
  ErrorMessage: INVALID_DATA: Entity: GZip file, Unexpected trailing data!
  MIMEtype: application/gzip
  Records: 
# echo $?
0
# jhove eeeeeeeee
Jhove (Rel. 1.26.1, 2022-07-14)
 Date: 2023-01-25 17:09:33 UTC
 RepresentationInformation: eeeeeeeee
  Status: Unknown
  ErrorMessage: File not found
# echo $?
0

carlwilson commented 1 year ago

Hi @def-fun, while I agree this is a good idea and JHOVE should return meaningful codes, we need to be careful about what that means. Conventionally, return codes signal a genuine problem in execution, and I'm not convinced that an invalid file is the best case for a non-zero return code. The "File not found" error might be, though. Any exception thrown during execution should also lead to a non-zero code. This isn't a definitive answer; I'd be interested to read your thoughts.
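
One way to reconcile the two views is the three-way convention used by tools such as grep: 0 for success, 1 for an expected negative result, and 2 or higher for a failure of the run itself. Below is a minimal sketch of a wrapper along those lines, assuming the text output format shown in the transcript above. The function name `jhove_exit_code` and the chosen code values are hypothetical, not part of JHOVE. The only status strings it knows for certain are the two that appear in this issue ("Not well-formed" and "Unknown"); it assumes any other problematic status contains the word "not".

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of JHOVE: reads JHOVE text output on stdin
# and converts the first "Status:" line into an exit code.
#   0 = no problem reported
#   1 = file was examined and its status contains "not" (e.g. "Not well-formed")
#   2 = the run itself failed ("Unknown" status, or no Status line at all)
jhove_exit_code() {
    local status
    # Keep only the text after "Status:" from the first matching line.
    status=$(sed -n 's/^[[:space:]]*Status:[[:space:]]*//p' | head -n 1)
    case "$status" in
        "")         return 2 ;;  # no Status line: treat as execution failure
        Unknown*)   return 2 ;;  # e.g. "File not found" in the transcript
        *[Nn]ot*)   return 1 ;;  # assumption: problem statuses contain "not"
        *)          return 0 ;;
    esac
}
```

With the output shown above, `jhove -m PDF-hul 1.pdf | jhove_exit_code` would yield 1 and `jhove eeeeeeeee | jhove_exit_code` would yield 2, so a script can tell a bad file apart from a failed run.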

ross-spencer commented 1 year ago

Another paradigm that is emerging is data-oriented shells, spearheaded by tools like jq but also appearing in terminals designed entirely for this: https://www.nushell.sh/ -- you wouldn't need to go that far -- with JHOVE, its JSON output, and jq, I am sure you could achieve the same.
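
A minimal sketch of that approach follows. It rests on two assumptions that should be checked against your JHOVE version: that the JSON handler is selected with `-h json`, and that each result sits under `.jhove.repInfo[]` with a `status` field. The function name `jhove_json_ok` is hypothetical. It relies on jq's `-e` flag, which sets jq's exit status from the last output value (1 when that value is false or null).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of JHOVE: reads JHOVE JSON output on stdin and
# succeeds only if at least one result is present and no result has a status
# that is "Unknown" or contains "not" (e.g. "Not well-formed").
# Assumed layout: {"jhove": {"repInfo": [{"status": "..."}]}}
jhove_json_ok() {
    jq -e '
        [.jhove.repInfo[]?.status]
        | length > 0
          and all((ascii_downcase | contains("not") | not) and . != "Unknown")
    ' > /dev/null
}
```

Assuming the handler flag above is right, usage would look like `jhove -h json -m PDF-hul 1.pdf | jhove_json_ok && echo ok || echo bad`.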

jens-st commented 1 year ago

I ran into this issue as well in automated archiving processes with file validation tasks. Would it be possible to add a purely optional argument to the CLI that reflects the (validation) status of the checked file in the return code?