harvard-lts / fits

File Information Tool Set
http://fitstool.org
GNU Lesser General Public License v2.1

FITS Jhove incorrectly reports certain text files as invalid AIFF files #392

Closed: sprater closed this issue 9 months ago

sprater commented 9 months ago

When running FITS 1.6.0 on certain plain text files, the Jhove check fails, and the file is reported to be an invalid AIFF file:

 ./fits.sh -i ./misidentified-text.txt

<identity format="AIFF" mimetype="audio/x-aiff" toolname="FITS" toolversion="1.6.0">
      <tool toolname="Jhove" toolversion="1.26.1" />
</identity>
[...]
<filestatus>
    <well-formed toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">false</well-formed>
    <valid toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">false</valid>
    <message toolname="Jhove" toolversion="1.26.1" status="SINGLE_RESULT">File type in Form Chunk is not AIFF or AIFC severity=error offset=12</message>
</filestatus>

The other tools correctly identify the file as a text file.

I tried running the file through Jhove 1.26.1, using the same Jhove configuration as the one distributed in FITS 1.6.0; Jhove on its own identifies the file correctly.

$ ./jhove -m ASCII-hul ./misidentified-text.txt 

Jhove (Rel. 1.26.1, 2022-07-14)
 Date: 2024-02-23 15:56:19 CST
 RepresentationInformation: ./misidentified-text.txt
  ReportingModule: ASCII-hul, Rel. 1.4.2 (2022-04-22)
  LastModified: 2024-02-23 11:31:30 CST
  Size: 1701
  Format: ASCII
  Status: Well-Formed and valid
  MIMEtype: text/plain; charset=US-ASCII
  ASCIIMetadata: 
   LineEndings: LF 

The files we have that fail in this way all start with the word "FORM". Attached is the sample file that provokes the error, as well as the full output of FITS when run on this file.

fits-output.txt misidentified-text.txt
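To illustrate why a leading "FORM" could be enough to trigger this, here is a minimal sketch (not Jhove's actual code). An AIFF file is an IFF container: bytes 0-3 are "FORM", bytes 4-7 are a big-endian chunk size, and bytes 8-11 are the form type "AIFF" or "AIFC". The assumption here is that the signature match only looks at the first four bytes, which would be consistent with the later error at offset 12. The function names and the sample text content are hypothetical.

```python
# Illustrative only: this is not Jhove's code. It contrasts a loose
# "starts with FORM" check with a stricter IFF/AIFF header check.

def loose_aiff_signature(data: bytes) -> bool:
    """Assumed behaviour: match on the 4-byte 'FORM' magic alone."""
    return data[:4] == b"FORM"


def strict_aiff_signature(data: bytes) -> bool:
    """Also require the form type at bytes 8-11 to be AIFF or AIFC."""
    return (
        len(data) >= 12
        and data[:4] == b"FORM"
        and data[8:12] in (b"AIFF", b"AIFC")
    )


# Hypothetical plain text file that happens to start with the word "FORM".
text = b"FORM 1040 instructions\nLine 1: ...\n"
# Minimal AIFF-style header: magic, 4-byte big-endian size, form type.
aiff = b"FORM" + (4).to_bytes(4, "big") + b"AIFF"

print(loose_aiff_signature(text), strict_aiff_signature(text))  # True False
print(loose_aiff_signature(aiff), strict_aiff_signature(aiff))  # True True
```

With the loose check the text file is claimed as AIFF and only fails later, during full validation; with the stricter check it would never be handed to the AIFF module in the first place.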

pwinckles commented 9 months ago

@sprater This is a jhove bug. I recommend opening a ticket with them.

Before FITS runs a file through jhove, it first executes the jhove signature check (`-s`) to determine which module to run. For this file, the signature check produces:

./jhove -s ~/Downloads/misidentified-text.txt 
Jhove (Rel. 1.28.0, 2023-05-18)
 Date: 2024-02-23 16:21:56 CST
 RepresentationInformation: /home/pwinckles/Downloads/misidentified-text.txt
  ReportingModule: AIFF-hul, Rel. 1.6.2 (2022-04-22)
  LastModified: 2024-02-23 16:14:27 CST
  Size: 1761
  Format: AIFF
  Status: Well-Formed
  SignatureMatches:
   AIFF-hul
  MIMEtype: audio/x-aiff

So jhove says to use the AIFF-hul module, and that's what FITS does.
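The two-step flow described above (signature check first, then run the matching module) can be sketched roughly as follows. This is not how FITS is implemented (FITS is Java and does not necessarily parse text output); it only shows how the `SignatureMatches` section of the `jhove -s` output above determines the module. The function name `signature_matches` is hypothetical, and the parsing assumes the plain-text layout shown in this thread.

```python
def signature_matches(jhove_output: str) -> list[str]:
    """Return the module names listed under 'SignatureMatches:'."""
    modules = []
    in_block = False
    for line in jhove_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("SignatureMatches:"):
            in_block = True
            continue
        if in_block:
            # Entries are bare module names; a blank line or the next
            # "Key: value" line ends the block.
            if not stripped or ":" in stripped:
                break
            modules.append(stripped)
    return modules


# Excerpt of the `jhove -s` output quoted above.
sample = """\
  Format: AIFF
  Status: Well-Formed
  SignatureMatches:
   AIFF-hul
  MIMEtype: audio/x-aiff
"""

print(signature_matches(sample))  # ['AIFF-hul']
```

Whatever module comes back from this step is the one the file is validated with, which is why a false signature match on AIFF-hul ends in an "invalid AIFF" report rather than a valid ASCII one.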

sprater commented 9 months ago

Thanks. And I see the bug is still in the current version of Jhove.

sprater commented 9 months ago

Jhove issue submitted: https://github.com/openpreserve/jhove/issues/902

As far as I am concerned, this ticket can be closed/withdrawn.