Open daveneiman opened 6 years ago
In the sample file, a '<' at the end of one line followed by 'Title' at the start of the next appears to satisfy the HTML heuristic in edu.harvard.hul.ois.jhove.module.HtmlModule#checkSignatures(File file, InputStream stream, RepInfo info), even though the file is plain text.
This seems to be a rare edge case that came up while processing our files.
Dev Effort
1D
Description
Not sure that this will be fixed, but it feels like a good prompt to document JHOVE's criteria for distinguishing HTML from plain text and XML.
The application misidentifies a plain text file as MIME type 'text/html' because an open angle bracket '<' at the end of one line is followed by the word 'Title' at the beginning of the next line. When the '<' is moved anywhere else in the file, the reported MIME type is 'text/plain'.
Here is the content from the file:
United States- Central Intelligence Agency*
The Mediterranean basin — Scale ll 6f500000 ; Lambert conformal conic
proj. (W 21°—E 60O/N 49°--N 20°). —
[Washington : Central Intelligence Agency* 1986]
1 map : col. ; 39 X 108 cm. Countries area—tinted Includes notes '•300342 (A05054) 6-86*"
1* Mediterranean iiegioa—Maps<
Title
10 NOV 95 CSSH HWTlsl 87-691121
400263499.txt
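
To make the reported behaviour easier to discuss, here is a minimal sketch of the kind of check that would produce it. This is not JHOVE's code: the class `HtmlSniffSketch`, the method `looksLikeHtml`, and the list of element names are all hypothetical, and the assumption that whitespace (including a line break) is skipped between '<' and the element name is a guess based only on the symptoms described above.

```java
import java.util.Locale;
import java.util.Set;

public class HtmlSniffSketch {
    // Hypothetical set of element names treated as an HTML signal.
    // Assumption for illustration only; not taken from JHOVE's source.
    private static final Set<String> TAG_NAMES = Set.of("html", "head", "title", "body");

    /**
     * Returns true if any '<' is followed, after optional whitespace
     * (line breaks included), by one of the names in TAG_NAMES.
     */
    public static boolean looksLikeHtml(String text) {
        int i = text.indexOf('<');
        while (i >= 0) {
            // Skip whitespace after '<'; this is what lets a newline slip through.
            int j = i + 1;
            while (j < text.length() && Character.isWhitespace(text.charAt(j))) {
                j++;
            }
            // Collect the run of letters that follows.
            int k = j;
            while (k < text.length() && Character.isLetter(text.charAt(k))) {
                k++;
            }
            String word = text.substring(j, k).toLowerCase(Locale.ROOT);
            if (TAG_NAMES.contains(word)) {
                return true;
            }
            i = text.indexOf('<', i + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        // '<' at the end of a line, 'Title' at the start of the next.
        System.out.println(looksLikeHtml("Maps<\nTitle\n10 NOV 95"));
        // '<' moved elsewhere in the text.
        System.out.println(looksLikeHtml("<Maps\nTitle\n10 NOV 95"));
    }
}
```

Under these assumptions the first input is classified as HTML and the second is not, which matches the observation that moving the '<' changes the result. A stricter check (for example, requiring the element name to follow '<' immediately and the tag to be closed with '>') would avoid this particular false positive, though whether that suits JHOVE's HTML module is a separate question.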