Open carlwilson opened 8 years ago
The last I checked on that was several years ago.
SourceForge: https://sourceforge.net/p/jhove/bugs/5/
Related to https://github.com/daitss/core/issues/714
PREMIS/METS ask for external schemas to be validated against. We can ask the JHOVE schema to do the same by setting processContents="strict", so I created a strict version (attached).
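For context, the three wildcard modes can be sketched as a schema fragment. This is an illustrative sketch only: the type name `embeddedMetadataType` is hypothetical and is not taken from the JHOVE schema, which may declare its wildcard differently.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type name, for illustration only. -->
  <xs:complexType name="embeddedMetadataType">
    <xs:sequence>
      <!-- skip:   embedded elements are not validated at all.
           lax:    embedded elements are validated only if a declaration can be found.
           strict: a declaration must be found, and the content must be valid against it. -->
      <xs:any namespace="##other" processContents="strict"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

With `strict`, the validator has to be able to locate a schema for every embedded namespace, which is why the schema locations matter.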
We then have the problem of being able to locate the external schema to validate against. I noticed FCLA referenced two versions of these documents. I do not know if we can reference the schema locations inline, so I think we have to change them in the global header to:
<?xml version="1.0" encoding="utf-8"?>
<jhove
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:aes="http://www.aes.org/audioObject"
xmlns="http://schema.openpreservation.org/ois/xml/ns/jhove"
xsi:schemaLocation="http://schema.openpreservation.org/ois/xml/ns/jhove
file:///home/user/.../jhove-strict.xsd
http://www.aes.org/tcf http://schema.fcla.edu/tcf.xsd
http://www.aes.org/audioObject http://schema.fcla.edu/audioObject.xsd"
I have attached two copies of the AES schemas to run locally.
NB. Can they be hosted on openpreservation.org so that they are collected together?
Then the fun starts! There is a raft of validation errors when trying to validate the AES-based segment of the XML.
I haven't a 1.02b version of the audioOutput schema to validate against, so let's look at the changes we need for 1.03b:
Original from JHOVE:
<property>
<name>AESAudioMetadata</name>
<values arity="Scalar" type="AESAudioMetadata">
<value>
<aes:audioObject xmlns:aes="http://www.aes.org/audioObject"
xmlns:tcf="http://www.aes.org/tcf"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
ID="J4"
analogDigitalFlag="FILE_DIGITAL"
disposition="Validated by JHOVE"
schemaVersion="1.02b">
<aes:format specificationVersion="1.3 (1989-01-04)">AIFF</aes:format>
<aes:audioDataEncoding>PCM</aes:audioDataEncoding>
<aes:byteOrder>BIG_ENDIAN</aes:byteOrder>
<aes:firstSampleOffset>98</aes:firstSampleOffset>
<aes:use useType="OTHER" otherType="JHOVE_validation"/>
<aes:primaryIdentifier identifierType="FILE_NAME">/home/user/.../aiff-untitled.aiff</aes:primaryIdentifier>
<aes:face direction="NONE" ID="J3" audioObjectRef="J4" label="Face">
<aes:timeline>
<tcf:startTime tcf:frameCount="30"
tcf:timeBase="1000"
tcf:videoField="FIELD_1"
tcf:countingMode="NTSC_NON_DROP_FRAME">
<tcf:hours>0</tcf:hours>
<tcf:minutes>0</tcf:minutes>
<tcf:seconds>0</tcf:seconds>
<tcf:frames>0</tcf:frames>
</tcf:startTime>
</aes:timeline>
<aes:region ID="J2" formatRef="J1" faceRef="J3" label="BuiltByJHOVE">
<aes:timeRange>
<tcf:startTime tcf:frameCount="30"
tcf:timeBase="1000"
tcf:videoField="FIELD_1"
tcf:countingMode="NTSC_NON_DROP_FRAME">
<tcf:hours>0</tcf:hours>
<tcf:minutes>0</tcf:minutes>
<tcf:seconds>0</tcf:seconds>
<tcf:frames>0</tcf:frames>
</tcf:startTime>
</aes:timeRange>
<aes:numChannels>2</aes:numChannels>
<aes:stream ID="J90" label="JHOVE" faceRegionRef="J2">
<aes:channelAssignment channelNum="0" mapLocation="LEFT"/>
</aes:stream>
<aes:stream ID="J91" label="JHOVE" faceRegionRef="J2">
<aes:channelAssignment channelNum="1" mapLocation="RIGHT"/>
</aes:stream>
</aes:region>
</aes:face>
<aes:formatList>
<aes:formatRegion ID="J1">
<aes:bitDepth>16</aes:bitDepth>
<aes:sampleRate>44100</aes:sampleRate>
</aes:formatRegion>
</aes:formatList>
</aes:audioObject>
</value>
</values>
</property>
Fixed-up (with changes needed to validate correctly):
<property>
<name>AESAudioMetadata</name>
<values arity="Scalar" type="AESAudioMetadata">
<value>
<aes:audioObject xmlns:aes="http://www.aes.org/audioObject"
xmlns:tcf="http://www.aes.org/tcf"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
ID="J4" analogDigitalFlag="FILE_DIGITAL"
disposition="Validated by JHOVE"
schemaVersion="1.03b">
<aes:format specificationVersion="1.3 (1989-01-04)">AIFF</aes:format>
<aes:audioDataEncoding>PCM</aes:audioDataEncoding>
<aes:byteOrder>BIG_ENDIAN</aes:byteOrder>
<aes:firstSampleOffset>98</aes:firstSampleOffset>
<aes:use useType="OTHER" otherType="JHOVE_validation"/>
<aes:primaryIdentifier identifierType="FILE_NAME">/home/user/.../aiff-untitled.aiff</aes:primaryIdentifier>
<aes:face direction="NONE" ID="J3" audioObjectRef="J4" label="Face">
<aes:timeline>
<tcf:startTime frameCount="30"
timeBase="1000"
videoField="FIELD_1"
countingMode="NTSC_NON_DROP_FRAME">
<tcf:hours>0</tcf:hours>
<tcf:minutes>0</tcf:minutes>
<tcf:seconds>0</tcf:seconds>
<tcf:frames>0</tcf:frames>
<tcf:samples sampleRate="48000">
<tcf:numberOfSamples>999999</tcf:numberOfSamples>
</tcf:samples>
<tcf:filmFraming xsi:type="tcf:palFilmFramingType" framing="NOT_APPLICABLE"/>
</tcf:startTime>
</aes:timeline>
<aes:region ID="J2" formatRef="J1" faceRef="J3" label="BuiltByJHOVE">
<aes:timeRange>
<tcf:startTime frameCount="30" timeBase="1000" videoField="FIELD_1" countingMode="NTSC_NON_DROP_FRAME">
<tcf:hours>0</tcf:hours>
<tcf:minutes>0</tcf:minutes>
<tcf:seconds>0</tcf:seconds>
<tcf:frames>0</tcf:frames>
<tcf:samples sampleRate="S48000">
<tcf:numberOfSamples>999999</tcf:numberOfSamples>
</tcf:samples>
<tcf:filmFraming xsi:type="tcf:palFilmFramingType" framing="NOT_APPLICABLE"/>
</tcf:startTime>
</aes:timeRange>
<aes:numChannels>2</aes:numChannels>
<aes:stream ID="J90" label="JHOVE" faceRegionRef="J2">
<aes:channelAssignment channelNum="0" mapLocation="LEFT"/>
</aes:stream>
<aes:stream ID="J91" label="JHOVE" faceRegionRef="J2">
<aes:channelAssignment channelNum="1" mapLocation="RIGHT"/>
</aes:stream>
</aes:region>
</aes:face>
<aes:formatList>
<aes:formatRegion label="LABEL" ownerRef="J1" ID="J1">
<aes:bitDepth>16</aes:bitDepth>
<aes:sampleRate>44100</aes:sampleRate>
</aes:formatRegion>
</aes:formatList>
</aes:audioObject>
</value>
</values>
</property>
The primary changes are to the way the namespaces are referenced on attributes (I don't know if there is another way to use them like in the original JHOVE output, but the validator complained). Then there are additional sequence requirements in the 1.03b schema (filmFraming and samples are two such examples):
<aes:timeline>
<tcf:startTime frameCount="30"
timeBase="1000"
videoField="FIELD_1"
countingMode="NTSC_NON_DROP_FRAME">
<tcf:hours>0</tcf:hours>
<tcf:minutes>0</tcf:minutes>
<tcf:seconds>0</tcf:seconds>
<tcf:frames>0</tcf:frames>
<tcf:samples sampleRate="48000">
<tcf:numberOfSamples>999999</tcf:numberOfSamples>
</tcf:samples>
<tcf:filmFraming xsi:type="tcf:palFilmFramingType" framing="NOT_APPLICABLE"/>
</tcf:startTime>
</aes:timeline>
NB. Some of these are just placeholder values. They're unlikely to be accurate.
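One plausible explanation for the attribute errors, sketched below, is that the tcf schema declares its attributes locally and leaves attributeFormDefault at its default of "unqualified". That is an assumption and has not been confirmed against the AES schema; the type name `exampleTimeType` and the attribute type are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:tcf="http://www.aes.org/tcf"
           targetNamespace="http://www.aes.org/tcf"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <!-- Assumed declaration, for illustration only; not copied from the AES schema. -->
  <xs:complexType name="exampleTimeType">
    <xs:sequence>
      <!-- Local elements are qualified, so instances write <tcf:hours>. -->
      <xs:element name="hours" type="xs:unsignedInt"/>
    </xs:sequence>
    <!-- A locally declared, unqualified attribute belongs to no namespace,
         so instances must write frameCount="30", not tcf:frameCount="30". -->
    <xs:attribute name="frameCount" type="xs:string"/>
  </xs:complexType>
</xs:schema>
```

Under that assumption, the prefixed form tcf:frameCount names a different attribute (one in the tcf namespace) that the type never declares, so a validator would reject it.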
Additional changes are needed to the audioOutput schema as well, where it uses the xlink:simpleLink attribute group: the W3C didn't maintain compatibility with previous specifications, and simpleLink is now simpleAttrs.
Ref: https://www.spatineo.com/ogc-w3c-xlink-transition-a-potential-validity-breaker/
<xsd:complexType name="locStringType">
<xsd:attributeGroup ref="xlink:simpleLink" />
</xsd:complexType>
Becomes:
<xsd:complexType name="locStringType">
<xsd:attributeGroup ref="xlink:simpleAttrs" />
</xsd:complexType>
Once all of these changes are made, we can get the JHOVE output to validate, including against the external schemas.
I've attached an original and modified version of the XML below:
original-and-modified-jhovexml.zip
And I've two sample files to generate this output. I'm happy to add these to the OPF Format Corpus sometime in the next few weeks.
(NTSC_NON_DROP_FRAME seems like a bit of a smell here for audio only?)
Hi @ross-spencer, I now think that I've been here fairly recently from another direction, namely this PR: https://github.com/openpreserve/jhove/pull/357 which is open, and I suspect it fixes some of this. I now remember I got pretty deep in the lead-up to 1.22 and then bottled it. The unpublished schema rings a bell. Will assign myself and link the PR.
Ah! Okay, at a brief glance the PR definitely looks to clean the logic up a bit. It'll be interesting to compare the output.
Dev Effort
1D
Description
Both the WAVE and AIFF modules embed audio metadata in AES format without providing a schema. One of the produced elements makes use of xsi:type: <tcf:filmFraming tcf:framing="NOT_APPLICABLE" xsi:type="tcf:ntscFilmFramingType"/>.
Because the JHOVE schema does not validate embedded XML (processContents="skip"), the use of xsi:type does not cause a problem. However, the METS and PREMIS schemas will validate embedded XML if a sufficient definition is available (processContents="lax").
When we import this element into a PREMIS document, it is not valid because xsi:type references a Type Definition (http://www.w3.org/TR/xmlschema-1/#xsi_type), thus explicit assertion of type validation is attempted. The type tcf:ntscFilmFramingType cannot be resolved and causes validation to fail. Looking into aes.org, we cannot find a schema describing the element in the namespace: http://www.aes.org/tcf. It appears the AES X098B schema is not publicly available yet (according to Gary).
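To make the failure concrete, here is a minimal sketch of the kind of definitions a validator would need before it could honour xsi:type="tcf:ntscFilmFramingType". Everything in it is hypothetical: the real AES type definitions are not available here, so the base type, the attribute type and the derivation are assumptions for illustration only.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:tcf="http://www.aes.org/tcf"
           targetNamespace="http://www.aes.org/tcf"
           elementFormDefault="qualified">
  <!-- Hypothetical base type; the actual AES definitions are not reproduced here. -->
  <xs:complexType name="filmFramingType">
    <xs:attribute name="framing" type="xs:string"/>
  </xs:complexType>
  <!-- xsi:type can only be resolved if a named type like this one is available
       to the validator and derives from the element's declared type. -->
  <xs:complexType name="ntscFilmFramingType">
    <xs:complexContent>
      <xs:extension base="tcf:filmFramingType"/>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="filmFraming" type="tcf:filmFramingType"/>
</xs:schema>
```

Without a schema for the http://www.aes.org/tcf namespace, there is no such named type to look up, which matches the failure described above.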