Open simsong opened 3 years ago
Apparently I need a DTD. Perhaps @ajnelson-nist can help.
The DFXML schema can be used to validate DFXML, though you need to pass xmllint the --schema flag, not the --valid flag. The Python code base's samples Makefile demonstrates this. I would recommend tracking the schema as a Git submodule, pinned to the version you want to validate against.
You may also be in for a bit of a data upgrade, as the DFXML schema identified many long-standing issues with the way DFXML was originally drafted. For one thing, namespaces are scoped to the element they're attached to, so your sample has no namespace to which it's claiming to conform. See Differencing test 0 for how to declare a <dfxml> element as being in the DFXML namespace.
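To illustrate the scoping point, here is a minimal sketch using Python's standard xml.etree.ElementTree. The helper name qualified_tags and the two tiny sample documents are made up for illustration; the namespace URI is the one used in the bulk_extractor output quoted later in this thread. It shows that an xmlns declaration on the root element puts that element and its unprefixed descendants in the namespace, while a document without the declaration has no namespace at all.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the report header quoted in this thread.
DFXML_NS = "http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML"

def qualified_tags(xml_text):
    """Return the tag of the root element and of every descendant.

    ElementTree spells a namespaced tag as "{namespace-uri}localname",
    so an element without a namespace shows up as a bare local name.
    """
    return [element.tag for element in ET.fromstring(xml_text).iter()]

# Hypothetical sample documents, not real bulk_extractor output.
without_ns = "<dfxml version='1.0'><creator/></dfxml>"
with_ns = "<dfxml version='1.0' xmlns='%s'><creator/></dfxml>" % DFXML_NS

print(qualified_tags(without_ns))
print(qualified_tags(with_ns))
```

A schema validator matches elements by their qualified names, which is why the first form cannot match a schema that defines its elements in the DFXML namespace.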
Well, you are now the XML/DFXML expert. If you could give me a sample of how to add namespaces or other scoping tags, I'll update bulk_extractor2.0 so that it produces conformant DFXML.
@ajnelson-nist - I think that I'm making progress on this. Now the validation errors apparently require that I do an update to the DFXML schema or create my own namespace.
Here is the new head of the DFXML output of bulk_extractor:
<?xml version='1.0' encoding='UTF-8'?>
<dfxml version='1.0' xmlns='http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML'
xmlns:debug='http://afflib.org/bulk_extractor/debug'
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xmlns:dc='http://purl.org/dc/elements/1.1/'>
<metadata>
<dc:type>Feature Extraction</dc:type>
</metadata>
<creator version='1.0'>
<program>BULK_EXTRACTOR</program>
<version>2.0.0-dev</version>
<build_environment>
...
And here is what happens when I try to validate it:
% xmllint --noout --schema dfxml.xsd out-domexusers-be20v3/report.xml (slg-dev)bulk_extractor
warning: failed to load external entity "ref/dc.xsd"
dfxml.xsd:34: element import: Schemas parser warning : Element '{http://www.w3.org/2001/XMLSchema}import': Failed to locate a schema at location 'ref/dc.xsd'. Skipping the import.
warning: failed to load external entity "ref/xml.xsd"
dfxml.xsd:43: element import: Schemas parser warning : Element '{http://www.w3.org/2001/XMLSchema}import': Failed to locate a schema at location 'ref/xml.xsd'. Skipping the import.
out-domexusers-be20v3/report.xml:14: element CPPFLAGS: Schemas validity error : Element '{http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML}CPPFLAGS': This element is not expected. Expected is one of ( {http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML}compilation_date, {http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML}library ).
out-domexusers-be20v3/report.xml:25: element cpuid: Schemas validity error : Element '{http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML}cpuid': This element is not expected.
out-domexusers-be20v3/report.xml:49: element configuration: Schemas validity error : Element '{http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML}configuration': This element is not expected. Expected is one of ( {http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML}source, ##other{http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML}*, {http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML}diskimageobject, {http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML}partitionsystemobject, {http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML}partitionobject, {http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML}volume, {http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML}fileobject, {http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML}rusage, ##other{http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML}* ).
out-domexusers-be20v3/report.xml fails to validate
%
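For tracking which elements still need work, the validity errors above can be summarized with a short script. This is only a sketch: the regular expression is modeled on the error lines shown in this thread, not on any documented xmllint output format, and the function name unexpected_elements is hypothetical.

```python
import re

# Pattern modeled on lines such as:
#   report.xml:14: element CPPFLAGS: Schemas validity error : Element '...': ...
# This is an assumption based on the output quoted above.
ERROR_RE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+): element (?P<name>[^:]+): Schemas validity error"
)

def unexpected_elements(xmllint_output):
    """Return (line_number, element_name) pairs for 'not expected' errors."""
    found = []
    for line in xmllint_output.splitlines():
        match = ERROR_RE.match(line)
        if match and "This element is not expected" in line:
            found.append((int(match.group("line")), match.group("name")))
    return found
```

xmllint writes these messages to standard error, so the text would need to be captured from there (for example with 2> errors.txt) before being passed to the function.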
I guess dc: is Dublin Core, so I will need to get a Dublin Core xsd file somewhere.
I'm not sure what xsi: is about. Any clue?
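The two "failed to load external entity" warnings suggest the schema imports other schema files by relative path. As a rough sketch, the following lists which imported files are missing on disk. It assumes relative schemaLocation values are resolved against the directory containing the schema file; the function name missing_imports is hypothetical.

```python
import os
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

def missing_imports(xsd_path):
    """List schemaLocation values of xs:import elements not found on disk.

    Assumption: relative locations are resolved against the directory
    that contains the schema file itself.
    """
    base_dir = os.path.dirname(os.path.abspath(xsd_path))
    missing = []
    for imp in ET.parse(xsd_path).getroot().iter("{%s}import" % XSD_NS):
        location = imp.get("schemaLocation")
        # Skip imports with no location and imports that point at a URL.
        if not location or "://" in location:
            continue
        if not os.path.exists(os.path.join(base_dir, location)):
            missing.append(location)
    return missing
```

Calling missing_imports("dfxml.xsd") from the directory used in the xmllint run above would show which of the imported files still need to be fetched.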