cboettig closed this 5 years ago
From discussions elsewhere, it sounds like this checks out well enough so far, so I'm going to merge this into master. Validation could still be improved, in particular in testing (with more invalid files) and in error message quality, but I will coordinate this with @mbjones once the updates to the official Java validator are complete as well.
@mbjones Can you take a quick look through this when you get a chance?
I believe this should address #25 by implementing the additional validation checks as I understood them, modulo any handling of `system`. Currently I just compare all `id` values throughout the document directly, while I think technically I should be permitting identical ids that have different systems, and likewise making sure that `references` and `describes` match the system and not just the `id` value, right?

I believe I have understood the rule correctly about the use of `id` or `references` regarding an `annotation` (the annotation must have a subject, and only one can be the subject), but I'm not 100% sure. In particular, it looks like the current (but 5 mo stale) eml-data-paper.xml fails this test because there is a child `annotation` on a `dataset` node that has no `id`, and no `references` on the `annotation`. You'll see in this PR I've taken the liberty of modifying my local copy of that test file.

Also, I wasn't clear if the `packageId` needed to be included in the list of ids that had to be unique (or similarly, if `references` and `describes` were allowed to reference the `packageId` instead of an `id`). Currently I have followed the instructions literally, so `packageId` has to exist but that is all; it's not part of the other checks.
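
To make the checks described above concrete, here is a minimal Python sketch of how I read them. It is not the code in this PR: the function name `check_ids` and the error strings are made up for illustration. It assumes `references` and `describes` appear as elements whose text is an id, that an `annotation` takes its subject either from its parent's `id` or from its own `references` attribute, and that ids are compared by value only (no `system` handling). It does not cover annotations whose subject comes from a sibling `describes`.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def check_ids(xml_text):
    """Hypothetical helper: return a list of error strings (empty if all checks pass)."""
    root = ET.fromstring(xml_text)
    errors = []

    # packageId only has to exist; it is not part of the other checks.
    if not root.get("packageId"):
        errors.append("root element has no packageId")

    # Every id must be unique (compared by value only, ignoring `system`).
    ids = [el.get("id") for el in root.iter() if el.get("id") is not None]
    for value, count in Counter(ids).items():
        if count > 1:
            errors.append(f"id '{value}' appears {count} times")
    known = set(ids)

    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    for el in root.iter():
        name = local_name(el.tag)
        if name in ("references", "describes"):
            # Assumption: the element text must match an id in the document.
            target = (el.text or "").strip()
            if target not in known:
                errors.append(f"{name} '{target}' matches no id")
        elif name == "annotation":
            # Assumption: exactly one of (parent id, own references attribute)
            # supplies the subject of the annotation.
            parent = parent_of.get(el)
            parent_id = parent.get("id") if parent is not None else None
            ref = el.get("references")
            subjects = [s for s in (parent_id, ref) if s is not None]
            if len(subjects) != 1:
                errors.append(f"annotation has {len(subjects)} subjects, expected 1")
            elif ref is not None and ref not in known:
                errors.append(f"annotation references '{ref}', which matches no id")
    return errors
```

Under these assumptions, a `dataset` with no `id` that holds a child `annotation` with no `references` attribute is reported as an annotation with 0 subjects, which is the failure described for eml-data-paper.xml. Making the checks `system`-aware would mean keying on an (id, system) pair instead of the bare id value.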