vogelwk / psi-pi

Automatically exported from code.google.com/p/psi-pi

Schema/validation issues #73

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
While looking at the recent CV release candidate I found problems with the 
existing mzIdentML examples and with the validation and/or schema.

Here are the SearchDatabase values from the MPC example:
<SearchDatabase ...snip...>
  <DatabaseName>
    <cvParam accession="MS:1001142" name="database IPI_human" cvRef="PSI-MS"/>
  </DatabaseName>
  <cvParam accession="MS:1001300" name="decoy DB from IPI_human" cvRef="PSI-MS"/>
  <cvParam accession="MS:1001197" name="DB composition target+decoy" cvRef="PSI-MS"/>
  <cvParam accession="MS:1001452" name="decoy DB type shuffle" cvRef="PSI-MS"/>
  <cvParam accession="MS:1001451" name="decoy DB generation algorithm" cvRef="PSI-MS" value="PeakQuant.DecoyDatabaseBuilder"/>
  <cvParam accession="MS:1001283" name="decoy DB accession regexp" cvRef="PSI-MS" value="^SHD"/>
</SearchDatabase>

Here are the values from the Mascot example:
<SearchDatabase ...snip...>
  <FileFormat>
    <cvParam accession="MS:1001348" name="FASTA format" cvRef="PSI-MS"/>
  </FileFormat>
  <DatabaseName>
    <userParam name="SwissProt_51.6.fasta"/>
  </DatabaseName>
  <cvParam accession="MS:1001073" name="database type amino acid" cvRef="PSI-MS"/>
</SearchDatabase>

I'm pretty sure we want to require "database type" all the time, or at least 
in the MIAPE case. But the validator doesn't squawk at the MPC example, which 
has no "database type" term at all.

Likewise I'm pretty sure we want to require the FileFormat element for 
SearchDatabase, SourceFile, and SpectraData, but this requirement falls through 
the cracks between where the schema validation stops and the semantic 
validation begins. It's allowed to be missing by the schema because it's 
minOccurs=0 in the ExternalDataType base type of these 3 elements. And it's 
ignored entirely by the validator when the element is missing.
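
To make the gap concrete, here is a small standalone check. It is only a sketch, not the PSI validator: it ignores XML namespaces (real mzIdentML files declare one), the helper name missing_file_format is made up for illustration, and the two documents are trimmed-down versions of the examples quoted above.

```python
import xml.etree.ElementTree as ET

# The three elements that extend ExternalDataType, per the discussion above.
EXTERNAL_DATA_ELEMENTS = ("SearchDatabase", "SourceFile", "SpectraData")


def missing_file_format(xml_text):
    """Return the tag names of external-data elements with no FileFormat child.

    Hypothetical helper: assumes no XML namespace, unlike real mzIdentML files.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for elem in root.iter():
        # find() only looks at direct children, which is where FileFormat lives.
        if elem.tag in EXTERNAL_DATA_ELEMENTS and elem.find("FileFormat") is None:
            missing.append(elem.tag)
    return missing


# Trimmed-down stand-in for the MPC example (no FileFormat).
MPC_LIKE = """<Inputs>
  <SearchDatabase>
    <DatabaseName>
      <cvParam accession="MS:1001142" name="database IPI_human" cvRef="PSI-MS"/>
    </DatabaseName>
  </SearchDatabase>
</Inputs>"""

# Trimmed-down stand-in for the Mascot example (FileFormat present).
MASCOT_LIKE = """<Inputs>
  <SearchDatabase>
    <FileFormat>
      <cvParam accession="MS:1001348" name="FASTA format" cvRef="PSI-MS"/>
    </FileFormat>
    <DatabaseName>
      <userParam name="SwissProt_51.6.fasta"/>
    </DatabaseName>
  </SearchDatabase>
</Inputs>"""

print(missing_file_format(MPC_LIKE))     # ['SearchDatabase']
print(missing_file_format(MASCOT_LIKE))  # []
```

Both documents would pass a schema where FileFormat is minOccurs=0, and a validator that skips rules whose path matches nothing would accept both as well. Only an explicit presence check like this one tells them apart.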

Two solutions:
1. Fix it schematically. These 3 elements are the only time we're using 
ExternalDataType. When ExternalDataType was in FuGE, FileFormat couldn't be 
schematically required in case other subclasses of ExternalDataType did not 
require it. Now that we have a copy of it, we can get rid of minOccurs=0.

2. Fix it by adding an attribute to the mapping file that specifies that a rule 
should NOT be ignored if the XPath is missing and the rule contains a MUST term.
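
As a rough illustration of option 2, the sketch below models a mapping rule with an extra flag. Everything in it is an assumption made for the example: the Rule class, its field names, and the check_when_missing flag are hypothetical and do not reflect the real mapping-file format or the validator's code, and an ElementTree path stands in for the rule's XPath.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Rule:
    # All field names are hypothetical, not the real mapping-file schema.
    path: str                         # ElementTree path standing in for the XPath
    level: str                        # "MUST", "SHOULD" or "MAY"
    check_when_missing: bool = False  # the proposed new attribute


def evaluate(rule, root):
    """Return a list of error messages for one rule against one document."""
    matches = root.findall(rule.path)
    if not matches:
        # Behaviour described in this issue: no match means the rule is skipped.
        # Proposed behaviour: a MUST rule with the flag set reports an error.
        if rule.level == "MUST" and rule.check_when_missing:
            return [f"required path not found: {rule.path}"]
        return []
    # Checking the CV terms of matched elements is omitted from this sketch.
    return []


doc = ET.fromstring(
    "<Inputs><SearchDatabase><DatabaseName/></SearchDatabase></Inputs>"
)
path = ".//SearchDatabase/FileFormat"

print(evaluate(Rule(path, "MUST"), doc))
# []
print(evaluate(Rule(path, "MUST", check_when_missing=True), doc))
# ['required path not found: .//SearchDatabase/FileFormat']
```

Defaulting the flag to off would leave existing rules behaving as they do today, so the stricter check could be enabled one rule at a time.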

I lean toward option 1 since we have a few other outstanding schema issues to 
put in a 1.2 revision. But I can understand not wanting to do that concurrently 
with work on mzQuantML.

Original issue reported on code.google.com by matt.cha...@gmail.com on 15 Nov 2012 at 5:31

GoogleCodeExporter commented 8 years ago
Discussed at PSI2013 - decision from the group was to live with these 
inconsistencies for now. No one voted for a new schema at the present time 
(sorry Matt!). Stability of mzid is key to getting good uptake.

When we do the next update to the spec doc, we should encourage good practice.

ACTION: Close the issue once update to spec doc has been done.

Original comment by andrewro...@googlemail.com on 17 Apr 2013 at 1:16