Reducing attribute redundancy

vogelwk / psi-pi

Automatically exported from code.google.com/p/psi-pi

Reducing attribute redundancy #59

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago

When the general case for an mzIdentML element is to use the same value over 
and over for an attribute, e.g. searchDatabase_ref, spectraData_ref, 
massTable_ref, we should add a "default" version of these attributes on the 
parent element and make the individual attributes optional. When the individual 
attributes are present, they override the default value.

For example:
SII::massTable_ref overrides SIL::defaultMassTable_ref
SIR::spectraData_ref overrides SIL::defaultSpectraData_ref
DBSeq::searchDatabase_ref overrides SeqCol::defaultSearchDatabase_ref
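
A minimal sketch of how a reader could resolve the effective value under this proposal. The attributes `defaultMassTable_ref` and `defaultSpectraData_ref` are the ones proposed here and are not part of the released schema; the XML fragment, the ids, and the helper name `resolve` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the default* attributes exist only in this proposal.
DOC = """
<SpectrumIdentificationList id="SIL_1"
    defaultMassTable_ref="MT_1" defaultSpectraData_ref="SD_1">
  <SpectrumIdentificationResult id="SIR_1" spectrumID="scan=1">
    <SpectrumIdentificationItem id="SII_1"/>
    <SpectrumIdentificationItem id="SII_2" massTable_ref="MT_2"/>
  </SpectrumIdentificationResult>
  <SpectrumIdentificationResult id="SIR_2" spectrumID="scan=2"
      spectraData_ref="SD_2">
    <SpectrumIdentificationItem id="SII_3"/>
  </SpectrumIdentificationResult>
</SpectrumIdentificationList>
""".strip()


def resolve(element, attr, container, default_attr):
    """Return the element's own attribute if present, else the container default."""
    value = element.get(attr)
    return value if value is not None else container.get(default_attr)


sil = ET.fromstring(DOC)

# SII::massTable_ref overrides SIL::defaultMassTable_ref
mass_tables = {
    sii.get("id"): resolve(sii, "massTable_ref", sil, "defaultMassTable_ref")
    for sii in sil.iter("SpectrumIdentificationItem")
}

# SIR::spectraData_ref overrides SIL::defaultSpectraData_ref
spectra_data = {
    sir.get("id"): resolve(sir, "spectraData_ref", sil, "defaultSpectraData_ref")
    for sir in sil.iter("SpectrumIdentificationResult")
}

print(mass_tables)
print(spectra_data)
```

Only SII_2 and SIR_2 carry their own reference in this fragment; every other element falls back to the default on the list.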

Original issue reported on code.google.com by matt.cha...@gmail.com on 7 Apr 2011 at 3:57

GoogleCodeExporter commented 8 years ago

Heidelberg: It may be easier for a semantic validator to have a mandatory 
attribute in place. Default attributes may complicate things.

Original comment by eisena...@googlemail.com on 12 Apr 2011 at 10:06

Changed state: Accepted
Added labels: Milestone-Release1.1

GoogleCodeExporter commented 8 years ago

I'm not sure this rule would be too difficult to implement in a semantic 
validator, but it does make the schema a bit more complicated. I'd be 
interested in the API developers' viewpoints - how much difference could this 
make for saving memory on really big files? - Florian, any comment?

Original comment by andrewro...@googlemail.com on 12 Apr 2011 at 10:47

GoogleCodeExporter commented 8 years ago

Worst case scenario is a peptide database with millions of entries. A redundant 
attribute in that case would be significantly inefficient. Of course, the fact 
that DBSequence and Peptide are nearly duplicates in that case doesn't help 
either.

Original comment by matt.cha...@gmail.com on 12 Apr 2011 at 1:21

GoogleCodeExporter commented 8 years ago

As far as "difficulty" goes, the semantic validators already must deal with 
this for mzML, so there is no additional difficulty.

Original comment by matt.cha...@gmail.com on 12 Apr 2011 at 1:21

GoogleCodeExporter commented 8 years ago

Agreement in TeleCon 21.4.2011:
Avoiding repeated attributes would reduce XML file space, but not object model 
space (in memory), and the API programming would get a little more complicated 
(not too much).
The XML file space taken by repeated attributes will probably not be as 
significant as other parts of the file (like sequences, scores), and repeated 
attributes should compress efficiently when zipped. So for the moment we leave 
it as it is.
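
A rough way to check the compression argument, as a sketch only. The entries below are synthetic (made-up ids, accessions, and entry count), not taken from a real mzIdentML file, and `zlib` stands in for zipping because it uses the same DEFLATE algorithm.

```python
import zlib

N = 10_000  # synthetic entry count, chosen only for illustration
ATTR = ' searchDatabase_ref="SDB_1"'  # the repeated attribute under discussion


def build(with_attr):
    """Build N synthetic DBSequence lines, with or without the repeated attribute."""
    extra = ATTR if with_attr else ""
    lines = [
        f'<DBSequence id="dbseq_{i}" accession="ACC{i:06d}"{extra}/>'
        for i in range(N)
    ]
    return "\n".join(lines).encode("utf-8")


raw_with, raw_without = build(True), build(False)
zip_with, zip_without = zlib.compress(raw_with), zlib.compress(raw_without)

# Bytes that dropping the attribute would save, before and after compression.
raw_saving = len(raw_with) - len(raw_without)
zip_saving = len(zip_with) - len(zip_without)

print(f"uncompressed saving: {raw_saving} bytes")
print(f"compressed saving:   {zip_saving} bytes")
```

On data like this the uncompressed saving is exactly N times the attribute length, while the compressed saving is smaller because the repeated text is encoded as back-references. How this compares with sequences and scores in a real file is not measured here.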

No action point.

Original comment by eisena...@googlemail.com on 21 Apr 2011 at 4:22

Changed state: Fixed