Closed GoogleCodeExporter closed 9 years ago
Schema update:
Removed SequenceMass, sequenceLength and MassTable_ref from Peptide
Added MassTable_ref to SpectrumIdentificationItem
Added Keys and KeyRefs for this
Some minor changes to FuGElight that shouldn't affect anything in mzIdentML
Original comment by andrewro...@googlemail.com
on 14 May 2009 at 10:05
There are three elements I would like to remove from the schema:
AnalysisResultList,
AnalysisResult, AnalysisResultItem. These were place holders so that any quant
extensions would extend from the same points in the hierarchy as
SpectrumIdentificationList, SpectrumIdentificationResult and
SpectrumIdentificationItem (same for proteins). Now that quant is going to be
developed independently, I can see no reason to keep them in?
Unless anyone can see some benefits, I'll remove these from the hierarchy.
Original comment by andrewro...@googlemail.com
on 18 May 2009 at 9:58
I think the passThreshold part of the schema is still a bit ambiguous (a comment
about FDR thresholds was also made by a reviewer).
I suggest we add an optional schema element to SpectrumIdentificationProtocol and
ProteinDetectionProtocol specifically for this purpose:
<SpectrumIdentificationProtocol id="SIP" AnalysisSoftware_ref="AS_mascot_server">
  <Threshold>
    <pf:cvParam accession="MS:1001316" name="mascot:SigThreshold" cvRef="PSI-MS" value="0.05"/>
    OR
    <pf:cvParam accession="MS:TODO" name="global FDR" cvRef="PSI-MS" value="0.05"/>
  </Threshold>
</SpectrumIdentificationProtocol>
and the same for protein detection.
I realise there are cases where this cannot be given or it might be difficult to
report which peptides passed the threshold but several example files have
passThreshold set but do not specify what the threshold is.
Original comment by andrewro...@googlemail.com
on 18 May 2009 at 10:29
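As a rough illustration of how a reader of such a file might pick up the proposed <Threshold> element from the comment above, here is a minimal Python sketch. It assumes the element layout shown in that comment; the namespace URI bound to the "pf" prefix is a placeholder (the thread does not give it), and the function name read_thresholds is hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI for the "pf" prefix; the real URI is not given in the thread.
PF_NS = "http://example.org/placeholder/pf"

EXAMPLE = f"""
<SpectrumIdentificationProtocol xmlns:pf="{PF_NS}" id="SIP"
    AnalysisSoftware_ref="AS_mascot_server">
  <Threshold>
    <pf:cvParam accession="MS:1001316" name="mascot:SigThreshold"
        cvRef="PSI-MS" value="0.05"/>
  </Threshold>
</SpectrumIdentificationProtocol>
"""


def read_thresholds(protocol_xml: str) -> list[tuple[str, str, str]]:
    """Return (accession, name, value) for each cvParam under <Threshold>."""
    root = ET.fromstring(protocol_xml)
    threshold = root.find("Threshold")
    if threshold is None:
        return []  # the protocol does not state a threshold
    return [
        (p.get("accession"), p.get("name"), p.get("value"))
        for p in threshold.findall(f"{{{PF_NS}}}cvParam")
    ]


print(read_thresholds(EXAMPLE))
```

A consumer could use the returned list to report which threshold the passThreshold flags refer to, which is the gap the comment points out in the example files.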
The Peptide element still has an optional attribute calculatedPI but we have no
examples of its use.
I think this should go - Peptides have no connection with a particular software
package so these values are fairly meaningless.
Does anyone remember a use case for this being present?
Original comment by andrewro...@googlemail.com
on 18 May 2009 at 10:38
Hold-over from pepXML. Also not meaningless if you use the information to
weight peptide-spectrum matches against retention time. No comment on removal,
but I would be inclined to keep it, since it is part of pepXML.
Original comment by delag...@gmail.com
on 18 May 2009 at 11:09
Some minor updates made that shouldn't affect any instance docs, the rest of the
issues above still to discuss:
Several attributes made mandatory:
- SpectrumIdentificationList_ref on ProteinDetection->InputSpectrumIdentification
  and on SpectrumIdentification (i.e. SpectrumIdentification must produce results
  and ProteinDetection must reference an input)
- ProteinDetectionList_ref on ProteinDetection (i.e. a protein detection process
  must produce a result)
- Peptide->peptideSequence (previously agreed on a call but not done)
- AnalysisSoftware_ref on SpectrumIdentificationProtocol and
  ProteinDetectionProtocol
AnalysisProtocol and AnalysisProtocolApplication removed: these were unreachable,
not used at all in the schema, and will not be used in any future extensions.
Original comment by andrewro...@googlemail.com
on 18 May 2009 at 11:11
In response to Comment 5.
If we want to keep it I would then prefer if this was moved to
SpectrumIdentificationItem - especially since we got rid of sequenceMass on Peptide
(and just kept calculatedMassToCharge on SII). The PI calculation is totally
software dependent.
Original comment by andrewro...@googlemail.com
on 18 May 2009 at 11:15
Schema updates made:
Added Threshold to SIP and PDP
Altered SubstitutionModification - removed residues. Added documentation about
specifying original peptide sequence for SubMods
Changed data type for peptideSequence - should now check for upper case
sequence of chars
Original comment by andrewro...@googlemail.com
on 20 May 2009 at 4:39
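The peptideSequence check mentioned in the comment above can be sketched in Python as follows. The thread does not quote the actual restriction used in the schema, so the pattern below (one or more upper-case ASCII letters) is an assumption, and the function name is hypothetical.

```python
import re

# Assumed pattern: one or more upper-case ASCII letters. The actual
# restriction used in the schema is not quoted in the thread.
PEPTIDE_SEQUENCE = re.compile(r"[A-Z]+")


def is_valid_peptide_sequence(sequence: str) -> bool:
    """Return True if the whole string consists of upper-case letters only."""
    return PEPTIDE_SEQUENCE.fullmatch(sequence) is not None


for candidate in ("PEPTIDEK", "peptidek", "PEP1TIDE", ""):
    print(repr(candidate), is_valid_peptide_sequence(candidate))
```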
Schema update:
Added SoftwareName --> Param to AnalysisSoftware
We need to add a mapping for these CV terms to the mapping file
Original comment by andrewro...@googlemail.com
on 21 May 2009 at 8:10
In response to Comment 3 and 8 (Threshold element):
I agree that it makes things clearer, but it seems to be mandatory at the moment;
it should be optional ("minOccurs=0") as Andy suggested in comment 3.
Original comment by eisena...@googlemail.com
on 27 May 2009 at 7:45
In the MPC example I set all passThreshold to true, although not having specified a
threshold; maybe this was a misunderstanding arising from the following schema
comment about passThreshold: "If no such threshold has been set, value of true should
be given for all results." (possibly a typo and should read "value of false"?)
Original comment by eisena...@googlemail.com
on 27 May 2009 at 7:52
In response to comment 11
I think this is correct, if no threshold has been specified, all should be set to
true i.e. everything is deemed to have passed the threshold.
Let's discuss on the call the cardinality of <threshold> and passThreshold. There is
an argument for making both mandatory, but having a CV term for "no threshold" - this
might annoy some implementers though...
Original comment by andrewro...@googlemail.com
on 27 May 2009 at 8:10
Comments 10, 11, 12: in the TeleCon on 28th May 2009 it was agreed to leave
<Threshold> as mandatory, to have a "no threshold" CV term, and to set passThreshold
to true if all reported peptides/proteins are accepted (because they've passed the
"no threshold").
Original comment by eisena...@googlemail.com
on 28 May 2009 at 3:29
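The agreement recorded in the comment above can be sketched as a small Python function. This is only an illustration: representing the "no threshold" CV term as None, and treating a lower score as better (as with an expectation value or FDR), are both assumptions, and the function name is hypothetical.

```python
from typing import Optional


def pass_threshold(score: float, threshold: Optional[float]) -> bool:
    """Decide the passThreshold flag for one result.

    threshold=None stands in for the "no threshold" CV term: every
    reported result is then deemed to have passed.
    """
    if threshold is None:
        return True
    # Assumption: lower scores are better (e.g. an expectation value).
    return score <= threshold


print(pass_threshold(0.2, None))   # no threshold declared
print(pass_threshold(0.01, 0.05))  # score within the threshold
print(pass_threshold(0.2, 0.05))   # score outside the threshold
```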
Comment from Patrick @ Matrix:
Why is the experimentalMassToCharge attribute of SpectrumIdentificationType optional?
If it isn't set you could potentially have real trouble matching back to the source
spectrum (for example in a PMF search there could be no reference at all to the
entered mass, and if there isn't a handy reference to e.g. the dta file name you
could have the same problem with MS/MS datasets). The example XTandem file doesn't
have the experimentalMassToCharge set, although all the others do.
Thanks,
Patrick
We should review the other cardinalities of SpectrumIdentificationItem as well
Original comment by andrewro...@googlemail.com
on 25 Jun 2009 at 3:02
Schema update:
- Added in Sample and Sample_ref
- Added corresponding keys and keyrefs
- experimentalMassToCharge use="required"
- chargeState use="required"
Some updates to the mapping file for taxonomy for Sample and DBFilter type,
exclude and include
Original comment by andrewro...@googlemail.com
on 26 Jun 2009 at 10:32
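To illustrate what the two newly required attributes in the comment above mean for instance documents, here is a minimal Python sketch that reports which of them are missing from a SpectrumIdentificationItem. It assumes a namespace-free element for brevity (real files may declare a default namespace); the function name and the example id value are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes the comment above says are now use="required".
REQUIRED = ("experimentalMassToCharge", "chargeState")


def missing_required(sii: ET.Element) -> list[str]:
    """Return the names of required attributes that are absent from the element."""
    return [name for name in REQUIRED if sii.get(name) is None]


# Hypothetical item that lacks experimentalMassToCharge.
item = ET.fromstring('<SpectrumIdentificationItem id="SII_1" chargeState="2"/>')
print(missing_required(item))
```

A schema validator would reject such an item outright; a check like this is only useful for producing a friendlier message before validation.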
Updated schema:
Added documentation for neutral loss within Modification.
Changed data type for frame attribute to only allow -3 -2 -1 1 2 3 and for frames to
allow a list of these
Original comment by andrewro...@googlemail.com
on 9 Jul 2009 at 3:58
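The frame restriction in the comment above can be sketched in Python as follows. The allowed values come from the comment; treating the frames list as whitespace-separated is an assumption about how the list type is serialised, and the function name is hypothetical.

```python
# Allowed reading frames, as listed in the comment above (0 is not allowed).
ALLOWED_FRAMES = frozenset({-3, -2, -1, 1, 2, 3})


def parse_frames(text: str) -> list[int]:
    """Parse a whitespace-separated list of frames, rejecting disallowed values."""
    frames = [int(token) for token in text.split()]
    bad = [frame for frame in frames if frame not in ALLOWED_FRAMES]
    if bad:
        raise ValueError(f"invalid frame value(s): {bad}")
    return frames


print(parse_frames("-3 1 2"))
```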
Made a change to the schema to add a version attribute to element mzIdentML. This is
fixed with a regex, currently accepting 0.9.X. On release, this will change to 1.0.X
Original comment by andrewro...@googlemail.com
on 16 Jul 2009 at 3:52
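A version check of the kind described in the comment above might look like this in Python. The exact regex in the schema is not quoted in the thread; reading "X" as one or more digits is an assumption, and the function name is hypothetical.

```python
import re

# Assumed reading of "0.9.X": the literal prefix "0.9." followed by one or
# more digits. The schema's actual pattern is not quoted in the thread.
VERSION_PATTERN = re.compile(r"0\.9\.\d+")


def is_accepted_version(version: str) -> bool:
    """Return True if the whole version string matches the assumed pattern."""
    return VERSION_PATTERN.fullmatch(version) is not None


for version in ("0.9.3", "0.9", "1.0.0"):
    print(version, is_accepted_version(version))
```

On release, the same sketch would only need the prefix changed to match 1.0.X.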
Sorry, this is not good timing...
We currently have:
<SpectrumIdentificationResult ... >
<SpectrumIdentificationItem ... >
<PeptideEvidence isDecoy="true" DBSequence_Ref="XYZ">
In the existing examples of using decoy, where each <SpectrumIdentificationItem>
always has a <PeptideEvidence>, this works just fine.
The Mascot examples (which don't currently have a decoy search example) show the
possibility of saving all matches to spectra, enabling further statistical analysis
of the results. This is done by saving a <SpectrumIdentificationItem> without
<PeptideEvidence> for 'junk' matches. However, without the <PeptideEvidence> element,
there is no way to know if the match was to a decoy sequence. I think that
isDecoy should be an attribute of <SpectrumIdentificationItem> or possibly <Peptide>.
Without this change, I think we can't fulfil use case #4 without huge file
bloat. Too late to change???
Original comment by dcre...@gmail.com
on 17 Jul 2009 at 11:14
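The gap described in the comment above can be made concrete with a short Python sketch: decoy status is read from the PeptideEvidence children, so an item without any yields "unknown". The element and attribute names follow the snippet in that comment; the namespace-free elements, the choice to call a match a decoy only when all of its evidence is flagged, and the function name are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def decoy_status(sii: ET.Element) -> Optional[bool]:
    """Return True/False from PeptideEvidence isDecoy flags, or None if unknown."""
    evidence = sii.findall("PeptideEvidence")
    if not evidence:
        return None  # no PeptideEvidence: decoy status cannot be determined
    # Assumption: a match counts as decoy only if every evidence entry is a decoy.
    return all(pe.get("isDecoy") == "true" for pe in evidence)


with_evidence = ET.fromstring(
    '<SpectrumIdentificationItem>'
    '<PeptideEvidence isDecoy="true" DBSequence_Ref="XYZ"/>'
    '</SpectrumIdentificationItem>'
)
junk_match = ET.fromstring('<SpectrumIdentificationItem/>')

print(decoy_status(with_evidence))
print(decoy_status(junk_match))
```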
I think this relates to how Mascot views junk matches and protein hits rather than a
flaw in the schema. The PeptideEvidence elements only relate where the peptide
sequence came from in a peptide-spectrum match (it must have come from at least one
Protein sequence) - it says nothing about protein identity or otherwise. Mascot says
junk matches don't relate to a Protein - this is okay, you just don't output anything
for them in ProteinDetectionList. I think the PeptideEvidence link to DBSequence
should be provided for every single SII - and perhaps we should make the cardinality
1..many to enforce this?
To avoid too much file bloat, I would suggest not providing the <seq> attribute on
DBSequence for junk matches. Or am I missing something?
Original comment by andrewro...@googlemail.com
on 17 Jul 2009 at 11:25
Yes, I think you are right. I hadn't realised that it was possible to provide the
<DBSequence> without having a related <ProteinDetectionHypothesis>.
However, we can't make PeptideEvidence 1..many because of denovo.
I'll update the Mascot examples at some point, but agree that we should leave the
schema the same.
Phew!
Original comment by dcre...@gmail.com
on 17 Jul 2009 at 12:05
Original comment by eisena...@googlemail.com
on 20 Aug 2009 at 11:29
Original issue reported on code.google.com by
andrewro...@googlemail.com
on 7 May 2009 at 2:58