Email Thread copied from List:
From Eric:
There are potentially other thresholding mechanisms as well. One might
threshold by probability, and I think I saw some FNR terms pop into the CV. I
wonder if we might generalize them with the word “confidence”? Or maybe
that has a specific statistical meaning that does not encompass these other
concepts? Perhaps instead “statistical threshold”? That might logically
comprise “FDR threshold”, “probability threshold”, “p-value
threshold”, etc.
What do you think?
From: Jones, Andy [mailto:Andrew.Jones@liverpool.ac.uk]
Sent: Thursday, May 22, 2014 5:43 AM
To: 'Gerhard Mayer'
Cc: 'psidev-pi-dev@lists.sourceforge.net'; 'Harald.Barsnes@biomed.uib.no';
'psidev-ms-vocab@lists.sourceforge.net'
Subject: Re: [Psidev-ms-vocab] [Psidev-pi-dev] Thresholds in mzid 1.1
Hi Gerhard,
In general this idea seems good, but I think some improvements might be
possible. FDR estimation by target-decoy search is only one way of getting at
FDR – some methods estimate this directly from the data itself without
needing decoys. Having a general parent term of “FDR-based threshold” might
be good enough – any other opinions?
Best wishes
Andy
From: Gerhard Mayer [mailto:mayerg97@rub.de]
Sent: 22 May 2014 13:07
To: Jones, Andy
Cc: Harald.Barsnes@biomed.uib.no; psidev-pi-dev@lists.sourceforge.net;
psidev-ms-vocab@lists.sourceforge.net
Subject: Re: [Psidev-pi-dev] Thresholds in mzid 1.1
Hi Andy and Harald,
maybe it would make sense to introduce a new term "target-decoy threshold",
further split into PSM-, peptide-, and protein or protein group-level
target-decoy thresholds,
under "quality estimation method details", and make all FDR and FNR terms
children of these
(see attached figure).
Then we can specify in the SpectrumIdentificationProtocolThreshold_rule
the term MS:1002484 (peptide-level target-decoy threshold) instead of
MS:1001448 (pep:FDR threshold)
and in the ProteinDetectionProtocolThreshold_rule the term MS:1002485 (protein
or protein group-level target-decoy threshold)
instead of MS:1001447 (prot:FDR threshold).
Any comments?
Cheers,
Gerhard
On 22.05.2014 11:47, Jones, Andy wrote:
Hi all,
Harald is working on mzid 1.1 export from PeptideShaker and has come up against
a validation problem. The following terms are causing the validator to complain:
<AnalysisProtocolCollection xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <SpectrumIdentificationProtocol analysisSoftware_ref="ID_software" id="SearchProtocol_1">
    ...
    <Threshold>
      <cvParam cvRef="PSI-MS" accession="MS:1001364" name="distinct peptide-level global FDR" value="0.01"/>
    </Threshold>
  </SpectrumIdentificationProtocol>
  <ProteinDetectionProtocol analysisSoftware_ref="ID_software" id="PeptideShaker_1">
    <Threshold>
      <cvParam cvRef="PSI-MS" accession="MS:1002369" name="protein group-level global FDR" value="0.01"/>
    </Threshold>
  </ProteinDetectionProtocol>
</AnalysisProtocolCollection>
This happens because they fall foul of the rules below, i.e. they are not
child terms of any of those specified. This seems like a mistake rather than
something intended. Thresholding at the peptide level and protein group level will
be properly fixed in mzid 1.2, but I wonder if there is anything we can do to
solve this problem in mzid 1.1 – Gerhard, any suggestions?
Cheers
Andy
**************
Message 1:
Rule ID: SpectrumIdentificationProtocolThreshold_rule
Level: ERROR
Context(/spectrumIdentificationProtocol/threshold/cvParam/@accession )
--> The result found at: /spectrumIdentificationProtocol/threshold/cvParam/@accession for which the values is ''MS:1001364'' didn't match any of the 4 specified CV terms:
- The sole term MS:1001448 (pep:FDR threshold) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
- The sole term MS:1001494 (no threshold) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
- Any children term of MS:1001153 (search engine specific score). The term can be repeated. The matching value has to be the identifier of the term, not its name.
- Any children term of MS:1001302 (search engine specific input parameter). The term can be repeated. The matching value has to be the identifier of the term, not its name.
Message 2:
Rule ID: ProteinDetectionProtocolThreshold_rule
Level: ERROR
Context(/proteinDetectionProtocol/threshold/cvParam/@accession )
--> The result found at: /proteinDetectionProtocol/threshold/cvParam/@accession for which the values is ''MS:1002369'' didn't match any of the 4 specified CV terms:
- The sole term MS:1001447 (prot:FDR threshold) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
- The sole term MS:1001494 (no threshold) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
- Any children term of MS:1001153 (search engine specific score). The term can be repeated. The matching value has to be the identifier of the term, not its name.
- Any children term of MS:1001302 (search engine specific input parameter). The term can be repeated. The matching value has to be the identifier of the term, not its name.
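For anyone trying to reproduce the check outside the validator, here is a minimal sketch of the kind of test Message 1 describes: an accession passes if it is one of the listed terms (where the term itself is allowed) or a descendant of one. It is an illustration only, not the validator's actual code. The `is_a` edges and the `EX:` accessions are hypothetical placeholders; the real relationships have to be read from the PSI-MS CV file.

```python
from typing import Dict, List, Set, Tuple

# (accession, is the term itself allowed?) as listed in Message 1 above.
SIP_THRESHOLD_RULE: List[Tuple[str, bool]] = [
    ("MS:1001448", True),   # pep:FDR threshold, or any of its children
    ("MS:1001494", True),   # no threshold, or any of its children
    ("MS:1001153", False),  # only children of "search engine specific score"
    ("MS:1001302", False),  # only children of "search engine specific input parameter"
]

# HYPOTHETICAL is_a edges (child -> parents), for illustration only.
# The EX:... accessions are made-up placeholders, not real CV terms.
TOY_IS_A: Dict[str, List[str]] = {
    "MS:1001364": ["EX:0000001"],  # placeholder parent outside the rule
    "EX:0000002": ["MS:1001448"],  # placeholder child of pep:FDR threshold
}


def ancestors(accession: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """Collect every term reachable by following is_a edges upwards."""
    seen: Set[str] = set()
    stack = list(is_a.get(accession, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def matches_rule(accession: str,
                 rule: List[Tuple[str, bool]],
                 is_a: Dict[str, List[str]]) -> bool:
    """True if the accession is an allowed term or a descendant of one."""
    up = ancestors(accession, is_a)
    return any((allow_self and accession == term) or term in up
               for term, allow_self in rule)


if __name__ == "__main__":
    for acc in ("MS:1001364", "MS:1001448", "EX:0000002"):
        print(acc, matches_rule(acc, SIP_THRESHOLD_RULE, TOY_IS_A))
```

With this toy data MS:1001364 is rejected, mirroring Message 1, while MS:1001448 and the placeholder child are accepted. Adding a broader parent term to the rule, as proposed in this thread, would amount to adding one more entry to the rule list.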
Original issue reported on code.google.com by andrewro...@googlemail.com on 12 Jun 2014 at 2:41