vogelwk / psi-pi

Automatically exported from code.google.com/p/psi-pi

Bug in thresholds CV terms allowed in file #83

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Email Thread copied from List:

From Eric:

There are potentially other thresholding mechanisms as well. One might 
threshold by probability, and I think I saw some FNR terms pop into the CV. I 
wonder if we might generalize them with the word “confidence”? Or maybe 
that has a specific statistical meaning that does not encompass these other 
concepts? Perhaps instead “statistical threshold”? That might logically 
comprise “FDR threshold”, “probability threshold”, “p-value 
threshold”, etc.

What do you think?

From: Jones, Andy [mailto:Andrew.Jones@liverpool.ac.uk] 
Sent: Thursday, May 22, 2014 5:43 AM
To: 'Gerhard Mayer'
Cc: 'psidev-pi-dev@lists.sourceforge.net'; 'Harald.Barsnes@biomed.uib.no'; 
'psidev-ms-vocab@lists.sourceforge.net'
Subject: Re: [Psidev-ms-vocab] [Psidev-pi-dev] Thresholds in mzid 1.1

Hi Gerhard,

In general this idea seems good, but I think some improvements might be 
possible. FDR estimation by target-decoy search is only one way of getting at 
FDR – some methods estimate this directly from the data itself without 
needing decoys. Having a general parent term of “FDR-based threshold” might 
be good enough – any other opinions?
Best wishes
Andy

From: Gerhard Mayer [mailto:mayerg97@rub.de] 
Sent: 22 May 2014 13:07
To: Jones, Andy
Cc: Harald.Barsnes@biomed.uib.no; psidev-pi-dev@lists.sourceforge.net; 
psidev-ms-vocab@lists.sourceforge.net
Subject: Re: [Psidev-pi-dev] Thresholds in mzid 1.1

Hi Andy and Harald,

Maybe it would make sense to introduce a new term "target-decoy threshold",
further split into PSM-, peptide-, and protein or protein group-level 
target-decoy thresholds
under "quality estimation method details", and make all FDR and FNR terms 
children of these terms
(see attached figure).

Then we can specify in the SpectrumIdentificationProtocolThreshold_rule
the term MS:1002484 (peptide-level target-decoy threshold) instead of 
MS:1001448 (pep:FDR threshold)
and in the ProteinDetectionProtocolThreshold_rule the term MS:1002485 (protein 
or protein group-level target-decoy threshold)
instead of MS:1001447 (prot:FDR threshold).

Any comments?

Cheers,
Gerhard
On 22.05.2014 11:47, Jones, Andy wrote:
Hi all,

Harald is working on mzid 1.1 export from PeptideShaker and has come up against 
a validation problem. The following terms are causing the validator to complain:

<AnalysisProtocolCollection xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
    <SpectrumIdentificationProtocol analysisSoftware_ref="ID_software" id="SearchProtocol_1">
        ...
        <Threshold>
            <cvParam cvRef="PSI-MS" accession="MS:1001364" name="distinct peptide-level global FDR" value="0.01"/>
        </Threshold>
    </SpectrumIdentificationProtocol>
    <ProteinDetectionProtocol analysisSoftware_ref="ID_software" id="PeptideShaker_1">
        <Threshold>
            <cvParam cvRef="PSI-MS" accession="MS:1002369" name="protein group-level global FDR" value="0.01"/>
        </Threshold>
    </ProteinDetectionProtocol>
</AnalysisProtocolCollection>

This happens because they fall foul of the rules below, i.e. they are not 
child terms of any of those specified. This seems like a mistake rather than 
something intended. Thresholding at the peptide level and protein group level 
will be properly fixed in mzid 1.2, but I wonder if there is anything we can do 
to solve this problem in mzid 1.1 – Gerhard, any suggestions?
Cheers
Andy
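
For anyone wanting to check their own export before running the validator, here is a minimal Python sketch that lists the `Threshold` cvParams per protocol. It uses only the standard library. The fragment is the one quoted in the email, with the elided `...` part left out; the function name `threshold_terms` is made up for this illustration and is not part of any mzIdentML tool.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the fragment quoted in the email.
NS = {"mzid": "http://psidev.info/psi/pi/mzIdentML/1.1"}

# The fragment from the email; the elided "..." content is omitted here.
FRAGMENT = """\
<AnalysisProtocolCollection xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
    <SpectrumIdentificationProtocol analysisSoftware_ref="ID_software" id="SearchProtocol_1">
        <Threshold>
            <cvParam cvRef="PSI-MS" accession="MS:1001364" name="distinct peptide-level global FDR" value="0.01"/>
        </Threshold>
    </SpectrumIdentificationProtocol>
    <ProteinDetectionProtocol analysisSoftware_ref="ID_software" id="PeptideShaker_1">
        <Threshold>
            <cvParam cvRef="PSI-MS" accession="MS:1002369" name="protein group-level global FDR" value="0.01"/>
        </Threshold>
    </ProteinDetectionProtocol>
</AnalysisProtocolCollection>
"""


def threshold_terms(xml_text):
    """Return (protocol element, protocol id, accession, value) per Threshold cvParam."""
    root = ET.fromstring(xml_text)
    found = []
    for protocol in root:
        # Element tags look like "{namespace}LocalName"; keep only the local name.
        tag = protocol.tag.split("}", 1)[-1]
        for param in protocol.findall("mzid:Threshold/mzid:cvParam", NS):
            found.append(
                (tag, protocol.get("id"), param.get("accession"), param.get("value"))
            )
    return found


for row in threshold_terms(FRAGMENT):
    print(row)
```

The accessions it reports (MS:1001364 and MS:1002369) are the two values the validator rejects.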

**************

Message 1:
    Rule ID: SpectrumIdentificationProtocolThreshold_rule
    Level: ERROR
    Context(/spectrumIdentificationProtocol/threshold/cvParam/@accession )
    --> The result found at: /spectrumIdentificationProtocol/threshold/cvParam/@accession for which the values is  ''MS:1001364'' didn't match any of the 4 specified CV terms:
  - The sole term MS:1001448 (pep:FDR threshold) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
  - The sole term MS:1001494 (no threshold) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
  - Any children term of MS:1001153 (search engine specific score). The term can be repeated. The matching value has to be the identifier of the term, not its name.
  - Any children term of MS:1001302 (search engine specific input parameter). The term can be repeated. The matching value has to be the identifier of the term, not its name.

Message 2:
    Rule ID: ProteinDetectionProtocolThreshold_rule
    Level: ERROR
    Context(/proteinDetectionProtocol/threshold/cvParam/@accession )
    --> The result found at: /proteinDetectionProtocol/threshold/cvParam/@accession for which the values is  ''MS:1002369'' didn't match any of the 4 specified CV terms:
  - The sole term MS:1001447 (prot:FDR threshold) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
  - The sole term MS:1001494 (no threshold) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
  - Any children term of MS:1001153 (search engine specific score). The term can be repeated. The matching value has to be the identifier of the term, not its name.
  - Any children term of MS:1001302 (search engine specific input parameter). The term can be repeated. The matching value has to be the identifier of the term, not its name.
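
To make the matching logic in these messages concrete, here is a rough Python sketch of a "term or any of its children" check. It is not the validator's actual code. Assumptions: "children" is read as all descendants; the cardinality part ("a single instance", "can be repeated") is not modelled; and the parent links in `TOY_PARENTS` and `PROPOSED_PARENTS` are hypothetical placeholders, not the real PSI-MS hierarchy. Only the MS accessions themselves come from the thread.

```python
# Toy is_a maps (child -> list of parents).
# HYPOTHETICAL: "TOY:0000001" stands in for whatever the real parent is; the
# thread only tells us it is not under any of the terms allowed by the rule.
TOY_PARENTS = {
    "MS:1001364": ["TOY:0000001"],  # distinct peptide-level global FDR
}

# HYPOTHETICAL: hierarchy if the FDR term were placed under the proposed
# MS:1002484 (peptide-level target-decoy threshold).
PROPOSED_PARENTS = {
    "MS:1001364": ["MS:1002484"],
}

# Each entry: (root accession, whether the root term itself is allowed).
CURRENT_RULE = [
    ("MS:1001448", True),   # pep:FDR threshold or any of its children
    ("MS:1001494", True),   # no threshold or any of its children
    ("MS:1001153", False),  # any child of search engine specific score
    ("MS:1001302", False),  # any child of search engine specific input parameter
]

# Same rule with MS:1001448 swapped for MS:1002484, as suggested in the thread.
PROPOSED_RULE = [("MS:1002484", True)] + CURRENT_RULE[1:]


def ancestors(term, parents):
    """Collect every ancestor of `term` by walking the child -> parents map."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def matches_rule(accession, rule, parents):
    """True if `accession` is an allowed root or a descendant of any rule root."""
    up = ancestors(accession, parents)
    for root, allow_root in rule:
        if root in up or (allow_root and accession == root):
            return True
    return False


print(matches_rule("MS:1001364", CURRENT_RULE, TOY_PARENTS))        # False
print(matches_rule("MS:1001448", CURRENT_RULE, TOY_PARENTS))        # True
print(matches_rule("MS:1001364", PROPOSED_RULE, PROPOSED_PARENTS))  # True
```

Under these assumptions the first call reproduces the ERROR in Message 1, and the last call shows how changing both the rule root and the CV hierarchy would let the same accession pass.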

Original issue reported on code.google.com by andrewro...@googlemail.com on 12 Jun 2014 at 2:41