Number of peptides / proteins to be reported

GoogleCodeExporter commented 8 years ago

Writing documentation always brings a few problems to the surface...

Consider 2 separate use cases:
 - Import into a repository such as Pride
 - Import into an application such as Scaffold or Peptide Prophet

With the first, you probably only want to report and store proteins above a
certain threshold that, for example, gives an FDR described in the instance
document.
With the second, we need to give scores for all spectra. (See
http://code.google.com/p/psi-pi/wiki/NotesForDocumentation#How_many_should__be_saved
for details.)

One option is to insist that these two cases require two different instance
documents. 

Another option is to 'encourage' people to save a larger number of results
and add (say) a boolean attribute to <SpectrumIdentificationItem> and
<ProteinDetectionHypothesis> that indicates that the result is above the
specified thresholds. 
Without the boolean flag, if all results are saved (as in the current
Mascot examples), a repository such as Pride would need to understand the
relevant CV and only store results above a given score.
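
To make the second option concrete, here is a minimal sketch of how a
repository importer might use such a flag. It is only an illustration: the
attribute name "PassThreshold" (borrowed from the name used later in this
thread), the "score" attribute, the wrapper element and the ids are all
assumptions, not schema definitions.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance fragment. The attribute names "PassThreshold" and
# "score", the wrapper element and the ids are placeholders for illustration.
DOC = """
<SpectrumIdentificationResult>
  <SpectrumIdentificationItem id="SII_1" score="45.2" PassThreshold="true"/>
  <SpectrumIdentificationItem id="SII_2" score="12.9" PassThreshold="false"/>
  <SpectrumIdentificationItem id="SII_3" score="31.0" PassThreshold="true"/>
</SpectrumIdentificationResult>
"""


def items_passing_threshold(xml_text):
    """Return the ids of items whose boolean flag is set to true."""
    root = ET.fromstring(xml_text)
    return [
        item.get("id")
        for item in root.iter("SpectrumIdentificationItem")
        # A missing flag is treated as "did not pass" in this sketch.
        if item.get("PassThreshold", "false").lower() in ("true", "1")
    ]


passing_ids = items_passing_threshold(DOC)
print(passing_ids)
```

With the flag in place the importer never has to interpret a search-engine
specific score, which is the point of the proposal.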

Any thoughts / preferences?

Original issue reported on code.google.com by dcre...@gmail.com on 8 Dec 2008 at 12:31

GoogleCodeExporter commented 8 years ago

I think we can support this for SpectrumIdentifications with no schema change.
We can encourage data producers to produce a large number of spectrum
identifications. The identifications deemed "correct" are the ones referenced
from ProteinDetectionHypotheses. This then also supports the use case of
peptide identification scores being boosted once a protein detection has been
made. PRIDE can make a call on how many of the SpectrumIdentifications to load.
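
As a rough sketch of the "referenced means correct" idea, assuming a
simplified in-memory model rather than the real schema (the ids and the
dictionary layout below are hypothetical):

```python
# Hypothetical in-memory model: every saved spectrum identification id, plus
# each protein detection hypothesis with the identification ids it references.
spectrum_identifications = ["SII_1", "SII_2", "SII_3", "SII_4"]
protein_hypotheses = {
    "PDH_1": ["SII_1", "SII_3"],
    "PDH_2": ["SII_3"],
}


def referenced_identifications(all_ids, hypotheses):
    """Keep only identifications referenced by at least one hypothesis."""
    referenced = {ref for refs in hypotheses.values() for ref in refs}
    # Preserve the original order of the saved identifications.
    return [item_id for item_id in all_ids if item_id in referenced]


accepted = referenced_identifications(spectrum_identifications, protein_hypotheses)
print(accepted)
```

A consumer such as PRIDE could load only the referenced ids, while an
intermediate tool would still see the full list.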

This does not solve the FDR problem for proteins though. However, I think the
solution of creating two different instance documents would be okay. If sending
to PRIDE, just send the list of proteins found. If using analysisXML as an
intermediate format, send a long list of proteins (and peptides) with no set
criteria for what is "correct".

Original comment by andrewro...@googlemail.com on 8 Dec 2008 at 3:58

GoogleCodeExporter commented 8 years ago

Added PassThreshold attribute to SpectrumIdentificationItem and
ProteinDetectionHypothesis

Original comment by dcre...@gmail.com on 29 Apr 2009 at 6:19

Changed state: Fixed

GoogleCodeExporter commented 8 years ago

I've re-opened this issue rather than add the problem to the end of the CV issue
because it seems more relevant here. We (or maybe just me) may have lost the 
plot here...

Comment 2 says what we added to the schema. The structure for specifying which
items "PassThreshold" is described here:
http://code.google.com/p/psi-pi/issues/detail?id=49#c3

Currently, the only CV items allowed in the 'Threshold' are:
MS:1001448: pep:FDR threshold
MS:1001447: prot:FDR threshold
MS:1001494: no threshold

However, using an FDR is not the only way to do this. For example, if you've
only got a few spectra, then FDR is definitely not an option. I think we need
other terms to be allowed here. For example, you might want to specify that the
threshold is
"ZYXScore > 23.4"

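A minimal sketch of what a plain score cutoff would mean for the flag, assuming
a hypothetical score and a simplified stand-in class (neither comes from the
schema or the CV):

```python
from dataclasses import dataclass


@dataclass
class Identification:
    """Hypothetical stand-in for a SpectrumIdentificationItem."""
    item_id: str
    score: float
    pass_threshold: bool = False


def apply_score_threshold(items, cutoff):
    """Flag each item whose score is strictly greater than the cutoff."""
    for item in items:
        item.pass_threshold = item.score > cutoff
    return items


items = [
    Identification("SII_1", 30.1),
    Identification("SII_2", 23.4),  # equal to the cutoff, so not flagged
    Identification("SII_3", 8.7),
]
flagged = apply_score_threshold(items, cutoff=23.4)
print([(item.item_id, item.pass_threshold) for item in flagged])
```

No FDR estimate is involved here, so this also works for a handful of spectra.
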
The Mascot example contains, for instance:
<ProteinDetectionProtocol id="PDP_MascotParser...
  <AnalysisParams>
    <cvParam accession="MS:1001316" name="mascot:SigThreshold" cvRef="PSI-MS" value="0.05"/>

And I guess in this case I would like to specify MS:1001316 in
AnalysisProtocolCollection/SpectrumIdentificationProtocol/Threshold/
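
A sketch of what that could look like, built with Python's standard library.
The cvParam values are copied from the Mascot example above. Whether the CV
mapping would accept this term under Threshold is exactly the open question,
so treat the nesting as an assumption:

```python
import xml.etree.ElementTree as ET

# Build SpectrumIdentificationProtocol/Threshold/cvParam. The parent
# AnalysisProtocolCollection and all other protocol content are omitted.
protocol = ET.Element("SpectrumIdentificationProtocol")
threshold = ET.SubElement(protocol, "Threshold")
ET.SubElement(
    threshold,
    "cvParam",
    {
        "accession": "MS:1001316",
        "name": "mascot:SigThreshold",
        "cvRef": "PSI-MS",
        "value": "0.05",
    },
)

xml_text = ET.tostring(protocol, encoding="unicode")
print(xml_text)
```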

Original comment by dcre...@gmail.com on 23 Jun 2009 at 5:05

Changed state: Accepted

GoogleCodeExporter commented 8 years ago

Agreed, we need a few different terms here. I'm not sure whether they already
exist in the CV or are just missing from the mapping at present, e.g. p-value,
mascot:SigThreshold, some terms for Sequest, etc.

Original comment by andrewro...@googlemail.com on 24 Jun 2009 at 1:11

GoogleCodeExporter commented 8 years ago

For Phenyx we probably already have something:

a) representation of peptide scores:

id: MS:1001395
name: Phenyx:Pepzscore
and
id: MS:1001396
name: Phenyx:PepPvalue
both are
is_a: MS:1001143 ! search engine specific score for peptides
is_a: MS:1001153 ! search engine specific score

id: MS:1001384
name: Phenyx:MinPepzscore
and
id: MS:1001385
name: Phenyx:MaxPepPvalue
both is_a: MS:1001302 ! search engine specific input parameter

b) and we also have 2 binary values for a "valid" or "accepted" peptide status,
corresponding to automatic selection (unedited search result) and user-defined
selection (that has gone through manual selection), respectively:
id: MS:1001393
name: Phenyx:Auto
and 
id: MS:1001394
name: Phenyx:User
both defined 
is_a: MS:1001143 ! search engine specific score for peptides
is_a: MS:1001153 ! search engine specific score

As the thresholding we are talking about is a "post-processing" event
(something we apply to the "raw" result), which one would you consider? I'm
just wondering which one would fit best as a passThreshold criterion.

Original comment by pierreal...@gmail.com on 25 Jun 2009 at 7:34

GoogleCodeExporter commented 8 years ago

Original comment by eisena...@googlemail.com on 20 Aug 2009 at 11:28

Changed state: Fixed
Added labels: Milestone-Release1.0

vogelwk / psi-pi

Number of peptides / proteins to be reported #45