Closed GoogleCodeExporter closed 9 years ago
I think we can support this for SpectrumIdentifications with no schema change.
We can encourage data producers to produce a large number of spectrum
identifications. The identifications deemed "correct" are the ones referenced
from ProteinDetectionHypotheses. This then also supports the use case of
peptide identification scores being boosted once a protein detection has been
made. PRIDE can make a call on how many of the SpectrumIdentifications to load.
This does not solve the FDR problem for proteins though. However, I think the
solution of creating two different instance documents would be okay. If sending
to PRIDE, just send the list of proteins found. If using analysisXML as an
intermediate format, send a long list of proteins (and peptides) with no set
criteria for what is "correct".
Original comment by andrewro...@googlemail.com
on 8 Dec 2008 at 3:58
Added PassThreshold attribute to SpectrumIdentificationItem and ProteinDetectionHypothesis.
Original comment by dcre...@gmail.com
on 29 Apr 2009 at 6:19
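For readers following along, here is a minimal sketch of how a consumer such as PRIDE might pick out the items flagged as passing. The fragment is hypothetical and namespace-free, the ids are made up, and the attribute spelling `passThreshold` is an assumption (the comment above writes it as PassThreshold), so it is left as a parameter.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment; real instance documents carry
# more structure. The ids and the attribute spelling are assumptions.
SAMPLE = """
<SpectrumIdentificationResult id="SIR_1">
  <SpectrumIdentificationItem id="SII_1" passThreshold="true"/>
  <SpectrumIdentificationItem id="SII_2" passThreshold="false"/>
  <SpectrumIdentificationItem id="SII_3" passThreshold="true"/>
</SpectrumIdentificationResult>
"""


def items_passing(root, attr="passThreshold"):
    """Return the ids of SpectrumIdentificationItem elements flagged as passing."""
    return [
        item.get("id")
        for item in root.iter("SpectrumIdentificationItem")
        # A missing attribute is treated as "did not pass".
        if item.get(attr, "false").strip().lower() in ("true", "1")
    ]


root = ET.fromstring(SAMPLE)
passing_ids = items_passing(root)
print(passing_ids)
```

A loader could apply the same filter to ProteinDetectionHypothesis elements by changing the tag name.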
I've re-opened this issue rather than add the problem to the end of the CV issue
because it seems more relevant here. We (or maybe just me) may have lost the
plot here...
Comment 2 says what we added to the schema. The structure for specifying which
items "PassThreshold" is described here:
http://code.google.com/p/psi-pi/issues/detail?id=49#c3
Currently, the only CV items allowed in the 'Threshold' are:
MS:1001448: pep:FDR threshold
MS:1001447: prot:FDR threshold
MS:1001494: no threshold
However, using an FDR is not the only way to do this. For example, if you've
only got a few spectra, then FDR is definitely not an option. I think we need
other terms to be allowed here. For example, you might want to specify that the
threshold is "ZYXScore > 23.4".
The Mascot example contains, for instance:
<ProteinDetectionProtocol id="PDP_MascotParser...
  <AnalysisParams>
    <cvParam accession="MS:1001316" name="mascot:SigThreshold" cvRef="PSI-MS" value="0.05"/>
And I guess in this case I would like to specify MS:1001316 in
AnalysisProtocolCollection/SpectrumIdentificationProtocol/Threshold/
Original comment by dcre...@gmail.com
on 23 Jun 2009 at 5:05
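As a rough illustration of the proposal above, the sketch below builds a Threshold element holding the mascot:SigThreshold cvParam quoted in the comment. The accession, name, cvRef and value come from that comment. The helper name `build_threshold` is hypothetical, and whether the CV mapping would accept this term under Threshold is exactly the open question here.

```python
import xml.etree.ElementTree as ET


def build_threshold(accession, name, value, cv_ref="PSI-MS"):
    """Build a Threshold element with a single cvParam child (hypothetical helper)."""
    threshold = ET.Element("Threshold")
    ET.SubElement(
        threshold,
        "cvParam",
        {"accession": accession, "name": name, "cvRef": cv_ref, "value": value},
    )
    return threshold


# Place the Mascot significance threshold under the protocol, as proposed.
protocol = ET.Element("SpectrumIdentificationProtocol")
protocol.append(build_threshold("MS:1001316", "mascot:SigThreshold", "0.05"))

threshold_xml = ET.tostring(protocol, encoding="unicode")
print(threshold_xml)
```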
Agreed, we need a few different terms here. I am not sure whether they exist in
the CV or are just missing from the mapping at present, e.g. p-value,
mascot:SigThreshold, some terms for Sequest etc.
Original comment by andrewro...@googlemail.com
on 24 Jun 2009 at 1:11
For Phenyx we probably already have something:
1) representation of peptide scores:
id: MS:1001395
name: Phenyx:Pepzscore
and
id: MS:1001396
name: Phenyx:PepPvalue
both are
is_a: MS:1001143 ! search engine specific score for peptides
is_a: MS:1001153 ! search engine specific score
id: MS:1001384
name: Phenyx:MinPepzscore
and
id: MS:1001385
name: Phenyx:MaxPepPvalue
both is_a: MS:1001302 ! search engine specific input parameter
2) and we also have 2 binary values for a "valid" or "accepted" peptide status,
corresponding to automatic selection (unedited search result) and user-defined
selection (that has gone through manual selection), respectively:
id: MS:1001393
name: Phenyx:Auto
and
id: MS:1001394
name: Phenyx:User
both defined
is_a: MS:1001143 ! search engine specific score for peptides
is_a: MS:1001153 ! search engine specific score
As the thresholding we are talking about is a "post-processing" event
(something we apply to the "raw" result), which one would you consider? I'm
just wondering which one would fit best as a passThreshold criterion.
Original comment by pierreal...@gmail.com
on 25 Jun 2009 at 7:34
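To make the "post-processing" reading concrete, here is a small sketch that derives a pass/fail flag from raw peptide scores and two cut-offs. It assumes, from the term names only, that Phenyx:MinPepzscore is a lower bound on Phenyx:Pepzscore and Phenyx:MaxPepPvalue an upper bound on Phenyx:PepPvalue. The class, the function and all numeric values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PeptideMatch:
    pep_zscore: float  # stands in for Phenyx:Pepzscore (MS:1001395)
    pep_pvalue: float  # stands in for Phenyx:PepPvalue (MS:1001396)


def passes_threshold(match, min_pep_zscore, max_pep_pvalue):
    """Apply both cut-offs to a raw match (assumed semantics, see the text above)."""
    return (
        match.pep_zscore >= min_pep_zscore
        and match.pep_pvalue <= max_pep_pvalue
    )


# Illustrative values only, not taken from any real search result.
matches = [
    PeptideMatch(7.2, 1e-6),
    PeptideMatch(3.1, 1e-6),
    PeptideMatch(8.0, 0.01),
]
flags = [
    passes_threshold(m, min_pep_zscore=5.0, max_pep_pvalue=1e-4)
    for m in matches
]
print(flags)
```

Under this reading the input parameters define the criterion, while Phenyx:Auto and Phenyx:User would record its outcome per peptide.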
Original comment by eisena...@googlemail.com
on 20 Aug 2009 at 11:28
Original issue reported on code.google.com by
dcre...@gmail.com
on 8 Dec 2008 at 12:31