vogelwk / psi-pi

Automatically exported from code.google.com/p/psi-pi

Issues / schema updates prior to version 1 release #49

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Minor schema update: 

Removed mzIdentML-->DatabaseReference

This is no longer a valid element now that Database has been removed. 

(This gave a file the ability to reference a corresponding database entry;
I don't think this is required.)

Original issue reported on code.google.com by andrewro...@googlemail.com on 7 May 2009 at 2:58

GoogleCodeExporter commented 8 years ago
Schema update:

Removed SequenceMass, sequenceLength and MassTable_ref from Peptide

Added MassTable_ref to SpectrumIdentificationItem

Added Keys and KeyRefs for this

Some minor changes to FuGElight that shouldn't affect anything in mzIdentML

Original comment by andrewro...@googlemail.com on 14 May 2009 at 10:05

GoogleCodeExporter commented 8 years ago
There are three elements I would like to remove from the schema:
AnalysisResultList, AnalysisResult and AnalysisResultItem. These were
placeholders so that any quant extensions would extend from the same points in
the hierarchy as SpectrumIdentificationList, SpectrumIdentificationResult and
SpectrumIdentificationItem (same for proteins). Now that quant is going to be
developed independently, I can see no reason to keep them.

Unless anyone can see some benefits, I'll remove these from the hierarchy.

Original comment by andrewro...@googlemail.com on 18 May 2009 at 9:58

GoogleCodeExporter commented 8 years ago
I think the passThreshold part of the schema is still a bit ambiguous (a comment
about FDR thresholds was also made by a reviewer).

I suggest we add an optional schema element to SpectrumIdentificationProtocol
and ProteinDetectionProtocol specifically for this purpose:

<SpectrumIdentificationProtocol id="SIP" AnalysisSoftware_ref="AS_mascot_server">
   <Threshold>
       <pf:cvParam accession="MS:1001316" name="mascot:SigThreshold" cvRef="PSI-MS" value="0.05"/>

OR
        <pf:cvParam accession="MS:TODO" name="global FDR" cvRef="PSI-MS" value="0.05"/>

and the same for protein detection.

I realise there are cases where this cannot be given, or where it might be
difficult to report which peptides passed the threshold. However, several
example files have passThreshold set but do not specify what the threshold is.
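
A minimal sketch of how a reader of the file might pull the proposed Threshold
back out, using Python's standard xml.etree.ElementTree. The namespace URI bound
to the pf prefix and the closing tags are placeholders added only to make the
fragment well-formed; neither is taken from the schema.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI for the "pf" prefix (an assumption, not the real one).
PF_NS = "http://example.org/placeholder/pf"

DOC = f"""
<SpectrumIdentificationProtocol xmlns:pf="{PF_NS}" id="SIP"
                                AnalysisSoftware_ref="AS_mascot_server">
  <Threshold>
    <pf:cvParam accession="MS:1001316" name="mascot:SigThreshold"
                cvRef="PSI-MS" value="0.05"/>
  </Threshold>
</SpectrumIdentificationProtocol>
"""


def read_thresholds(protocol):
    """Return (accession, name, value) for each cvParam under Threshold."""
    params = protocol.findall(f"Threshold/{{{PF_NS}}}cvParam")
    return [(p.get("accession"), p.get("name"), float(p.get("value")))
            for p in params]


thresholds = read_thresholds(ET.fromstring(DOC))
print(thresholds)
```

With something like this in place, a file that sets passThreshold can always be
traced back to the threshold that was applied.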

Original comment by andrewro...@googlemail.com on 18 May 2009 at 10:29

GoogleCodeExporter commented 8 years ago
The Peptide element still has an optional attribute calculatedPI but we have no
examples of its use.

I think this should go - Peptides have no connection with a particular software
package so these values are fairly meaningless.

Does anyone remember a use case for this being present?

Original comment by andrewro...@googlemail.com on 18 May 2009 at 10:38

GoogleCodeExporter commented 8 years ago
Hold-over from pepXML. It is also not meaningless if you use the information to
weight peptide-spectrum matches against retention time. No comment on removal,
but I would be inclined to keep it, since it is part of pepXML.

Original comment by delag...@gmail.com on 18 May 2009 at 11:09

GoogleCodeExporter commented 8 years ago
Some minor updates made that shouldn't affect any instance docs; the rest of the
issues above are still to be discussed.

Several attributes made mandatory:
- SpectrumIdentificationList_ref on ProteinDetection->InputSpectrumIdentification
  and on SpectrumIdentification (i.e. SpectrumIdentification must produce results
  and ProteinDetection must reference an input)
- ProteinDetectionList_ref on ProteinDetection (i.e. a protein detection process
  must produce a result)
- Peptide->peptideSequence (previously agreed on a call but not done)

AnalysisSoftware_ref on SpectrumIdentificationProtocol and ProteinDetectionProtocol

AnalysisProtocol and AnalysisProtocolApplication removed: these were unreachable
and not used at all in the schema, and will not be used in any future extensions.

Original comment by andrewro...@googlemail.com on 18 May 2009 at 11:11

GoogleCodeExporter commented 8 years ago
In response to Comment 5.

If we want to keep it, I would prefer that it be moved to
SpectrumIdentificationItem, especially since we got rid of sequenceMass on
Peptide (and just kept calculatedMassToCharge on SII). The PI calculation is
totally software-dependent.

Original comment by andrewro...@googlemail.com on 18 May 2009 at 11:15

GoogleCodeExporter commented 8 years ago
Schema updates made:

Added Threshold to SIP and PDP

Altered SubstitutionModification - removed residues. Added documentation about
specifying original peptide sequence for SubMods

Changed data type for peptideSequence - should now check for an upper-case
sequence of characters
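
As a rough illustration of the peptideSequence check, here is a sketch in
Python. The pattern (one or more upper-case ASCII letters) is an assumption; the
exact restriction used in the schema is not quoted in this thread.

```python
import re

# Assumed pattern: one or more upper-case ASCII letters.
# The actual restriction in the schema may differ.
PEPTIDE_SEQUENCE = re.compile(r"[A-Z]+")


def is_valid_peptide_sequence(seq: str) -> bool:
    """True if the whole string consists of upper-case letters only."""
    return PEPTIDE_SEQUENCE.fullmatch(seq) is not None
```

Under this assumption "PEPTIDER" is accepted, while "peptider", an empty string
or a sequence containing spaces is rejected.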

Original comment by andrewro...@googlemail.com on 20 May 2009 at 4:39

GoogleCodeExporter commented 8 years ago
Schema update:

Added SoftwareName --> Param to AnalysisSoftware

We need to add a mapping for these CV terms to the mapping file

Original comment by andrewro...@googlemail.com on 21 May 2009 at 8:10

GoogleCodeExporter commented 8 years ago
In response to Comment 3 and 8 (Threshold element):

I agree that it makes things clearer, but it seems to be mandatory at the
moment; it should be optional (minOccurs="0") as Andy suggested in comment 3.

Original comment by eisena...@googlemail.com on 27 May 2009 at 7:45

GoogleCodeExporter commented 8 years ago
In the MPC example I set all passThreshold to true without having specified a
threshold; maybe this was a misunderstanding arising from the following schema
comment about passThreshold: "If no such threshold has been set, value of true
should be given for all results." (Possibly a typo that should read "value of
false"?)

Original comment by eisena...@googlemail.com on 27 May 2009 at 7:52

GoogleCodeExporter commented 8 years ago
In response to comment 11

I think this is correct: if no threshold has been specified, all should be set
to true, i.e. everything is deemed to have passed the threshold.

Let's discuss on the call the cardinality of <Threshold> and passThreshold.
There is an argument for making both mandatory, but having a CV term for "no
threshold" - this might annoy some implementers though...

Original comment by andrewro...@googlemail.com on 27 May 2009 at 8:10

GoogleCodeExporter commented 8 years ago
Comments 10, 11, 12: on the telecon of 28th May 2009 we agreed to leave
<Threshold> as mandatory, to have a "no threshold" CV term, and to set
passThreshold to true if all reported peptides/proteins are accepted (because
they've passed the "no threshold").
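
The agreed rule can be sketched in Python as follows. Here None stands in for
the "no threshold" CV term, and the comparison direction (lower score is better,
as for an expectation value) is an assumption made only for the example.

```python
from typing import Optional


def pass_threshold(score: float, threshold: Optional[float]) -> bool:
    """Decide the passThreshold value for one result.

    threshold=None stands in for the "no threshold" CV term, in which case
    every reported result passes. Otherwise a lower-is-better score is
    assumed (hypothetical; real scores may run the other way).
    """
    if threshold is None:
        return True
    return score <= threshold
```

So with "no threshold" every reported peptide or protein gets
passThreshold="true", which matches the existing schema comment.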

Original comment by eisena...@googlemail.com on 28 May 2009 at 3:29

GoogleCodeExporter commented 8 years ago
Comment from Patrick @ Matrix:

Why is the experimentalMassToCharge attribute of SpectrumIdentificationType
optional? If it isn't set you could potentially have real trouble matching back
to the source spectrum (for example in a PMF search there could be no reference
at all to the entered mass, and if there isn't a handy reference to e.g. the dta
file name you could have the same problem with MS/MS datasets). The example
XTandem file doesn't have the experimentalMassToCharge set, although all the
others do.

Thanks,
Patrick

We should review the other cardinalities of SpectrumIdentificationItem as well

Original comment by andrewro...@googlemail.com on 25 Jun 2009 at 3:02

GoogleCodeExporter commented 8 years ago
Schema update:

- Added in Sample and Sample_ref
- Added corresponding keys and keyrefs
- experimentalMassToCharge use="required"
- chargeState use="required"
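
For anyone checking existing instance documents against the two attributes that
are now required, a small Python sketch follows. The element ids and attribute
values are invented for the example, and a validating parser run against the
schema would of course catch the same thing.

```python
import xml.etree.ElementTree as ET

# The two attributes that are now use="required" on SpectrumIdentificationItem.
REQUIRED_ATTRIBUTES = ("experimentalMassToCharge", "chargeState")


def missing_required(item):
    """Return the names of required attributes absent from the element."""
    return [name for name in REQUIRED_ATTRIBUTES if item.get(name) is None]


# Hypothetical items: ids and values are made up for illustration.
complete = ET.fromstring(
    '<SpectrumIdentificationItem id="SII_1" '
    'experimentalMassToCharge="512.27" chargeState="2"/>'
)
incomplete = ET.fromstring(
    '<SpectrumIdentificationItem id="SII_2" chargeState="2"/>'
)

print(missing_required(complete))
print(missing_required(incomplete))
```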

Some updates to the mapping file for taxonomy for Sample and DBFilter type,
exclude and include

Original comment by andrewro...@googlemail.com on 26 Jun 2009 at 10:32

GoogleCodeExporter commented 8 years ago
Updated schema:

Added documentation for neutral loss within Modification.

Changed data type for frame attribute to only allow -3, -2, -1, 1, 2, 3 and for
frames to allow a list of these
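
A sketch of the equivalent check in Python, assuming the frames list is
whitespace-separated (the usual lexical form of a list type in XML Schema). The
function name is made up for the example.

```python
from typing import List

# The only reading frames the frame attribute now allows (0 is excluded).
ALLOWED_FRAMES = {-3, -2, -1, 1, 2, 3}


def parse_frames(text: str) -> List[int]:
    """Parse a whitespace-separated frames value, rejecting invalid frames."""
    frames = [int(token) for token in text.split()]
    invalid = [f for f in frames if f not in ALLOWED_FRAMES]
    if invalid:
        raise ValueError(f"invalid reading frame(s): {invalid}")
    return frames
```

For example, parse_frames("-3 -2 1") gives [-3, -2, 1], while "0" or "4" raises
a ValueError.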

Original comment by andrewro...@googlemail.com on 9 Jul 2009 at 3:58

GoogleCodeExporter commented 8 years ago
Made a change to the schema to add a version attribute to element mzIdentML.
This is fixed with a regex, currently accepting 0.9.X. On release, this will
change to 1.0.X.
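
To illustrate what the version restriction accepts, here is a Python sketch. It
assumes X means one or more digits; the exact regex in the schema is not quoted
here, so treat both patterns as guesses.

```python
import re

# Assumed reading of "0.9.X" and "1.0.X": X is one or more digits.
PRE_RELEASE = re.compile(r"0\.9\.\d+")
RELEASE = re.compile(r"1\.0\.\d+")


def version_accepted(version: str, pattern=PRE_RELEASE) -> bool:
    """True if the whole version string matches the given pattern."""
    return pattern.fullmatch(version) is not None
```

Under these assumptions "0.9.3" is accepted today and "1.0.0" is not; after
release the RELEASE pattern would be used and the situation reverses.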

Original comment by andrewro...@googlemail.com on 16 Jul 2009 at 3:52

GoogleCodeExporter commented 8 years ago
Sorry, this is not good timing...
We currently have:

<SpectrumIdentificationResult ... >
  <SpectrumIdentificationItem  ... >
    <PeptideEvidence isDecoy="true" DBSequence_Ref="XYZ">

In the existing examples of using decoy, where each <SpectrumIdentificationItem>
always has a <PeptideEvidence> this works just fine.

The Mascot examples (which don't currently have a decoy search example) show the
possibility of saving all matches to spectra, enabling further statistical
analysis of the results. This is done by saving a <SpectrumIdentificationItem>
without <PeptideEvidence> for 'junk' matches. However, without the
<PeptideEvidence> element, there is no way to know if the match was to a decoy
sequence. I think that isDecoy should be an attribute of
<SpectrumIdentificationItem> or possibly <Peptide>.
Without this change, I think this means we can't fulfil use case #4 without huge
file bloat. Too late to change???
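
To make the problem concrete, here is a Python sketch of how a consumer would
have to work out decoy status today. The ids are invented, the fragment is
closed off only so that it parses, and treating a match as decoy only when all
of its PeptideEvidence elements are decoy is one possible choice, not something
the schema prescribes.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical result: SII_1 has PeptideEvidence, SII_2 is a 'junk' match without.
RESULT = """
<SpectrumIdentificationResult id="SIR_1">
  <SpectrumIdentificationItem id="SII_1">
    <PeptideEvidence isDecoy="true" DBSequence_Ref="XYZ"/>
  </SpectrumIdentificationItem>
  <SpectrumIdentificationItem id="SII_2"/>
</SpectrumIdentificationResult>
"""


def decoy_status(item) -> Optional[bool]:
    """True/False from PeptideEvidence, or None when there is none to ask."""
    evidence = item.findall("PeptideEvidence")
    if not evidence:
        return None  # decoy status cannot be determined
    # Assumed policy: decoy only if every PeptideEvidence is flagged as decoy.
    return all(e.get("isDecoy") in ("true", "1") for e in evidence)


root = ET.fromstring(RESULT)
statuses = {item.get("id"): decoy_status(item)
            for item in root.iter("SpectrumIdentificationItem")}
print(statuses)
```

The None for SII_2 is exactly the case I am worried about.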

Original comment by dcre...@gmail.com on 17 Jul 2009 at 11:14

GoogleCodeExporter commented 8 years ago
I think this relates to how Mascot views junk matches and protein hits rather
than a flaw in the schema. The PeptideEvidence elements only indicate where the
peptide sequence came from in a peptide-spectrum match (it must have come from
at least one Protein sequence) - they say nothing about protein identity or
otherwise. Mascot says junk matches don't relate to a Protein - this is okay,
you just don't output anything for them in ProteinDetectionList. I think the
PeptideEvidence link to DBSequence should be provided for every single SII - and
perhaps we should make the cardinality 1..many to enforce this?

To avoid too much file bloat, I would suggest not providing the <seq> attribute
on DBSequence for junk matches. Or am I missing something?

Original comment by andrewro...@googlemail.com on 17 Jul 2009 at 11:25

GoogleCodeExporter commented 8 years ago
Yes, I think you are right. I hadn't realised that it was possible to provide
the <DBSequence> without having a related <ProteinDetectionHypothesis>.

However, we can't make PeptideEvidence 1..many because of de novo.
I'll update the Mascot examples at some point, but agree that we should leave
the schema the same.

Phew!

Original comment by dcre...@gmail.com on 17 Jul 2009 at 12:05

GoogleCodeExporter commented 8 years ago

Original comment by eisena...@googlemail.com on 20 Aug 2009 at 11:29