mwalzer / psi-pi

Automatically exported from code.google.com/p/psi-pi

Issues with all instance docs #44

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Drop any issues with instance docs in this issue

Original issue reported on code.google.com by andrewro...@googlemail.com on 16 Oct 2008 at 3:57

GoogleCodeExporter commented 9 years ago
I don't think we have examples of immonium or other internal ions using the
fragmentation ion structure, where the indexes are pairs representing the start and
end points of ions rather than integer indexes.
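To make the gap concrete, here is a minimal Python sketch of what a pair index could mean compared with an integer index. The int-or-(start, end) encoding and the `covered_residues` helper are hypothetical, not part of the schema; positions are assumed to be 1-based and inclusive, as with Modification/@location.

```python
from typing import Tuple, Union

# Hypothetical encoding, for illustration only: an int for a terminal-series
# ion (b, y, ...) or a (start, end) pair for an internal/immonium ion.
# Positions are 1-based and inclusive.
IonIndex = Union[int, Tuple[int, int]]


def covered_residues(peptide: str, index: IonIndex, terminus: str = "N") -> str:
    """Return the residues spanned by a fragment ion."""
    if isinstance(index, tuple):
        start, end = index
        if not 1 <= start <= end <= len(peptide):
            raise ValueError(f"internal ion range {index} outside peptide")
        return peptide[start - 1:end]
    if not 1 <= index <= len(peptide):
        raise ValueError(f"ion index {index} outside peptide")
    if terminus == "N":  # e.g. b ions: the first `index` residues
        return peptide[:index]
    if terminus == "C":  # e.g. y ions: the last `index` residues
        return peptide[len(peptide) - index:]
    raise ValueError(f"unknown terminus {terminus!r}")


print(covered_residues("ALNVMESK", 3, "N"))   # b3
print(covered_residues("ALNVMESK", 3, "C"))   # y3
print(covered_residues("ALNVMESK", (3, 5)))   # internal ion
print(covered_residues("ALNVMESK", (5, 5)))   # immonium ion of residue 5
```

The point of the pair form is that an internal ion cannot be described by a single count from either terminus.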

Original comment by andrewro...@googlemail.com on 20 Nov 2008 at 12:21

GoogleCodeExporter commented 9 years ago
(I've put this response to:
http://code.google.com/p/psi-pi/issues/detail?id=42#c44
into issue 44 because it's more of an instance doc issue.)

Looking at the instance doc:
http://code.google.com/p/psi-pi/source/browse/trunk/examples/MPC_example.axml
It's not really clear what
<SourceFile id="SF1" location="proteinscape://....

is pointing to, because there is no <pf:fileFormat>. I presume that this is some sort
of relational database? Maybe you do want to add a description for the format under
  PI:00040 ! source file format

I guess that we used the term SourceFile because in most cases the "thing" that is
used to create the analysisXML instance document will be another file. Maybe
SourceData would be a more generic name.

<SpectraData id="LCMALDI_spectra"
location="proteinscape://www.medizinisches-proteom-center.de/PSServer/Project/Sample/Separation_1D_LC/Fraction_X"/>

What does this point to? From the example under svn, it looks as though this is an
mgf file (even though that may be stored in an RDB).

Original comment by dcre...@gmail.com on 28 Nov 2008 at 3:01

GoogleCodeExporter commented 9 years ago
Move rank from being a CV item to an attribute

Remove:
id: PI:00327
name: peptide rank
id: PI:00216
name: sequest:PeptideRank
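A minimal Python sketch of the proposed migration for an existing instance doc, assuming the rank currently sits in an unprefixed <cvParam> child of <SpectrumIdentificationItem> and that the new attribute is simply called rank. The helper name and the sample item (including its cvRef="PSI-PI") are hypothetical.

```python
import xml.etree.ElementTree as ET

# Rank terms the comment above proposes to remove from the CV.
RANK_ACCESSIONS = {"PI:00327", "PI:00216"}


def rank_to_attribute(item: ET.Element) -> ET.Element:
    """Move a rank cvParam's value into a rank="..." attribute (in place)."""
    for param in list(item):  # iterate over a copy: children are removed below
        if param.tag == "cvParam" and param.get("accession") in RANK_ACCESSIONS:
            value = param.get("value")
            if value is None:
                raise ValueError("rank cvParam has no value")
            item.set("rank", value)
            item.remove(param)
    return item


# Hypothetical item; the cvRef value is an assumption.
sii = ET.fromstring(
    '<SpectrumIdentificationItem id="SII_1">'
    '<cvParam accession="PI:00327" name="peptide rank" cvRef="PSI-PI" value="1"/>'
    "</SpectrumIdentificationItem>"
)
print(ET.tostring(rank_to_attribute(sii), encoding="unicode"))
```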

Original comment by dcre...@gmail.com on 1 Dec 2008 at 5:57

GoogleCodeExporter commented 9 years ago
Need to show how chemical and post-translational modifications are handled with the
15N example. (DC)

Original comment by dcre...@gmail.com on 5 May 2009 at 11:01

GoogleCodeExporter commented 9 years ago
Report the type of the decoy: reverse, shuffled, concatenated …
Add this to the instance examples.

Original comment by dcre...@gmail.com on 5 May 2009 at 11:05

GoogleCodeExporter commented 9 years ago
No examples of how mzIdentML is tied to mzML. DC to provide one.

Original comment by dcre...@gmail.com on 5 May 2009 at 11:08

GoogleCodeExporter commented 9 years ago
Add an example of how <pf:components> is used.

Original comment by dcre...@gmail.com on 5 May 2009 at 11:10

GoogleCodeExporter commented 9 years ago
Add an example of element <Modification>, substitution modification.

Original comment by dcre...@gmail.com on 5 May 2009 at 11:11

GoogleCodeExporter commented 9 years ago
Element <DatabaseFilters>: include an example of an exclude value.

Original comment by dcre...@gmail.com on 5 May 2009 at 11:12

GoogleCodeExporter commented 9 years ago
Regarding Comment #9:
The Mascot_N15_example.mzid search was against all "Arabidopsis" excluding
"Arabidopsis neglecta":

<DatabaseFilters>
  <Filter>
    <FilterType>
      <pf:cvParam accession="MS:1001020" name="DB filter taxonomy" cvRef="PSI-MS" />
    </FilterType>
    <Include>
      <pf:cvParam accession="NCBI:3701" name="Arabidopsis" cvRef="NCBI-TAXONOMY" />
    </Include>
    <Exclude>
      <pf:cvParam accession="NCBI:45251" name="Arabidopsis neglecta" cvRef="NCBI-TAXONOMY" />
    </Exclude>
  </Filter>
</DatabaseFilters>
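One way to read this filter, as a minimal Python sketch: a sequence is kept if its taxonomic lineage includes an Include taxon and no Exclude taxon. That exclude-overrides-include rule, the example lineage sets, and the placeholder namespace URI declared for the pf: prefix are assumptions for illustration; the sketch uses only the standard-library ElementTree.

```python
import xml.etree.ElementTree as ET

# The filter above, with a placeholder namespace declared for the "pf:" prefix
# (the real namespace URI is not given in this thread).
PF = "urn:example:pf"
FILTERS = f"""<DatabaseFilters xmlns:pf="{PF}">
  <Filter>
    <FilterType>
      <pf:cvParam accession="MS:1001020" name="DB filter taxonomy" cvRef="PSI-MS" />
    </FilterType>
    <Include>
      <pf:cvParam accession="NCBI:3701" name="Arabidopsis" cvRef="NCBI-TAXONOMY" />
    </Include>
    <Exclude>
      <pf:cvParam accession="NCBI:45251" name="Arabidopsis neglecta" cvRef="NCBI-TAXONOMY" />
    </Exclude>
  </Filter>
</DatabaseFilters>"""


def taxonomy_filter(xml_text: str) -> tuple[set[int], set[int]]:
    """Collect included and excluded NCBI taxon IDs from <DatabaseFilters>."""
    root = ET.fromstring(xml_text)
    included: set[int] = set()
    excluded: set[int] = set()
    for block, target in (("Include", included), ("Exclude", excluded)):
        for param in root.iterfind(f"Filter/{block}/{{{PF}}}cvParam"):
            target.add(int(param.get("accession").split(":", 1)[1]))
    return included, excluded


def passes(lineage: set[int], included: set[int], excluded: set[int]) -> bool:
    """Assumed rule: keep if the lineage hits an included taxon and no excluded one."""
    return bool(lineage & included) and not (lineage & excluded)


inc, exc = taxonomy_filter(FILTERS)
# Hypothetical lineages: one under 3701 only, one that also contains 45251.
print(passes({3701, 3702}, inc, exc))
print(passes({3701, 45251}, inc, exc))
```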

Original comment by dcre...@gmail.com on 20 May 2009 at 11:02

GoogleCodeExporter commented 9 years ago
Regarding comment #6.
An example is now provided here: 
http://code.google.com/p/psi-pi/source/browse/trunk/examples/Mascot_mzml_example.mzid
This is a search of small.pwiz.1.1.mzML which is referenced from here:
http://psidev.info/index.php?q=node/257

The example shows how the results can be tracked back to the raw data using the
nativeID.

Original comment by dcre...@gmail.com on 20 May 2009 at 1:34

GoogleCodeExporter commented 9 years ago
Regarding comment #8: an example is now included in the Mascot N15 example.
It doesn't use the residues attribute.

Original comment by dcre...@gmail.com on 20 May 2009 at 3:00

GoogleCodeExporter commented 9 years ago
Re-opening this issue:

We don't currently have an example for MS:1001089 and MS:1001090 and they
are not well defined. 

[Term]
id: MS:1001089
name: protein taxonomy
def: "The taxonomy of the resultant protein from the search." [PSI:PI]
xref: value-type:xsd\:string "The allowed value-type for this CV term."
is_a: MS:1001085 ! protein result details

[Term]
id: MS:1001090
name: taxonomy nomenclature
def: "The system used to indicate taxonomy. There should be an enumerated list of options: latin name, NCBI TaxID, common name, Swiss-Prot species ID (ex. RABIT from the full protein ID ALBU_RABIT)." [PSI:PI]
is_a: MS:1001089 ! protein taxonomy

This seems a little unwieldy to me. How about just having a different CV term for
each type of ID? For example:

<DBSequence id="x" length="449" SearchDatabase_ref="y" accession="z">
  <seq>MGKEKFHINIVVIGHVDSGKSTTTGHLIY...</seq>
  <pf:cvParam accession="MS:1001088" name="protein description" cvRef="PSI-MS" value="Elongation factor..." />
  <pf:cvParam accession="MS:xxxxxx1" name="taxonomy: NCBI TaxID" cvRef="PSI-MS" value="9606" />
  <pf:cvParam accession="MS:xxxxxx2" name="taxonomy: common name" cvRef="PSI-MS" value="human" />
  <pf:cvParam accession="MS:xxxxxx3" name="taxonomy: scientific name" cvRef="PSI-MS" value="Homo sapiens" />
  <pf:cvParam accession="MS:xxxxxx4" name="taxonomy: Swiss-Prot ID" cvRef="PSI-MS" value="HUMAN" />
</DBSequence>
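A minimal Python sketch of why separate terms are easier to consume: each value is self-describing, so a reader needs no second "nomenclature" term to interpret it. The placeholder namespace URI for pf: and the helper name are assumptions; the MS:xxxxxx accessions are the proposal's unassigned placeholders.

```python
import xml.etree.ElementTree as ET

# The proposed <DBSequence> above. "urn:example:pf" is a placeholder namespace for
# the "pf:" prefix; MS:xxxxxx1..4 are the proposal's not-yet-assigned accessions.
PF = "urn:example:pf"
DBSEQ = f"""<DBSequence xmlns:pf="{PF}" id="x" length="449" SearchDatabase_ref="y" accession="z">
  <seq>MGKEKFHINIVVIGHVDSGKSTTTGHLIY...</seq>
  <pf:cvParam accession="MS:1001088" name="protein description" cvRef="PSI-MS" value="Elongation factor..." />
  <pf:cvParam accession="MS:xxxxxx1" name="taxonomy: NCBI TaxID" cvRef="PSI-MS" value="9606" />
  <pf:cvParam accession="MS:xxxxxx2" name="taxonomy: common name" cvRef="PSI-MS" value="human" />
  <pf:cvParam accession="MS:xxxxxx3" name="taxonomy: scientific name" cvRef="PSI-MS" value="Homo sapiens" />
  <pf:cvParam accession="MS:xxxxxx4" name="taxonomy: Swiss-Prot ID" cvRef="PSI-MS" value="HUMAN" />
</DBSequence>"""

PREFIX = "taxonomy: "


def taxonomy_ids(xml_text: str) -> dict[str, str]:
    """Map each taxonomy ID type (the term name after the prefix) to its value."""
    root = ET.fromstring(xml_text)
    ids: dict[str, str] = {}
    for param in root.iter(f"{{{PF}}}cvParam"):
        name = param.get("name", "")
        if name.startswith(PREFIX):
            ids[name[len(PREFIX):]] = param.get("value")
    return ids


print(taxonomy_ids(DBSEQ))
```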

Original comment by andrewro...@googlemail.com on 10 Jun 2009 at 3:23

GoogleCodeExporter commented 9 years ago
Regarding comment #13 - the Mascot examples now include taxonomy information for
protein sequences

Original comment by dcre...@gmail.com on 12 Jun 2009 at 5:04

GoogleCodeExporter commented 9 years ago
Regarding comment #7: Add an example of how <components> are used. Fixed in
Mascot_N15_example.mzid

Original comment by dcre...@gmail.com on 23 Jun 2009 at 10:16

GoogleCodeExporter commented 9 years ago
None of the examples seem to include missedCleavages in <PeptideEvidence>. Added to
the Mascot examples.

Original comment by dcre...@gmail.com on 23 Jun 2009 at 10:22

GoogleCodeExporter commented 9 years ago
None of the examples include a BibliographicReference; one has been added to
PMF_example.mzid.

Original comment by dcre...@gmail.com on 24 Jun 2009 at 5:12

GoogleCodeExporter commented 9 years ago
Just to check for correct use:

The MPC example uses a decoy DB derived from an IPI_human DB.
The DatabaseName is given as the cvParam "IPI_human" (without "_decoy" as part of the
name), and the decoy type and generation details are described in further cvParams.

Original comment by eisena...@googlemail.com on 26 Jun 2009 at 1:34

GoogleCodeExporter commented 9 years ago
Could we have an example with a modification with a neutral loss defined?

Original comment by patri...@matrixscience.com on 30 Jun 2009 at 2:47

GoogleCodeExporter commented 9 years ago
The Mascot_MSMS_example.mzid file now has an example of manual validation:

          <ProteinDetectionHypothesis id="PDH_HSP71_RAT" DBSequence_ref="DBSeq_HSP71_RAT" passThreshold="false">
            <PeptideHypothesis PeptideEvidence_Ref="PE_2_1_HSP71_RAT" />
            <cvParam accession="MS:1001171" name="mascot:score" cvRef="PSI-MS" value="40.95" />
            <cvParam accession="MS:1001093" name="sequence coverage" cvRef="PSI-MS" value="2" />
            <cvParam accession="MS:1001097" name="distinct peptide sequences" cvRef="PSI-MS" value="1" />
            <cvParam accession="MS:1001125" name="manual validation" cvRef="PSI-MS" value="Manually rejected this protein for no particular reason. (Example of manual validation for mzIdentML)" />
          </ProteinDetectionHypothesis>

Original comment by dcre...@gmail.com on 8 Jul 2009 at 4:56

GoogleCodeExporter commented 9 years ago
Regarding comment 19: Could we have an example with a modification with a neutral
loss defined?

The Mascot 15N example specifies Oxidation, and this can have a neutral loss of:
<umod:NeutralLoss avge_mass="64.1069" composition="H(4) C O S" flag="false" mono_mass="63.998285">

Unfortunately, this seems to get lost in the conversion of the unimod.xml to
unimod.obo file, so this needs to be fixed. (Please, Martin!)

We should probably also specify, for each peptide match, which neutral loss, if any,
was 'dominant' and/or used for scoring.
Two pieces of information are required: the position and the loss, and there can of
course be multiple neutral losses.
For an example sequence of ALNVMESK, we'd want to specify that it is residue 5, and
the loss is 63.998285.

There are 3 possible places for CV terms for this:
1. 
    <Peptide id="peptide_21_1">
      <peptideSequence>ALNVMESK</peptideSequence>
      <Modification location="5" residues="M" monoisotopicMassDelta="15.994919">
        <cvParam accession="UNIMOD:35" name="Oxidation" cvRef="UNIMOD" />
--->    <cvParam accession="MS:1001xxx" name="neutral loss" cvRef="PSI-MS" value="63.998285" />
      </Modification>
    </Peptide>
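A minimal Python sketch of how a reader would pull (position, residue, loss) out of option 1. It matches on the term name because MS:1001xxx is still a placeholder accession; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

PEPTIDE = """<Peptide id="peptide_21_1">
  <peptideSequence>ALNVMESK</peptideSequence>
  <Modification location="5" residues="M" monoisotopicMassDelta="15.994919">
    <cvParam accession="UNIMOD:35" name="Oxidation" cvRef="UNIMOD" />
    <cvParam accession="MS:1001xxx" name="neutral loss" cvRef="PSI-MS" value="63.998285" />
  </Modification>
</Peptide>"""


def neutral_losses(peptide: ET.Element) -> list[tuple[int, str, float]]:
    """Return (location, residue, loss) for each modification carrying a neutral-loss term."""
    losses = []
    for mod in peptide.findall("Modification"):
        for param in mod.findall("cvParam"):
            if param.get("name") == "neutral loss":
                losses.append(
                    (int(mod.get("location")), mod.get("residues"), float(param.get("value")))
                )
    return losses


print(neutral_losses(ET.fromstring(PEPTIDE)))
```

Since location is 1-based, peptideSequence[location - 1] should equal the residues attribute, which is a cheap consistency check.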

2. Or, it could be with other CV terms directly under:
    <SpectrumIdentificationItem id="SII_21_1">
But to cope with multiple pairs of values, this would require a structure similar to
the one under <Peptide> and therefore a change to the schema.

3. In the fragment arrays (remember it's the 5th residue that is methionine):

    <SpectrumIdentificationItem id="SII_21_1">
      <Fragmentation>
        <IonType index="1 2 3 4 5" charge="1">
          <cvParam cvRef="PSI-MS" accession="MS:1001224" name="frag: b ion"/>
          <FragmentArray values="72.09201 185.11602 299.24156 398.31937 481.32147" Measure_ref="m_mz"/>
          <FragmentArray values="1.217 0.5765 0.9995 0.3048 0.6491" Measure_ref="m_intensity"/>
          <FragmentArray values="0.0476 -0.0124 0.0702 0.0796 0.0446" Measure_ref="m_error"/>
--->      <FragmentArray values="0 0 0 0 63.998285" Measure_ref="m_neutral_loss"/>
        </IonType>

And we'd also need to define m_neutral_loss:

    <AnalysisData>
      <SpectrumIdentificationList id="SIL_1" numSequencesSearched="4944">
        <FragmentationTable>
          <Measure id="m_neutral_loss">
--->        <cvParam cvRef="PSI-MS" accession="MS:100xxxx" name="product ion neutral loss"/>
          </Measure>

I propose #3 - anyone disagree? Perhaps we can discuss at the telecon.
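A minimal Python sketch of how option 3 would be consumed: every FragmentArray is a parallel list aligned with IonType/@index, so the neutral loss for b5 is simply the fifth value of the m_neutral_loss array. It assumes the values are space-separated like the other arrays (with a comma-separated form such as "0, 0, ...", float() would reject "0,"); the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Option 3's b-ion block, with the neutral-loss array written space-separated
# like the other arrays. m_neutral_loss is the proposed (not yet defined) Measure.
ION_TYPE = """<IonType index="1 2 3 4 5" charge="1">
  <cvParam cvRef="PSI-MS" accession="MS:1001224" name="frag: b ion"/>
  <FragmentArray values="72.09201 185.11602 299.24156 398.31937 481.32147" Measure_ref="m_mz"/>
  <FragmentArray values="1.217 0.5765 0.9995 0.3048 0.6491" Measure_ref="m_intensity"/>
  <FragmentArray values="0.0476 -0.0124 0.0702 0.0796 0.0446" Measure_ref="m_error"/>
  <FragmentArray values="0 0 0 0 63.998285" Measure_ref="m_neutral_loss"/>
</IonType>"""


def ion_rows(ion_type: ET.Element) -> list[dict[str, float]]:
    """Pair the i-th ion index with the i-th value of every FragmentArray."""
    indexes = [int(i) for i in ion_type.get("index").split()]
    columns: dict[str, list[float]] = {}
    for array in ion_type.findall("FragmentArray"):
        values = [float(v) for v in array.get("values").split()]
        if len(values) != len(indexes):
            raise ValueError(
                f"{array.get('Measure_ref')}: {len(values)} values for {len(indexes)} ions"
            )
        columns[array.get("Measure_ref")] = values
    return [
        {"index": idx, **{ref: vals[i] for ref, vals in columns.items()}}
        for i, idx in enumerate(indexes)
    ]


for row in ion_rows(ET.fromstring(ION_TYPE)):
    if row["m_neutral_loss"]:
        print(f"b{row['index']}: m/z {row['m_mz']}, neutral loss {row['m_neutral_loss']}")
```

The length check is what keeps the parallel-array design safe: a missing or extra value in any array would otherwise silently shift every later pairing.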

Original comment by dcre...@gmail.com on 9 Jul 2009 at 8:23

GoogleCodeExporter commented 9 years ago
My preference would be to treat a neutral loss as any other modification and go with
option 1. Is there some information that would be lost by encoding it in this way?

Original comment by andrewro...@googlemail.com on 9 Jul 2009 at 8:50

GoogleCodeExporter commented 9 years ago
No, I don't think anything is lost. It's more that the <Peptide> is just about the
peptide sequence and any modified residues, rather than about how it fragmented.
However, it's obviously simpler to use option #1. Both options just require one new
CV term.

Original comment by dcre...@gmail.com on 9 Jul 2009 at 11:31

GoogleCodeExporter commented 9 years ago
Need to change:

<cv id="UNIMOD" URI="http://www.unimod.org/xml/unimod.xml">
to
<cv id="UNIMOD" URI="http://www.unimod.org/obo/unimod.obo">

(Done for all Mascot examples)

Original comment by dcre...@gmail.com on 9 Jul 2009 at 11:33

GoogleCodeExporter commented 9 years ago
Regarding comment 21: we agreed at the telecon to keep it simple and go for approach
number 1, with the option of *additionally* using approach #3 for more complex cases.
Changes made to the Mascot examples, using MS:1000336 (name="neutral loss").

Original comment by dcre...@gmail.com on 9 Jul 2009 at 6:10

GoogleCodeExporter commented 9 years ago
Some minor fixes are needed to the instance docs prior to submission:

MPCExample (remove this line)
    <!-- CAUTION: ALL experimentalMassToCharge, peptide scores, protein scores and sequence coverage values are only placeholders for the real values, because file is handcrafted and shows only principle structure of AnalysisXML! -->

Mascot MS/MS example:  
<userParam name="Mascot User Comment" value="Example Mascot MS-MS search for PSI AnalysisXML"/>

If Martin is away, I'll manually fix the MPC example for now

Original comment by andrewro...@googlemail.com on 31 Jul 2009 at 9:28

GoogleCodeExporter commented 9 years ago
Regarding comment 26: Mascot example changed.

Original comment by dcre...@gmail.com on 31 Jul 2009 at 2:15

GoogleCodeExporter commented 9 years ago

Original comment by eisena...@googlemail.com on 10 Jun 2010 at 8:55