Closed GoogleCodeExporter closed 8 years ago
I don't think we have examples of immonium or other internal ions using the
fragmentation ion structure, where the indexes are pairs representing the start
and end points of ions rather than integer indexes.
Original comment by andrewro...@googlemail.com
on 20 Nov 2008 at 12:21
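A minimal sketch of the start/end pair idea described above, in Python. The peptide, the spans, and the helper name internal_fragments are hypothetical, and 1-based inclusive positions are an assumption; nothing here is taken from the schema itself.

```python
from typing import List, Tuple


def internal_fragments(sequence: str, spans: List[Tuple[int, int]]) -> List[str]:
    """Return the sub-sequence covered by each (start, end) pair.

    Positions are assumed to be 1-based and inclusive.
    """
    fragments = []
    for start, end in spans:
        if not (1 <= start <= end <= len(sequence)):
            raise ValueError(f"span ({start}, {end}) is outside the peptide")
        fragments.append(sequence[start - 1:end])
    return fragments


# Hypothetical peptide and spans. An immonium ion covers a single residue,
# so its start and end positions coincide.
fragments = internal_fragments("ALNVMESK", [(2, 4), (5, 5)])
print(fragments)
```

An integer index cannot express the first span, which is why internal ions would need a pair per ion.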
(I've put this response to:
http://code.google.com/p/psi-pi/issues/detail?id=42#c44
into issue 44 because it's more of an instance doc issue.)
Looking at the instance doc:
http://code.google.com/p/psi-pi/source/browse/trunk/examples/MPC_example.axml
It's not really clear what:
<SourceFile id="SF1" location="proteinscape://....
is pointing to, because there is no <pf:fileFormat>. I presume that this is some
sort of relational database? Maybe you do want to add a description for the
format under
PI:00040 ! source file format
I guess that we used the term SourceFile because in most cases the "thing" that
is used to create the analysisXML instance document will be another file. Maybe
SourceData would be a more generic name.
<SpectraData id="LCMALDI_spectra"
location="proteinscape://www.medizinisches-proteom-center.de/PSServer/Project/Sample/Separation_1D_LC/Fraction_X"/>
What does this point to? From the example under svn, it looks as though this is
an mgf file (even though that may be stored in an RDB).
Original comment by dcre...@gmail.com
on 28 Nov 2008 at 3:01
Move rank from being a CV item to an attribute
Remove:
id: PI:00327
name: peptide rank
id: PI:00216
name: sequest:PeptideRank
Original comment by dcre...@gmail.com
on 1 Dec 2008 at 5:57
Need to show how chemical and post-translational modifications are handled with
the 15N example. (DC)
Original comment by dcre...@gmail.com
on 5 May 2009 at 11:01
Report the type of the decoy: reverse, shuffled, concatenated …
Add this to the instance examples.
Original comment by dcre...@gmail.com
on 5 May 2009 at 11:05
No examples of how mzIdentML is tied with mzML. DC to provide one.
Original comment by dcre...@gmail.com
on 5 May 2009 at 11:08
Add an example of how <pf:components> is used.
Original comment by dcre...@gmail.com
on 5 May 2009 at 11:10
Add an example of element <Modification> with a substitution modification.
Original comment by dcre...@gmail.com
on 5 May 2009 at 11:11
Element <DatabaseFilters> Include an example of an exclude value.
Original comment by dcre...@gmail.com
on 5 May 2009 at 11:12
Regarding Comment #9:
The Mascot_N15_example.mzid search was against all "Arabidopsis" excluding
"Arabidopsis neglecta":
<DatabaseFilters>
<Filter>
<FilterType>
<pf:cvParam accession="MS:1001020" name="DB filter taxonomy" cvRef="PSI-MS" />
</FilterType>
<Include>
<pf:cvParam accession="NCBI:3701" name="Arabidopsis" cvRef="NCBI-TAXONOMY" />
</Include>
<Exclude>
<pf:cvParam accession="NCBI:45251" name="Arabidopsis neglecta"
cvRef="NCBI-TAXONOMY" />
</Exclude>
</Filter>
</DatabaseFilters>
Original comment by dcre...@gmail.com
on 20 May 2009 at 11:02
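A rough Python sketch of the include/exclude logic that the <DatabaseFilters> example above describes. It assumes a sequence passes when its taxonomic lineage contains an included taxon and no excluded taxon; that rule, the function name, and the example lineages are illustrative assumptions, not a statement of how any search engine applies the filter.

```python
from typing import Iterable, Set


def passes_taxonomy_filter(lineage: Iterable[int],
                           include: Set[int],
                           exclude: Set[int]) -> bool:
    """True if the lineage hits an included taxon and no excluded taxon."""
    ids = set(lineage)
    return bool(ids & include) and not (ids & exclude)


INCLUDE = {3701}    # Arabidopsis, as in the <Include> element
EXCLUDE = {45251}   # Arabidopsis neglecta, as in the <Exclude> element

# Hypothetical lineages (the taxon itself plus ancestors), illustration only.
thaliana = [3702, 3701]
neglecta = [45251, 3701]
unrelated = [9606]

for name, lineage in [("thaliana", thaliana),
                      ("neglecta", neglecta),
                      ("unrelated", unrelated)]:
    print(name, passes_taxonomy_filter(lineage, INCLUDE, EXCLUDE))
```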
Regarding comment #6.
An example is now provided here:
http://code.google.com/p/psi-pi/source/browse/trunk/examples/Mascot_mzml_example.mzid
This is a search of small.pwiz.1.1.mzML which is referenced from here:
http://psidev.info/index.php?q=node/257
The example shows how the results can be tracked back to the raw data using the
nativeID.
Original comment by dcre...@gmail.com
on 20 May 2009 at 1:34
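A minimal Python sketch of tracking a result back to the raw data via the nativeID, as described above. The mzML fragment is a hypothetical, heavily trimmed stand-in (it is not the content of small.pwiz.1.1.mzML), and the nativeID string is an assumed example value.

```python
import xml.etree.ElementTree as ET

MZML_NS = {"mz": "http://psi.hupo.org/ms/mzml"}

# Hypothetical, trimmed mzML fragment; real files carry far more detail.
MZML = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="run1">
    <spectrumList count="2">
      <spectrum index="0" id="controllerType=0 controllerNumber=1 scan=1" defaultArrayLength="0"/>
      <spectrum index="1" id="controllerType=0 controllerNumber=1 scan=2" defaultArrayLength="0"/>
    </spectrumList>
  </run>
</mzML>"""


def index_by_native_id(mzml_text: str) -> dict:
    """Map each spectrum's id attribute (its nativeID) to its index."""
    root = ET.fromstring(mzml_text)
    return {spectrum.get("id"): int(spectrum.get("index"))
            for spectrum in root.iterfind(".//mz:spectrum", MZML_NS)}


native_ids = index_by_native_id(MZML)

# Assumed nativeID as it might be recorded against a result in the .mzid file.
spectrum_id = "controllerType=0 controllerNumber=1 scan=2"
print(native_ids[spectrum_id])
```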
Regarding comment #8: an example is now included in the Mascot N15 example.
It doesn't use the residues attribute.
Original comment by dcre...@gmail.com
on 20 May 2009 at 3:00
Re-opening this issue:
We don't currently have an example for MS:1001089 and MS:1001090 and they
are not well defined.
[Term]
id: MS:1001089
name: protein taxonomy
def: "The taxonomy of the resultant protein from the search." [PSI:PI]
xref: value-type:xsd\:string "The allowed value-type for this CV term."
is_a: MS:1001085 ! protein result details
[Term]
id: MS:1001090
name: taxonomy nomenclature
def: "The system used to indicate taxonomy. There should be an enumerated
list of options: latin name, NCBI TaxID, common name, Swiss-Prot species ID
(ex. RABIT from the full protein ID ALBU_RABIT)." [PSI:PI]
is_a: MS:1001089 ! protein taxonomy
This seems a little unwieldy to me. How about just having a different CV term
for each type of ID? For example:
<DBSequence id="x" length="449" SearchDatabase_ref="y" accession="z" >
<seq>MGKEKFHINIVVIGHVDSGKSTTTGHLIY...</seq>
<pf:cvParam accession="MS:1001088" name="protein description"
cvRef="PSI-MS" value="Elongation factor..." />
<pf:cvParam accession="MS:xxxxxx1" name="taxonomy: NCBI TaxID"
cvRef="PSI-MS" value="9606" />
<pf:cvParam accession="MS:xxxxxx2" name="taxonomy: common name"
cvRef="PSI-MS" value="human" />
<pf:cvParam accession="MS:xxxxxx3" name="taxonomy: scientific name"
cvRef="PSI-MS" value="Homo sapiens" />
<pf:cvParam accession="MS:xxxxxx4" name="taxonomy: Swiss-Prot ID"
cvRef="PSI-MS" value="HUMAN" />
</DBSequence>
Original comment by andrewro...@googlemail.com
on 10 Jun 2009 at 3:23
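A short Python sketch of how a reader might consume the one-term-per-ID-type proposal above. The namespace URI bound to pf: is hypothetical (the thread does not show the real one), the accessions are the placeholders from the proposal, and the helper name is made up for illustration.

```python
import xml.etree.ElementTree as ET

PF_NS = "urn:example:pf"  # hypothetical namespace URI for the pf: prefix

# Trimmed version of the proposed <DBSequence>; accessions are placeholders.
DB_SEQUENCE = f"""<DBSequence xmlns:pf="{PF_NS}" id="x" length="449"
    SearchDatabase_ref="y" accession="z">
  <pf:cvParam accession="MS:xxxxxx1" name="taxonomy: NCBI TaxID"
      cvRef="PSI-MS" value="9606"/>
  <pf:cvParam accession="MS:xxxxxx3" name="taxonomy: scientific name"
      cvRef="PSI-MS" value="Homo sapiens"/>
</DBSequence>"""


def taxonomy_params(xml_text: str) -> dict:
    """Collect cvParams whose name starts with 'taxonomy: ' into a dict."""
    prefix = "taxonomy: "
    root = ET.fromstring(xml_text)
    result = {}
    for param in root.iterfind("pf:cvParam", {"pf": PF_NS}):
        name = param.get("name", "")
        if name.startswith(prefix):
            result[name[len(prefix):]] = param.get("value")
    return result


taxonomy = taxonomy_params(DB_SEQUENCE)
print(taxonomy)
```

With separate terms, a consumer can pick the ID type it understands without interpreting a second "nomenclature" term.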
Regarding comment #13 - the Mascot examples now include taxonomy information for
protein sequences
Original comment by dcre...@gmail.com
on 12 Jun 2009 at 5:04
Regarding comment #7: Add an example of how <components> are used. Fixed in
Mascot_N15_example.mzid
Original comment by dcre...@gmail.com
on 23 Jun 2009 at 10:16
None of the examples seem to include missedCleavages in <PeptideEvidence>.
Added to Mascot examples.
Original comment by dcre...@gmail.com
on 23 Jun 2009 at 10:22
None of the examples include a BibliographicReference - added to the
PMF_example.mzid
Original comment by dcre...@gmail.com
on 24 Jun 2009 at 5:12
Just to check for correct use:
The MPC example uses a decoy DB derived from an IPI_human DB.
As DatabaseName, the cvParam "IPI_human" is used (without "_decoy" as part of
the name), and the decoy type and generation details are described in further
cvParams.
Original comment by eisena...@googlemail.com
on 26 Jun 2009 at 1:34
Could we have an example with a modification with a neutral loss defined?
Original comment by patri...@matrixscience.com
on 30 Jun 2009 at 2:47
The Mascot_MSMS_example.mzid file now has an example of manual validation:
<ProteinDetectionHypothesis id="PDH_HSP71_RAT"
DBSequence_ref="DBSeq_HSP71_RAT" passThreshold="false">
<PeptideHypothesis PeptideEvidence_Ref="PE_2_1_HSP71_RAT" />
<cvParam accession="MS:1001171" name="mascot:score" cvRef="PSI-MS"
value="40.95" />
<cvParam accession="MS:1001093" name="sequence coverage" cvRef="PSI-MS"
value="2" />
<cvParam accession="MS:1001097" name="distinct peptide sequences"
cvRef="PSI-MS" value="1" />
<cvParam accession="MS:1001125" name="manual validation" cvRef="PSI-MS"
value="Manually rejected this protein for no particular reason. (Example of
manual validation for mzIdentML)" />
</ProteinDetectionHypothesis>
Original comment by dcre...@gmail.com
on 8 Jul 2009 at 4:56
Regarding comment 19: Could we have an example with a modification with a
neutral loss defined?
The Mascot 15N example specifies Oxidation, and this can have a neutral loss of:
<umod:NeutralLoss avge_mass="64.1069" composition="H(4) C O S" flag="false"
mono_mass="63.998285">
Unfortunately, this seems to get lost in the conversion of the unimod.xml to
unimod.obo file, so this needs to be fixed. (Please, Martin!)
We should probably also specify, for each peptide match, which neutral loss, if
any, was 'dominant' and/or used for scoring.
Two pieces of information are required: the position and the loss, and there
can of course be multiple neutral losses.
For an example sequence of ALNVMESK, we'd want to specify that it is residue 5,
and the loss is 63.998285.
There are 3 possible places for CV terms for this:
1.
<Peptide id="peptide_21_1">
<peptideSequence>ALNVMESK</peptideSequence>
<Modification location="5" residues="M" monoisotopicMassDelta="15.994919">
<cvParam accession="UNIMOD:35" name="Oxidation" cvRef="UNIMOD" />
---> <cvParam accession="MS:1001xxx" name="neutral loss" cvRef="PSI-MS"
value="63.998285" />
</Modification>
</Peptide>
2. Or, it could be with other cv terms directly under:
<SpectrumIdentificationItem id="SII_21_1"
But to cope with multiple pairs of values, this would require a structure
similar to the one under <Peptide> and therefore a change to the schema.
3. In the fragment arrays (remember it's the 5th residue that is methionine):
<SpectrumIdentificationItem id="SII_21_1"
<Fragmentation>
<IonType index="1 2 3 4 5" charge="1">
<cvParam cvRef="PSI-MS" accession="MS:1001224" name="frag: b ion"/>
<FragmentArray values="72.09201 185.11602 299.24156 398.31937 481.32147 "
Measure_ref="m_mz"/>
<FragmentArray values="1.217 0.5765 0.9995 0.3048 0.6491"
Measure_ref="m_intensity"/>
<FragmentArray values="0.0476 -0.0124 0.0702 0.0796 0.0446"
Measure_ref="m_error"/>
---> <FragmentArray values="0 0 0 0 63.998285"
Measure_ref="m_neutral_loss"/>
</IonType>
And we'd also need to define m_neutral_loss:
<AnalysisData>
<SpectrumIdentificationList id="SIL_1" numSequencesSearched="4944">
<FragmentationTable>
<Measure id="m_neutral_loss">
---> <cvParam cvRef="PSI-MS" accession="MS:100xxxx" name="product ion neutral loss"/>
</Measure>
I propose #3 - anyone disagree? Perhaps we can discuss at the telecon.
Original comment by dcre...@gmail.com
on 9 Jul 2009 at 8:23
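To make option 3 concrete, here is a hedged Python sketch that lines up the fragment arrays from the b-ion example above and reports which ions carry a neutral loss. It assumes the arrays are parallel to the IonType index and whitespace-separated (commas are tolerated); the function names are hypothetical.

```python
from typing import List, Tuple


def parse_values(text: str) -> List[float]:
    """Parse a FragmentArray values string; stray commas are tolerated."""
    return [float(v) for v in text.replace(",", " ").split()]


def ions_with_loss(index: List[int],
                   mz: List[float],
                   loss: List[float]) -> List[Tuple[int, float, float]]:
    """Return (ion index, m/z, neutral loss) for ions with a non-zero loss."""
    if not (len(index) == len(mz) == len(loss)):
        raise ValueError("fragment arrays must have the same length as index")
    return [(i, m, l) for i, m, l in zip(index, mz, loss) if l > 0]


# Values copied from the b-ion example for ALNVMESK.
index = [int(i) for i in "1 2 3 4 5".split()]
mz = parse_values("72.09201 185.11602 299.24156 398.31937 481.32147 ")
neutral_loss = parse_values("0 0 0 0 63.998285")

print(ions_with_loss(index, mz, neutral_loss))
```

Only b5, the first b ion that contains the oxidised methionine, carries the loss in this example.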
My preference would be to treat a neutral loss as any other modification and go
with option 1. Is there some information that would be lost by encoding it in
this way?
Original comment by andrewro...@googlemail.com
on 9 Jul 2009 at 8:50
No, I don't think anything is lost. It's more that the <Peptide> is just about
the peptide sequence and any modified residues rather than about how it
fragmented.
However, it's obviously simpler to use option #1. Both options just require one
new CV term.
Original comment by dcre...@gmail.com
on 9 Jul 2009 at 11:31
Need to change:
<cv id="UNIMOD" URI="http://www.unimod.org/xml/unimod.xml">
to
<cv id="UNIMOD" URI="http://www.unimod.org/obo/unimod.obo">
(Done for all Mascot examples)
Original comment by dcre...@gmail.com
on 9 Jul 2009 at 11:33
Regarding comment 21. We agreed at telecon to keep it simple and go for approach
number 1, with the option of *additionally* using approach #3 for more complex
cases.
Changes made to the Mascot examples, which now use accession="MS:1000336"
name="neutral loss".
Original comment by dcre...@gmail.com
on 9 Jul 2009 at 6:10
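A minimal Python sketch of reading approach 1 back out of a <Peptide>, using the MS:1000336 "neutral loss" term mentioned above. The XML is adapted from the earlier ALNVMESK example with namespaces omitted for brevity, which is an assumption; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Adapted from the ALNVMESK example; namespaces omitted for brevity.
PEPTIDE = """<Peptide id="peptide_21_1">
  <peptideSequence>ALNVMESK</peptideSequence>
  <Modification location="5" residues="M" monoisotopicMassDelta="15.994919">
    <cvParam accession="UNIMOD:35" name="Oxidation" cvRef="UNIMOD"/>
    <cvParam accession="MS:1000336" name="neutral loss" cvRef="PSI-MS"
        value="63.998285"/>
  </Modification>
</Peptide>"""

NEUTRAL_LOSS = "MS:1000336"


def neutral_losses(peptide_xml: str) -> list:
    """Return (location, loss mass) for each modification with a neutral loss."""
    root = ET.fromstring(peptide_xml)
    losses = []
    for mod in root.iterfind("Modification"):
        for param in mod.iterfind("cvParam"):
            if param.get("accession") == NEUTRAL_LOSS:
                losses.append((int(mod.get("location")),
                               float(param.get("value"))))
    return losses


losses = neutral_losses(PEPTIDE)
print(losses)
```

Because the loss sits inside the <Modification>, the position comes for free from the location attribute, which is the simplicity argument made for approach 1.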
Some minor fixes needed to instance docs prior to submission:
MPCExample (remove this line)
<!-- CAUTION: ALL experimentalMassToCharge, peptide scores, protein scores and
sequence coverage values are only placeholders for the real values, because
file is handcrafted and shows only principle structure of AnalysisXML! -->
Mascot MS/MS example:
<userParam name="Mascot User Comment" value="Example Mascot MS-MS search for PSI
AnalysisXML"/>
If Martin is away, I'll manually fix the MPC example for now
Original comment by andrewro...@googlemail.com
on 31 Jul 2009 at 9:28
Regarding comment 26: Mascot example changed.
Original comment by dcre...@gmail.com
on 31 Jul 2009 at 2:15
Original comment by eisena...@googlemail.com
on 10 Jun 2010 at 8:55
Original issue reported on code.google.com by
andrewro...@googlemail.com
on 16 Oct 2008 at 3:57