vogelwk / psi-pi

Automatically exported from code.google.com/p/psi-pi

retention time should be allowed in files, specification document currently suggests otherwise #74

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Email thread copied from list:

Hi Sean,

Good point – this is clearly an error in the specification document: 
retention time is certainly allowed to be present in a PSM and is used in this 
way in various files. The specification document is in effect superseded by what 
the validator / mapping file combination allows, and rt is 
allowed. The latest validator can be downloaded from here: 
http://code.google.com/p/psi-pi/downloads/list

There is also an online validator here: 
http://www-bs2.informatik.uni-tuebingen.de/services/OpenMS/analysisXML/index.php

Let me know if you encounter any problems using these.

I will post your message to the google code issues list so we remember to 
update the specification document when we have other changes to make.

I guess Gerhard may be able to comment on the duplication of rt terms.
Best wishes
Andy

From: Seymour, Sean L [mailto:Sean.Seymour@absciex.com] 
Sent: 23 January 2013 01:40
To: psidev-pi-dev@lists.sourceforge.net
Subject: [Psidev-pi-dev] Proposed change to mzIdentML specification: allow RT 
to be included

Hi all

We’ve seen some mzIdentML files that are using cvParam MS:1001114 to capture 
the retention time of peptides, which seems perfectly reasonable to me. 
However, it turns out the specification currently forbids this as being 
unnecessary duplication:

“5.1.5 Exclusion of information relating to mass spectral data
It has been decided that the peak list that was searched should remain external 
to the format, for example 
referenced as an mzML file. Similarly other data items that may be used during 
a search, but can be retrieved 
from the source spectra file are not duplicated in mzIdentML, such as retention 
time.” 

As a practical matter, I think this is an unfortunate decision because it means 
the user has to keep the peak lists or other mass spec data around with the 
mzIdentML file and consuming software has to parse more files just for a tiny 
piece of information like this. It seems a little inconsistent because there is 
no issue with duplicating things like the observed mass to charge and charge 
state, which could also be taken from the original data, given coordinates to a 
spectrum. The reason m/z and z are included directly is obviously because they 
are considered critical attributes of an ID. I think most people who have done 
any kind of targeted proteomics would agree that rt is also critical 
information. Thus, I think this should actually be changed in the 
specification, at least to consider it an optional valid use of the format to 
include rt information as is already being done by some exporters producing 
mzIdentML. I would predict most users would be in favor of this as well, over 
the alternative of having to lug around all the peak lists, at least for this 
use case. I suspect this is why some exporters have already done what they did. 
Thoughts on this?

Separately, there also seems to be an issue with two apparently equivalent 
terms for retention time: MS:1001114 “retention time(s)” and MS:1000894 
“retention time”. The definitions don’t clearly distinguish the two terms, 
if they were actually meant to be different things. I assume this is just an 
error?

Sean

Original issue reported on code.google.com by andrewro...@googlemail.com on 23 Jan 2013 at 3:24

GoogleCodeExporter commented 8 years ago
I actually commented back in 2011 on the retention time issue in another issue 
that was set as "Fixed" so I guess nobody ever looked at it:

"Should retention time be obsoleted and replaced by scan start time? I find it 
odd that mzML and mzIdentML use different terms to mean the same thing. For 
mzML we agreed that retention/elution time is a peptide/compound chromatography 
property, not a spectrum property."

MS:1001114 ("retention time(s)") is an isolated term created when mzIdentML 
had a separate CV but was possibly kept around for use with mzQuantML as well? 
I have seen mzIdentMLs using this term.

MS:1000894 ("retention time") is the root of a hierarchy of terms for TraML. It 
is intended to describe peptides, not spectra. I think it is well suited to be 
used in mzQuantML as well, and possibly in the peptide section of mzIdentML as 
well if appropriate (but not SpectrumIdentificationResults). I don't think I've 
seen any mzIdentMLs using this family of terms.

MS:1000016 ("scan start time") is the PSI-approved term for describing when a 
scan started acquisition. A spectrum could be comprised of multiple scans, but 
mzIdentML's SIR doesn't really care about that, so the SIR could either take 
the first scan time or all of them. This is the term we use in pwiz when 
converting from pepXML to mzIdentML. I still think that it's not precise (and 
thus potentially confusing) to use retention time when talking about spectra.

Original comment by matt.cha...@gmail.com on 23 Jan 2013 at 5:53

GoogleCodeExporter commented 8 years ago
Discussed at PSI2013

Decision to make an update to the specification document, clarifying that 
retention time can be reported at the <SpectrumIdentificationResult> level. 

We should be using the same CV terms that can be annotated on mzML spectra. 
Please can someone comment on most appropriate term(s) to use, since some 
search engines may be combining multiple scans - correct?

Original comment by andrewro...@googlemail.com on 17 Apr 2013 at 1:20

GoogleCodeExporter commented 8 years ago
I forgot to add the action item to this

ACTION: Gerhard to coordinate with mzML CV terms and make sure we can map from 
SIR.

Original comment by andrewro...@googlemail.com on 17 Apr 2013 at 1:53

GoogleCodeExporter commented 8 years ago
Hi all,

if I understood Matt correctly, he proposes to use the same term for mzML and 
mzIdentML. This means we have to change scan start time as follows, so that it 
can also be used in mzIdentML:

[Term]
id: MS:1000016
name: scan start time
def: "The time that an analyzer started a scan, relative to the start of the MS run." [PSI:MS]
xref: value-type:xsd\:float "The allowed value-type for this CV term."
is_a: MS:1000503 ! scan attribute
is_a: MS:1001105 ! peptide result details
is_a: MS:1001405 ! spectrum identification result details
relationship: has_units UO:0000010 ! second
relationship: has_units UO:0000031 ! minute

Then we can obsolete the term:
[Term]
id: MS:1001114
name: retention time(s)
def: "Retention time of the spectrum from the source file." [PSI:PI]
xref: value-type:xsd\:double "The allowed value-type for this CV term."
is_a: MS:1001105 ! peptide result details
is_a: MS:1001405 ! spectrum identification result details
relationship: has_units UO:0000010 ! second
relationship: has_units UO:0000031 ! minute

and should use the following term and its children only in TraML:
[Term]
id: MS:1000894
name: retention time
def: "A measure of the interval relative to the beginning of a mass spectrometric run when a peptide will exit the chromatographic column." [PSI:MS]
is_a: MS:1000887 ! peptide attribute
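
If MS:1001114 were obsoleted in favour of MS:1000016 as proposed above, files that already use the old term would need their cvParams rewritten. The following is only a rough Python sketch of such a rewrite, not an agreed migration tool. The function name `migrate_rt_terms` and the example fragment are hypothetical, and it assumes that the value and unit attributes can be carried over unchanged, which the thread has not confirmed.

```python
import xml.etree.ElementTree as ET

# mzIdentML 1.1 namespace.
NS = "http://psidev.info/psi/pi/mzIdentML/1.1"

OLD_ACCESSION = "MS:1001114"  # "retention time(s)", proposed for obsoletion
NEW_ACCESSION = "MS:1000016"  # "scan start time"
NEW_NAME = "scan start time"


def migrate_rt_terms(root):
    """Rewrite MS:1001114 cvParams in place and return how many were changed.

    Assumption: value and unit attributes are left untouched.
    """
    changed = 0
    for cv_param in root.iter(f"{{{NS}}}cvParam"):
        if cv_param.get("accession") == OLD_ACCESSION:
            cv_param.set("accession", NEW_ACCESSION)
            cv_param.set("name", NEW_NAME)
            changed += 1
    return changed


# Hypothetical fragment, made up for illustration only.
FRAGMENT = f"""
<SpectrumIdentificationResult xmlns="{NS}" id="SIR_1"
    spectrumID="index=0" spectraData_ref="SD_1">
  <cvParam cvRef="PSI-MS" accession="MS:1001114" name="retention time(s)"
      value="1234.5" unitCvRef="UO" unitAccession="UO:0000010"
      unitName="second"/>
</SpectrumIdentificationResult>
"""

root = ET.fromstring(FRAGMENT)
n_changed = migrate_rt_terms(root)
migrated = root.find(f"{{{NS}}}cvParam")
```

The sketch deliberately leaves MS:1000894 alone, since that term is meant to stay in use for TraML.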

Original comment by germa64m...@gmail.com on 30 Apr 2013 at 8:00

GoogleCodeExporter commented 8 years ago
Agree with Gerhard's comment, we wish to use "scan start time" at the level of 
SpectrumIdentificationResult

Potentially allow this term to be repeated with different values to allow for 
the case where multiple MS2 spectra have been combined - all agree?

Can discuss on today's call.
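
For consuming software, repeated "scan start time" terms would mean collecting a list of values rather than a single one. A minimal Python sketch of that, assuming the cvParams sit directly under SpectrumIdentificationResult and use only the two units listed in the term definition (UO:0000010 second, UO:0000031 minute). The function name `scan_start_times_seconds` and the example fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mzid": "http://psidev.info/psi/pi/mzIdentML/1.1"}
SCAN_START_TIME = "MS:1000016"

# Conversion factors to seconds for the two units the term allows.
SECONDS_PER_UNIT = {"UO:0000010": 1.0, "UO:0000031": 60.0}


def scan_start_times_seconds(sir):
    """Return every scan start time on one SpectrumIdentificationResult, in seconds.

    Only direct cvParam children are read; a unit outside the two
    allowed ones raises KeyError rather than being guessed.
    """
    times = []
    for cv_param in sir.findall("mzid:cvParam", NS):
        if cv_param.get("accession") != SCAN_START_TIME:
            continue
        factor = SECONDS_PER_UNIT[cv_param.get("unitAccession")]
        times.append(float(cv_param.get("value")) * factor)
    return times


# Hypothetical fragment: one result built from two combined MS2 spectra.
FRAGMENT = """
<SpectrumIdentificationResult
    xmlns="http://psidev.info/psi/pi/mzIdentML/1.1"
    id="SIR_1" spectrumID="index=0" spectraData_ref="SD_1">
  <cvParam cvRef="PSI-MS" accession="MS:1000016" name="scan start time"
      value="1200.0" unitCvRef="UO" unitAccession="UO:0000010"
      unitName="second"/>
  <cvParam cvRef="PSI-MS" accession="MS:1000016" name="scan start time"
      value="20.5" unitCvRef="UO" unitAccession="UO:0000031"
      unitName="minute"/>
</SpectrumIdentificationResult>
"""

sir = ET.fromstring(FRAGMENT)
times = scan_start_times_seconds(sir)
```

A file with a single scan per result would simply yield a one-element list, so the same reader would cover both cases.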

Original comment by andrewro...@googlemail.com on 16 May 2013 at 2:45

GoogleCodeExporter commented 8 years ago

Original comment by germa64m...@gmail.com on 2 Jul 2013 at 3:00