vogelwk / psi-pi

Automatically exported from code.google.com/p/psi-pi

Modifications that cannot be found in UniMod #57

GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
> Modifications that cannot be found in UniMod
> -------------------------------------------------------------
> It is not a good idea to allow exporters to have a 'wild card' to
> report 'undetermined' modifications. The new terms should be added to
> UniMod (or other ontology such as PSI-MOD) before the exporter is
> written.

This just seems crazy to me. I'm happy to require that known mods MUST be 
annotated with their Unimod term, but that's no reason to completely exclude 
representation for unknown mods! It's ridiculous to require that new mods be 
identified and added to Unimod before they can be written to an intermediate 
mzIdentML file.

Now if you want to talk about MIAPE-PI, i.e. that journals won't accept 
unidentified modifications in mzIdentML, that's a different story. I think it's 
reasonable to have identified and verified a modification before submitting it 
for publication.

I also think it's reasonable (and relatively simple) for the semantic validator 
to check that every mod mass that can snap (within some tolerance) to one or 
more Unimod masses has been snapped.

Original issue reported on code.google.com by matt.cha...@gmail.com on 11 Mar 2011 at 5:08

GoogleCodeExporter commented 8 years ago
Hi Matt,

I certainly sympathise with your viewpoint, but to me it comes down to what a 
"semantically valid" file is used for. I think it's acceptable to create 
intermediate files that, for example, trigger a warning that an invalid CV term 
has been entered (e.g. cvParam name="UNIMOD:Unknown") for the cases where a new 
mod is needed in the CV, but a truly semantically valid file must have all 
CV terms intact.

It seems to me that this way round is easier than warning the user that an 
entered mod mass already exists in Unimod, since we can't know from the 
mass alone that this is the mod the user intended.

If we allow the get-out of a UserParam or "unknown mod" cv term, I don't see 
how we ever capture these values and improve the CV. 99% of all files will 
capture known modifications, so my preference would be to force exporters to 
include CV terms wherever possible, and if a new mod is discovered - write an 
intermediate file until an accession has been given. 

I'm happy to discuss further, either on the call this week or at the PSI 
meeting.

Original comment by andrewro...@googlemail.com on 21 Mar 2011 at 3:53

GoogleCodeExporter commented 8 years ago
The (unsubstantiated) 99% figure is at the very least inapplicable for sequence 
tag based (error tolerant) searches. I don't understand why an mzIdentML file 
representing the actual modification masses used in a blind PTM search should 
be semantically invalid. Sure, most of the "unknown" masses are bogus, but 
that's a job for PSM confidence assessment, not the semantic validator.

I still agree that the semantic validator could be tasked with finding Unimod 
masses very close to the "unknown" masses and giving warnings for those.

Original comment by matt.cha...@gmail.com on 24 Mar 2011 at 3:23

GoogleCodeExporter commented 8 years ago
This was partially resolved on a previous call when we decided to add an 
"unknown modification mass" term to the PSI-MS CV. The semantic validator will 
check that a Unimod term is used if any are within a tolerance of the 
modification mass. If none are within tolerance, then the document should use 
the "unknown" CV term. The tolerance is the part we haven't figured out. Some of 
us suggested using precursor m/z tolerance. Is there an argument in favor of 
using something else? I suppose a sequence tag based search might not want to 
provide a precursor m/z tolerance, but I think having one is a fair requirement.

Original comment by matt.cha...@gmail.com on 11 Apr 2011 at 2:27

GoogleCodeExporter commented 8 years ago
The term is not intended for "quick hacks" or "lazy implementers".
Using the "unknown modification" term is not recommended. This should go 
into the specification document. PRIDE will flag data sets with such terms.
Putting it into the semantic validator does not seem technically possible with the 
current mechanism.
If the user is asked interactively, that may cause problems due to ambiguities.
Action points: 1) put the above text into the spec. doc.; 2) create a CV term in 
PSI-MS.

Original comment by eisena...@googlemail.com on 12 Apr 2011 at 9:34

GoogleCodeExporter commented 8 years ago
You didn't mention the Unimod snapping and especially the tolerance to use. If 
a writer uses a smaller tolerance than the validator, invalid files will ensue.
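
To make the mismatch concrete, here is a small sketch. The tolerances and the observed mass are made-up values, and `snaps_to_oxidation` is a hypothetical helper; only the Oxidation mass delta comes from Unimod.

```python
OXIDATION_MASS = 15.994915  # Unimod Oxidation monoisotopic delta, Da


def snaps_to_oxidation(mass_delta, tolerance):
    """Hypothetical helper: is mass_delta within `tolerance` Da of Oxidation?"""
    return abs(mass_delta - OXIDATION_MASS) <= tolerance


observed = 16.02            # made-up modification mass reported by a search
writer_tolerance = 0.01     # made-up: the writer snaps only very close masses
validator_tolerance = 0.05  # made-up: the validator is more lenient

# The writer sees no Unimod match and writes the "unknown" term...
writer_uses_unknown = not snaps_to_oxidation(observed, writer_tolerance)
# ...but the validator finds a match and expects a Unimod term.
validator_expects_unimod = snaps_to_oxidation(observed, validator_tolerance)

file_rejected = writer_uses_unknown and validator_expects_unimod
print(file_rejected)  # True
```

The reverse case (writer more lenient than validator) fails the other way round: the writer snaps to a Unimod term where the validator would expect the "unknown" term.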

Original comment by matt.cha...@gmail.com on 12 Apr 2011 at 1:33

GoogleCodeExporter commented 8 years ago
Agreement at the TeleCon of 21.04.2011: stating an allowed tolerance for pipeline tools 
is not the task of the standard format definition. The submitter / 
file owner EITHER knows his modification, in which case he can be asked during the 
conversion or by the PRIDE (or other repository) team, OR he really is searching 
for a mass that is currently an unknown modification. 
If a validator uses a tolerance, it is documented, so writers can adapt to it.

Action points (see comment 4): I will 1) add the text to the current spec. doc. or mail 
Andy about it, and 2) ask David / Juan to add such a term.

Original comment by eisena...@googlemail.com on 21 Apr 2011 at 4:18

GoogleCodeExporter commented 8 years ago
Stating how the validator should pick a tolerance to use is not the task of the 
standard format definition?

Original comment by matt.cha...@gmail.com on 21 Apr 2011 at 4:21

GoogleCodeExporter commented 8 years ago
Correction to comment 6 / action point 2: the term "id: MS:1001460 name: unknown 
modification" already exists.
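
For illustration, a writer might attach this term roughly as below. This is only a sketch: the element and attribute names (`Modification`, `location`, `monoisotopicMassDelta`, `cvParam`, `cvRef`) reflect my reading of the mzIdentML schema and should be checked against the specification, and `unknown_modification` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def unknown_modification(location, mass_delta):
    """Hypothetical helper: build a Modification element carrying MS:1001460."""
    mod = ET.Element(
        "Modification",
        {
            "location": str(location),
            # Assumed attribute name; the mass itself is still recorded
            # even though no Unimod term applies.
            "monoisotopicMassDelta": f"{mass_delta:.6f}",
        },
    )
    ET.SubElement(
        mod,
        "cvParam",
        {
            "cvRef": "PSI-MS",
            "accession": "MS:1001460",
            "name": "unknown modification",
        },
    )
    return mod


print(ET.tostring(unknown_modification(5, 123.456), encoding="unicode"))
```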

Original comment by eisena...@googlemail.com on 21 Apr 2011 at 5:22