Hi Matt,
I certainly sympathise with your viewpoint, but to me it comes down to what a
"semantically valid" file is used for. I think it's acceptable to create
intermediate files that, for example, trigger a warning that an invalid CV term
has been entered (e.g. cvParam name="UNIMOD:Unknown") for the cases where a new
mod is needed in the CV, but a truly semantically valid file must have all
CV terms intact.
It seems to me that way round is easier than warning the user that a mod mass
has been entered that already exists in Unimod - since we can't know from the
mass alone, that this is the mod the user was intending.
If we allow the get-out of a UserParam or "unknown mod" cv term, I don't see
how we ever capture these values and improve the CV. 99% of all files will
capture known modifications, so my preference would be to force exporters to
include CV terms wherever possible, and if a new mod is discovered - write an
intermediate file until an accession has been given.
I'm happy to discuss further either on the call this week or at the PSI
meeting?
Original comment by andrewro...@googlemail.com
on 21 Mar 2011 at 3:53
The (unsubstantiated) 99% figure is at the very least inapplicable for sequence
tag based (error tolerant) searches. I don't understand why an mzIdentML file
representing the actual modification masses used in a blind PTM search should
be semantically invalid. Sure, most of the "unknown" masses are bogus, but
that's a job for PSM confidence assessment, not the semantic validator.
I still agree that the semantic validator could be tasked with finding Unimod
masses very close to the "unknown" masses and giving warnings for those.
Original comment by matt.cha...@gmail.com
on 24 Mar 2011 at 3:23
This was partially resolved on a previous call when we decided to add an
"unknown modification mass" term to the PSI-MS CV. The semantic validator will
check that a unimod term is used if any are within a tolerance to the
modification mass. If none are within tolerance, then the document should use
the "unknown" CV term. The tolerance is the part we haven't figured out. Some of
us suggested using precursor m/z tolerance. Is there an argument in favor of
using something else? I suppose a sequence tag based search might not want to
provide a precursor m/z tolerance, but I think having one is a fair requirement.
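
To illustrate the rule described above, here is a minimal sketch in Python. It is not the actual semantic validator: the function names, the tiny hand-picked Unimod subset, and the choice of an absolute tolerance in Da are all assumptions made for the example, and the tolerance to use is exactly the open question in this comment.

```python
# Hypothetical, hand-picked subset of Unimod (monoisotopic delta masses in Da).
# A real validator would load the full Unimod database instead.
UNIMOD_SUBSET = {
    "UNIMOD:1": ("Acetyl", 42.010565),
    "UNIMOD:4": ("Carbamidomethyl", 57.021464),
    "UNIMOD:21": ("Phospho", 79.966331),
    "UNIMOD:35": ("Oxidation", 15.994915),
}

# PSI-MS CV accession for "unknown modification".
UNKNOWN_MOD = "MS:1001460"


def unimod_candidates(delta_mass, tolerance):
    """Return accessions of table entries within +/- tolerance (Da) of delta_mass."""
    return sorted(
        accession
        for accession, (_name, mass) in UNIMOD_SUBSET.items()
        if abs(mass - delta_mass) <= tolerance
    )


def check_modification(accession, delta_mass, tolerance):
    """Return a list of validator messages; an empty list means no complaint."""
    candidates = unimod_candidates(delta_mass, tolerance)
    if candidates and accession not in candidates:
        # A Unimod entry is within tolerance, so a Unimod term is expected.
        return [
            f"mass {delta_mass} is within {tolerance} Da of "
            f"{', '.join(candidates)}; expected one of these terms"
        ]
    if not candidates and accession != UNKNOWN_MOD:
        # Nothing is within tolerance, so the "unknown" term is expected.
        return [f"no Unimod entry within {tolerance} Da; expected {UNKNOWN_MOD}"]
    return []


# Usage: the "unknown" term on a mass close to Oxidation produces a message.
for message in check_modification(UNKNOWN_MOD, 15.9949, 0.01):
    print(message)
```

With a precursor m/z tolerance, the absolute tolerance would have to be derived per spectrum (from the m/z and charge) rather than passed in as a fixed number as it is here.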
Original comment by matt.cha...@gmail.com
on 11 Apr 2011 at 2:27
The term is not intended for "quick hacks" or "lazy implementers".
It is not recommended to use the "unknown modification" term. This should go
into the specification document. PRIDE will flag data sets with "such terms".
Putting it into the semantic validator seems technically not possible with the
current mechanism.
If the user is asked interactively, that may cause problems due to ambiguities.
Action Points: 1) put the above text into spec. doc.; 2) create CV term in
PSI-MS
Original comment by eisena...@googlemail.com
on 12 Apr 2011 at 9:34
You didn't mention the Unimod snapping and especially the tolerance to use. If
a writer uses a smaller tolerance than the validator, invalid files will ensue.
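
A small worked example of that mismatch, in Python. The observed mass and both tolerance values are hypothetical and chosen only to make the point; the Oxidation delta mass is the Unimod monoisotopic value.

```python
OXIDATION_MASS = 15.994915  # Unimod Oxidation, monoisotopic delta mass in Da


def matches_unimod(delta_mass, tolerance):
    """True if delta_mass is within +/- tolerance (Da) of the Oxidation mass."""
    return abs(delta_mass - OXIDATION_MASS) <= tolerance


observed = 15.990            # hypothetical mass reported by a search
writer_tolerance = 0.001     # hypothetical tolerance used by the exporter
validator_tolerance = 0.01   # hypothetical tolerance used by the validator

# The writer sees no Unimod match and emits the "unknown modification" term.
writer_uses_unknown = not matches_unimod(observed, writer_tolerance)
# The validator sees a match and expects a Unimod term instead.
validator_expects_unimod = matches_unimod(observed, validator_tolerance)

file_is_flagged = writer_uses_unknown and validator_expects_unimod
print(file_is_flagged)
```

The difference of about 0.0049 Da falls between the two tolerances, so the same file is consistent for the writer and invalid for the validator.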
Original comment by matt.cha...@gmail.com
on 12 Apr 2011 at 1:33
Agreement TeleCon 21.04.2011: Stating an allowed tolerance for pipeline tools
is not the task of the standard format definition. Actually the submitter /
file owner EITHER knows his modification, in which case he can be asked during
the conversion or by the PRIDE (or other repository) team; OR he really searches
for a mass, which is currently an unknown modification.
If a validator uses a tolerance, it is documented, so writers may adapt to this.
Action points, see comment 4: I will 1) add text to the current spec. doc or mail
Andy about that and 2) ask David / Juan to add such a term.
Original comment by eisena...@googlemail.com
on 21 Apr 2011 at 4:18
Stating how the validator should pick a tolerance to use is not the task of the
standard format definition?
Original comment by matt.cha...@gmail.com
on 21 Apr 2011 at 4:21
Correction to comment 6 / action point 2: "id: MS:1001460 name: unknown
modification" already exists.
Original comment by eisena...@googlemail.com
on 21 Apr 2011 at 5:22
Original issue reported on code.google.com by
matt.cha...@gmail.com
on 11 Mar 2011 at 5:08