Closed GoogleCodeExporter closed 9 years ago
I like the look of PE being in the sequence collection.
It appears to me that if we want to keep missedCleavages (and I would vote that
we do), it makes sense to put it on SII, either as an attribute, a new
element or a cvParam.
For all simple cases, I think this model holds up. For complex cases with
multiple enzymes, we may need to model which enzyme this refers to, although
this is probably not covered in the 1.0 schema either.
"It is also valid to just provide a Peptide_ref without any PE
refs in a SII. In such a case, the API would generate a list of all
possible PEs associated with this SII."
Intuitively I don't like the sound of this, some file writers would produce
PEs, others would not. If all this can be inferred by an API, the argument goes
that PE is not needed at all. However, we do not enforce that the protein
sequence be reported (since for some output formats this is not always possible
without the searched database) so an API would not be able to infer pre, post
or position. I would prefer that PE must be reported for all valid peptide to
protein matches by the file writer
"Protein inference should now be handled completely at the
ProteinDetectionHypothesis (PDH) level. Therefore, the
PeptideDetectionHypothesis was adapted to hold a number of 1..n SII_refs
(together with the PE ref as attribute). This would resemble the
statement that the PDH is backed up by this PE identified through the
following SIIs."
Generally I agree with linking to SIIs from PDH. I'm coming round to the idea
of also including the PE_ref, as a quick link to get to non redundant peptides
identified without going via all SIIs. It makes a bit more work for writers but
for some use cases, saves work for file readers. If we stick with this though,
again I think PE cannot be optional.
Original comment by a...@cuckundoorecords.com
on 14 Feb 2011 at 11:14
"Intuitively I don't like the sound of this, some file writers would produce
PEs, others would not. If all this can be inferred by an API, the argument goes
that PE is not needed at all. However, we do not enforce that the protein
sequence be reported (since for some output formats this is not always possible
without the searched database) so an API would not be able to infer pre, post
or position. I would prefer that PE must be reported for all valid peptide to
protein matches by the file writer"
This was meant differently. It is not the PEs that are optional but the
references from SII to PE, i.e. the PE_refs. As all PEs link to a Peptide, the
PEs that an SII refers to can be inferred from the Peptide(_ref). It is still
mandatory to provide all possible PEs.
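
To illustrate the inference described above, here is a minimal Python sketch of
how a reader could resolve the PEs for an SII that only carries a Peptide_ref.
The class, the function names and the ids (PE_1, pep_1, prot_A) are made up for
the example and are not taken from the schema or from any existing API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideEvidence:
    # Hypothetical, simplified stand-in for a PE element.
    id: str
    peptide_ref: str
    protein_ref: str


def index_by_peptide(evidences):
    """Map each Peptide id to the ids of all PEs that link to it."""
    index = defaultdict(list)
    for pe in evidences:
        index[pe.peptide_ref].append(pe.id)
    return dict(index)


def infer_pe_refs(sii_peptide_ref, index):
    """Return the PE ids implied by an SII that carries only a Peptide_ref."""
    return index.get(sii_peptide_ref, [])


# All PEs are still written to the file; only the SII -> PE links are omitted.
evidences = [
    PeptideEvidence("PE_1", "pep_1", "prot_A"),
    PeptideEvidence("PE_2", "pep_1", "prot_B"),
    PeptideEvidence("PE_3", "pep_2", "prot_A"),
]
index = index_by_peptide(evidences)
inferred = infer_pe_refs("pep_1", index)
```

This only works because every PE is required to be present, so the lookup
depends on the Peptide id alone and needs no protein sequence.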
"Generally I agree with linking to SIIs from PDH. I'm coming round to the idea
of also including the PE_ref, as a quick link to get to non redundant peptides
identified without going via all SIIs. It makes a bit more work for writers but
for some use cases, saves work for file readers. If we stick with this though,
again I think PE cannot be optional."
This is exactly our current proposal. The PeptideHypothesis contains the PE_ref
as an attribute and a list of SII_refs as child elements (since several SIIs
can link to the same PE). Only the SIIs that were used for scoring should be
included in this list.
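
As a rough sketch of that layout, the following Python snippet builds such an
element with the standard library. The tag and attribute names
("PeptideHypothesis", "PE_ref", "SII_ref", "ref") are just the shorthand used
in this thread, not necessarily the names in the schema file, and the ids are
invented.

```python
import xml.etree.ElementTree as ET


def build_peptide_hypothesis(pe_ref, scored_sii_refs):
    """Build one hypothesis element: PE_ref as attribute, SII_refs as children."""
    if not scored_sii_refs:
        # The proposal asks for 1..n SII_refs, so an empty list is rejected.
        raise ValueError("at least one SII_ref is required")
    hypothesis = ET.Element("PeptideHypothesis", {"PE_ref": pe_ref})
    for sii_id in scored_sii_refs:
        # Only SIIs that contributed to scoring should be passed in here.
        ET.SubElement(hypothesis, "SII_ref", {"ref": sii_id})
    return hypothesis


hypothesis = build_peptide_hypothesis("PE_1", ["SII_1_1", "SII_7_1"])
xml_text = ET.tostring(hypothesis, encoding="unicode")
```

A reader can then get the non redundant peptide from the single PE_ref without
walking through the listed SIIs.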
If we want to keep enzyme specific information, I would not want to put it
into SII. Even though this might be convenient at the moment, it does not
reflect the nature of the information. Basically, it is part of a Peptide's
properties with respect to a certain protein and thus would go at the PE level.
In my opinion, parameters or sub-elements with a reference to the respective
enzymes seem better suited.
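
To show why the value needs an enzyme reference, here is a small Python sketch
that counts missed cleavages for the same peptide under two simplified
cleavage rules. The rule table, the enzyme keys and the example sequence are
assumptions made for illustration; real search engines may apply different or
more detailed rules.

```python
import re

# Simplified, assumed cleavage rules written as zero-width regex patterns:
# trypsin cuts after K or R unless followed by P, Lys-C cuts after K.
CLEAVAGE_RULES = {
    "trypsin": re.compile(r"(?<=[KR])(?!P)"),
    "lys-c": re.compile(r"(?<=K)"),
}


def count_missed_cleavages(peptide, enzyme):
    """Count cleavage sites that lie strictly inside the peptide."""
    rule = CLEAVAGE_RULES[enzyme]
    return sum(
        1
        for match in rule.finditer(peptide)
        # A site at the very end is the peptide's own C-terminal cut.
        if 0 < match.start() < len(peptide)
    )


peptide = "AKDERGK"
missed_trypsin = count_missed_cleavages(peptide, "trypsin")
missed_lysc = count_missed_cleavages(peptide, "lys-c")
```

The two counts differ for this peptide, so a bare missedCleavages number is
ambiguous as soon as more than one enzyme is involved.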
Original comment by johannes...@gmail.com
on 14 Feb 2011 at 12:32
As discussed in the previous mzIdentML conference, we updated the schema
proposal to solve the problem of enzyme specific information at the
PeptideEvidence (PE) level. A new element called "PeptideEvidenceList"
(PEList) was created under "SequenceCollection". Each of these 1:n PELists
contains 1:n PEs plus 0:n enzyme references (and optional cv / user
parameters). Furthermore, EnzymeType was changed to be an extension of
"IdentifiableType".
If a protocol with two enzymes (A and B) is being used, PEs can now be
grouped according to the enzyme(s) they come from. For example: all PEs from
enzyme A, all PEs from enzyme B and a third PEList for all PEs where
it is unclear whether they come from A or B (this list then contains two
enzyme references).
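
The grouping could be sketched in Python as below. The dictionary layout, the
function name and the ids (ENZ_A, ENZ_B, PE_1, ...) are hypothetical and only
meant to show the idea of one PEList per distinct set of enzyme references.

```python
from collections import defaultdict


def group_into_pe_lists(pe_enzymes):
    """Group PE ids by the set of enzymes that could have produced them.

    pe_enzymes maps a PE id to the enzyme ids it may come from.
    """
    groups = defaultdict(list)
    for pe_id, enzymes in pe_enzymes.items():
        # frozenset makes the enzyme combination usable as a dictionary key.
        groups[frozenset(enzymes)].append(pe_id)
    return [
        {"enzyme_refs": sorted(enzyme_set), "pe_refs": pe_ids}
        for enzyme_set, pe_ids in groups.items()
    ]


pe_lists = group_into_pe_lists({
    "PE_1": ["ENZ_A"],
    "PE_2": ["ENZ_B"],
    "PE_3": ["ENZ_A", "ENZ_B"],  # unclear whether A or B produced it
    "PE_4": ["ENZ_A"],
})
```

With two enzymes this gives at most three PELists: one for A, one for B and
one carrying both references.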
As enzyme specific information should now no longer be a problem at the
PE level the previously removed attribute "missedCleavages" was added
again.
Additionally, we simplified the names of several elements by removing
prefixes such as "PSI....." from the beginning of the name. Finally, we
changed "SearchModificationType" as proposed in the last call: ModParam was
removed, and its attributes as well as the cvParam were added to
"SearchModificationType". The multiplicity of cvParam was furthermore
changed to 1:n.
The proposed schema was added to the repository:
http://code.google.com/p/psi-pi/source/browse/trunk/schema/mzIdentML1.1.0.xsd
Original comment by johannes...@gmail.com
on 28 Feb 2011 at 4:08
[deleted comment]
Original comment by eisena...@googlemail.com
on 3 Apr 2011 at 2:53
agreed at Heidelberg
Original comment by eisena...@googlemail.com
on 12 Apr 2011 at 9:14
Original issue reported on code.google.com by
johannes...@gmail.com
on 11 Feb 2011 at 3:33