vogelwk / psi-pi

Automatically exported from code.google.com/p/psi-pi

Taxonomy for each protein (DBSequence) #50

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
We don't currently have an example for MS:1001089 and MS:1001090 and they
are not well defined. 

[Term]
id: MS:1001089
name: protein taxonomy
def: "The taxonomy of the resultant protein from the search." [PSI:PI]
xref: value-type:xsd\:string "The allowed value-type for this CV term."
is_a: MS:1001085 ! protein result details

[Term]
id: MS:1001090
name: taxonomy nomenclature
def: "The system used to indicate taxonomy. There should be an enumerated
list of options: latin name, NCBI TaxID, common name, Swiss-Prot species ID
(ex. RABIT from the full protein ID ALBU_RABIT)." [PSI:PI]
is_a: MS:1001089 ! protein taxonomy

This seems a little unwieldy to me. How about just having a different CV term
for each type of ID? For example:

<DBSequence id="x" length="449" SearchDatabase_ref="y" accession="z">
  <seq>MGKEKFHINIVVIGHVDSGKSTTTGHLIY...</seq>
  <pf:cvParam accession="MS:1001088" name="protein description"
              cvRef="PSI-MS" value="Elongation factor..." />
  <pf:cvParam accession="MS:xxxxxx1" name="taxonomy: NCBI TaxID"
              cvRef="PSI-MS" value="9606" />
  <pf:cvParam accession="MS:xxxxxx2" name="taxonomy: common name"
              cvRef="PSI-MS" value="human" />
  <pf:cvParam accession="MS:xxxxxx3" name="taxonomy: scientific name"
              cvRef="PSI-MS" value="Homo sapiens" />
  <pf:cvParam accession="MS:xxxxxx4" name="taxonomy: Swiss-Prot ID"
              cvRef="PSI-MS" value="HUMAN" />
</DBSequence>

It should be possible to have 0..many taxonomy definitions for each sequence.
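
As a rough illustration of what "0..many" means for a consumer, here is a minimal Python sketch that collects every taxonomy cvParam from one DBSequence element. The namespace URI bound to the `pf` prefix, the `taxonomy_params` helper, and the `MS:xxxxxxN` accessions are placeholders for illustration only; none of them are defined by the schema or the CV.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI for the "pf" prefix (assumption: the real URI
# is not given in this thread).
PF_NS = "urn:example:pf"

# Example DBSequence from this issue; the MS:xxxxxxN accessions are
# placeholders, not real CV accessions.
SAMPLE = f"""
<DBSequence xmlns:pf="{PF_NS}" id="x" length="449" SearchDatabase_ref="y" accession="z">
  <seq>MGKEKFHINIVVIGHVDSGKSTTTGHLIY...</seq>
  <pf:cvParam accession="MS:1001088" name="protein description"
              cvRef="PSI-MS" value="Elongation factor..." />
  <pf:cvParam accession="MS:xxxxxx1" name="taxonomy: NCBI TaxID"
              cvRef="PSI-MS" value="9606" />
  <pf:cvParam accession="MS:xxxxxx2" name="taxonomy: common name"
              cvRef="PSI-MS" value="human" />
  <pf:cvParam accession="MS:xxxxxx3" name="taxonomy: scientific name"
              cvRef="PSI-MS" value="Homo sapiens" />
  <pf:cvParam accession="MS:xxxxxx4" name="taxonomy: Swiss-Prot ID"
              cvRef="PSI-MS" value="HUMAN" />
</DBSequence>
"""


def taxonomy_params(db_sequence):
    """Return a list of (name, value) pairs for all taxonomy cvParams.

    The list may be empty, since a sequence can carry zero taxonomy terms.
    """
    pairs = []
    for param in db_sequence.findall(f"{{{PF_NS}}}cvParam"):
        name = param.get("name", "")
        if name.startswith("taxonomy:"):
            pairs.append((name, param.get("value")))
    return pairs


if __name__ == "__main__":
    for name, value in taxonomy_params(ET.fromstring(SAMPLE)):
        print(name, "=", value)
```

Matching on the `taxonomy:` name prefix is only a shortcut for this sketch; a real reader would more likely match on accessions once they are assigned.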

Original issue reported on code.google.com by dcre...@gmail.com on 13 May 2009 at 6:05

GoogleCodeExporter commented 8 years ago
I vote for the option with different CV terms, all with an is_a: MS:1001089
relationship.

But I would prefer to see the latter be renamed molecule taxonomy, as it can be
a result from a search in a nucleotide sequence database and not a protein
sequence database (some tools do on-the-fly translation of nucleotide sequences).
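
To make the is_a idea concrete, the following Python sketch parses a few OBO-style stanzas and lists the terms that declare MS:1001089 as their parent. The child stanzas, their `MS:xxxxxxN` placeholder accessions, and the `parse_terms` / `children_of` helpers are assumptions for illustration; this is not a full OBO parser.

```python
# Hypothetical stanzas: only MS:1001089 is a real accession quoted in this
# thread; the MS:xxxxxxN ids are placeholders.
OBO_TEXT = """
[Term]
id: MS:1001089
name: protein taxonomy

[Term]
id: MS:xxxxxx1
name: taxonomy: NCBI TaxID
is_a: MS:1001089 ! protein taxonomy

[Term]
id: MS:xxxxxx3
name: taxonomy: scientific name
is_a: MS:1001089 ! protein taxonomy
"""


def parse_terms(text):
    """Parse [Term] stanzas into dicts; is_a targets are collected in a list."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {"is_a": []}
            terms.append(current)
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            if key == "is_a":
                # Drop the trailing "! human-readable name" comment.
                current["is_a"].append(value.split(" ! ")[0].strip())
            else:
                current[key] = value
    return terms


def children_of(terms, parent_id):
    """Return ids of terms whose is_a list names parent_id directly."""
    return [t["id"] for t in terms if parent_id in t["is_a"]]


if __name__ == "__main__":
    print(children_of(parse_terms(OBO_TEXT), "MS:1001089"))
```

With this layout, a consumer can discover all taxonomy flavours by walking the children of one parent term instead of hard-coding each accession.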

Original comment by pierreal...@gmail.com on 13 May 2009 at 8:16

GoogleCodeExporter commented 8 years ago
MS:1001090 is obsolete now. The terms suggested by David are added to the CV.
This issue can be closed.

Original comment by a.bertsc...@googlemail.com on 22 May 2009 at 8:01

GoogleCodeExporter commented 8 years ago

Original comment by eisena...@googlemail.com on 28 May 2009 at 3:46