legumeinfo / glycinemine

An InterMine for Glycine species
GNU Lesser General Public License v3.0

Some genes lack GO annotation when they should have it #9

Closed sammyjava closed 7 years ago

sammyjava commented 7 years ago

Example: this gene lacks the GO annotation corresponding to its family: https://mines.legumeinfo.org/soymine/report.do?id=6097623

While this gene in the same family has GO annotations: https://mines.legumeinfo.org/soymine/report.do?id=6089506

The difference: the first gene has only the following as its gene description from chado:

PATATIN-like protein 9

while the second has the GO terms embedded in its description:

PATATIN-like protein 9 IPR016035 (Acyl transferase/acyl hydrolase/lysophospholipase) GO:0006629 (lipid metabolic process), GO:0008152 (metabolic process) ***- AT3G63200.1

Clearly using the featureprop gene descriptions from chado is an unreliable way to associate GO terms with genes and should be replaced by a new method. Suggestions, @adf-ncgr ?
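
For illustration only, here is a minimal Python sketch of what pulling GO terms out of a description string might look like, run on the two descriptions quoted above. It is not the mine's actual loader code; the function name `go_terms_in_description` and the regular expression are assumptions made for this example.

```python
import re

# Assumption: GO accessions appear as "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")


def go_terms_in_description(description):
    """Return the GO accessions found in a description, in order, without duplicates."""
    found = []
    for go_id in GO_ID.findall(description):
        if go_id not in found:
            found.append(go_id)
    return found


# The two gene descriptions quoted in this issue.
bare = "PATATIN-like protein 9"
annotated = (
    "PATATIN-like protein 9 IPR016035 (Acyl transferase/acyl hydrolase/"
    "lysophospholipase) GO:0006629 (lipid metabolic process), "
    "GO:0008152 (metabolic process) ***- AT3G63200.1"
)

print(go_terms_in_description(bare))       # []
print(go_terms_in_description(annotated))  # ['GO:0006629', 'GO:0008152']
```

A gene whose description carries no GO accessions simply yields an empty list, which is the behaviour seen on the first gene's report page.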

adf-ncgr commented 7 years ago

My guess is that although assigned to the family, the gene in fact lacks the requisite domains to be recognized as matching the IPR accession that is the actual source of GO information about the gene. If you look at the gene lacking this characterization in the tree: https://legumeinfo.org/chado_phylotree/phytozome_10_2.59164286?hilite_node=glyma.Glyma.01G022200.1 you will see that it and a friend are in the "sticks out like a sore thumb" category, which is generally a clue that the homology that caused it to be assigned to the family is not as strong as other family members (I have not inspected the MSA, but wouldn't be surprised if these had some large deletions or were fragmentary in some way).

Note that the question of how stringent gene family assignment should be has been asked in the past, and the answer is likely to be a bit different in new rounds of gene "familification".

So, I don't think this is an error; descriptors based on BLAST are suggestive but not as careful as InterProScan assignments. If you look at the domains on the two pages you will see some differences, e.g. PF01734, which may drive the IPR accession assignment.

sammyjava commented 7 years ago

Ah, gotcha, OK then. I thought the gene family GO terms were copied over to the gene descriptions with reckless abandon, rather than a finer analysis. Thanks for the explanation.