stephenturner / annotables

R data package for annotating/converting Gene IDs
http://www.gettinggeneticsdone.blogspot.com/2015/11/annotables-convert-gene-ids.html
161 stars, 34 forks

Strip technical info from the description field of gene annotations #20

Closed: mdozmorov closed this 2 years ago

mdozmorov commented 2 years ago

Thanks, Stephen, for this package; I've been using it for ages. This PR strips technical info (e.g., "[Source:HGNC Symbol;Acc:HGNC:233]") from the description field of the gene annotation objects, as it is not very informative. Please close if not necessary.

The results are the same; only the description field is cleaner:

          ensgene entrez symbol chr     start       end strand        biotype   description
1 ENSG00000000003   7105 TSPAN6   X 100627108 100639991     -1 protein_coding tetraspanin 6
2 ENSG00000000005  64102   TNMD   X 100584936 100599885      1 protein_coding   tenomodulin
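
For readers who want to see what this kind of cleanup might look like, here is a minimal base R sketch. It is not the code from this PR: the helper name `strip_source_tag`, the regular expression, and the two input strings are all assumptions made for illustration. It only assumes the tag always appears at the end of the description in the form "[Source:...]".

```r
# Hypothetical helper (not taken from the PR): remove a trailing
# "[Source:...]" tag, plus any whitespace before it, from each string.
strip_source_tag <- function(description) {
  sub("[[:space:]]*\\[Source:[^]]*\\]$", "", description)
}

# Illustrative rows only; the description strings are assumed examples
# written in the style quoted in the comment above.
genes <- data.frame(
  symbol = c("TSPAN6", "TNMD"),
  description = c(
    "tetraspanin 6 [Source:HGNC Symbol;Acc:HGNC:11858]",
    "tenomodulin [Source:HGNC Symbol;Acc:HGNC:17757]"
  ),
  stringsAsFactors = FALSE
)

# Clean the column in place; strings without a tag are left unchanged.
genes$description <- strip_source_tag(genes$description)
print(genes)
```

Anchoring the pattern with `$` keeps any square brackets that appear earlier in a description intact, at the cost of missing tags that are followed by extra text.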