pkerpedjiev / gene-citation-counts

GNU General Public License v3.0

Were ncRNA genes taken into account? #4

Closed AntonPetrov closed 6 years ago

AntonPetrov commented 6 years ago

Hi Peter! Great work on the Nature paper. Just curious if the analysis included ncRNA genes? Searching in the article for RNA found nothing, which was a bit surprising. Thank you!

pkerpedjiev commented 6 years ago

Thanks Anton! Yes, they were indeed taken into account.

If you're interested, you can find the full list here: https://www.dropbox.com/s/n72zdeunla667z8/gene_info_total_human.tsv?dl=0

The type of gene is in the 6th column of that file:

pete@twok:~/Dropbox/tmp $ grep -i rna gene_info_total_human.tsv | head
9606    406991  -   MIR21   microRNA 21 ncRNA   870
9606    60489   -   APOBEC3G    apolipoprotein B mRNA editing enzyme catalytic subunit 3G   protein-coding  566
9606    406938  -   MIR146A microRNA 146a   ncRNA   481
9606    5430    -   POLR2A  RNA polymerase II subunit A protein-coding  440
9606    406947  -   MIR155  microRNA 155    ncRNA   439
9606    1994    -   ELAVL1  ELAV like RNA binding protein 1 protein-coding  389
9606    407040  -   MIR34A  microRNA 34a    ncRNA   375
9606    2521    -   FUS FUS RNA binding protein protein-coding  339
9606    406937  -   MIR145  microRNA 145    ncRNA   323
9606    2130    -   EWSR1   EWS RNA binding protein 1   protein-coding  258
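
Note that `grep -i rna` matches "RNA" anywhere in the line, which is why protein-coding genes such as POLR2A show up in the output above. A minimal sketch of a stricter filter, assuming the file is tab-separated with the gene type in the 6th column as described; `genes_of_type` is a hypothetical helper name, not something from the repository:

```shell
# Hypothetical helper: print only the rows whose 6th column (gene type)
# is exactly the requested type. Assumes a tab-separated file laid out
# like the grep output above.
genes_of_type() {
    # $1 = gene type (e.g. ncRNA), $2 = path to the TSV file
    awk -F'\t' -v type="$1" '$6 == type' "$2"
}

# Example use (not run here):
#   genes_of_type ncRNA gene_info_total_human.tsv | head
```

Matching on the column itself avoids hits from gene descriptions that merely mention RNA.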

AntonPetrov commented 6 years ago

Thank you Peter! I grepped the file for ncRNA and got the top list. It's interesting that 9 out of the top 10 are miRNAs.
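
For anyone wanting to repeat that tally, here is a rough sketch. It assumes the citation count is in the 7th column (as the output above suggests) and treats any ncRNA row whose description starts with "microRNA" as a miRNA, which is only a heuristic. `count_top_mirnas` is a hypothetical helper name:

```shell
# Hypothetical helper: among the top N ncRNA rows ranked by the count in
# column 7, report how many have a description (column 5) that starts
# with "microRNA". Assumes a tab-separated file.
count_top_mirnas() {
    # $1 = path to the TSV file, $2 = N (defaults to 10)
    awk -F'\t' '$6 == "ncRNA"' "$1" |
        sort -t "$(printf '\t')" -k7,7nr |
        head -n "${2:-10}" |
        awk -F'\t' '$5 ~ /^microRNA / { n++ } END { print n + 0 }'
}

# Example use (not run here):
#   count_top_mirnas gene_info_total_human.tsv 10
```

The explicit sort is there so the result does not depend on the file already being ordered by count.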

pkerpedjiev commented 6 years ago

Yeah, I suppose that's not too surprising. The only other ones I'd expect to be more popular would be rRNAs, but I've always found them hard to find on the genome. Maybe it's because there are multiple copies.