Open joaoe opened 8 years ago
The trouble with BCR and TCR genes is that they don't actually code for anything before recombination by Rag. The types which seem possibly more interesting for effect prediction are (1) non-stop decay & NMD genes, (2) polymorphic pseudogenes. However, they're both trickier to handle since you're probably predicting an effect which won't manifest in much or any actual protein product.
> which won't manifest in much or any actual protein product.
The same could be said for regular coding transcripts that are not expressed :) Perhaps if something is added by https://github.com/hammerlab/varcode/issues/195 then this issue can be ignored.
Currently, after the biotype cleanup, only the biotype "protein_coding" is used in the check in `Transcript.is_protein_coding()`. Looking at this list http://www.ensembl.org/Help/Glossary?id=275 confuses me a bit, since nontranslating_CDS or polymorphic_pseudogene are included.
Perhaps the list in `Transcript.is_protein_coding()` should be extended to include IG_gene, TR_gene, non_stop_decay, nonsense_mediated_decay and protein_coding?
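
To make the proposal concrete, here is a minimal sketch of what an extended check might look like. It is a standalone function, not the library's actual implementation: the constant name `PROPOSED_CODING_BIOTYPES` and the `coding_biotypes` parameter are hypothetical, and the biotype strings are copied from the list above. A given annotation release may use different or more specific labels, so the exact strings would need to be checked against the data.

```python
# Hypothetical set of biotypes to treat as protein coding.
# The names come from the proposal in this issue, not from any library.
PROPOSED_CODING_BIOTYPES = frozenset({
    "protein_coding",
    "IG_gene",
    "TR_gene",
    "non_stop_decay",
    "nonsense_mediated_decay",
})


def is_protein_coding(biotype, coding_biotypes=PROPOSED_CODING_BIOTYPES):
    """Return True if the biotype string is in the accepted set.

    Standalone stand-in for a method like Transcript.is_protein_coding();
    the real method's signature and internals are not assumed here.
    """
    return biotype in coding_biotypes


if __name__ == "__main__":
    for name in ("protein_coding", "nonsense_mediated_decay", "lincRNA"):
        print(name, is_protein_coding(name))
```

Keeping the accepted biotypes in one set would also make it easy to pass in a stricter set (e.g. only `protein_coding`) for callers who don't want effects predicted on decay transcripts.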