lawremi / ggbio

Grid and ggplot2 based visualization for biological data

UTRs are overplotted by exons #148

Open pfh opened 3 years ago

pfh commented 3 years ago

When autoplotting from an EnsDb or TxDb, UTR, CDS and exon features are all drawn as rectangles. This is redundant for protein-coding genes, since the UTR and CDS features together cover all exons. Exons are drawn at the same height as CDS features, obscuring the thinner UTR features.

The problem can be seen in various plots in the ggbio vignette, such as the UTRs of BRCA1 not being shown thinner than the CDS in section 2.2.2.

Setting alpha<1 shows what is going on.

library(ggbio)
library(GenomicRanges)
library(EnsDb.Hsapiens.v75)
ensdb <- EnsDb.Hsapiens.v75
gr <- GRanges(seqnames=16, IRanges(30768000, 30770000), strand="+")

autoplot(ensdb, gr, names.expr="gene_name", alpha=0.3)

[Image: ggbio-transparent]

I am using ggbio version 1.41.0.

lawremi commented 3 years ago

Thanks for the report. @sanchit-saini might be able to give some insight into this.

sanchit-saini commented 3 years ago

Yes, the exons obscure the plot in the case of protein-coding genes. However, drawing them might be useful for non-coding genes. Therefore, plotting protein-coding genes differently would be ideal.

I'm not sure how the plot should look (without exons, or maybe with exons at a different alpha value?). Is there a unique identifier for protein-coding genes? If not, then maybe a parameter could be added to the autoplot function definition to distinguish between non-coding and protein-coding genes.

lawremi commented 3 years ago

Without exons. We know which are coding because the TxDb provides CDS regions for them. If there is no CDS, we can treat a transcript as non-coding.
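
The rule above can be sketched in base R on a toy feature table. This is only an illustration of the filtering logic, not ggbio's actual implementation: the `features` data frame and its column names (`tx_id`, `type`, `start`, `end`) are hypothetical, and in practice the features would come from the TxDb or EnsDb rather than being typed in by hand.

```r
# Toy feature table. The column names and values are made up for
# illustration and do not mirror ggbio's internal representation.
features <- data.frame(
  tx_id = c("tx1", "tx1", "tx1", "tx1", "tx2", "tx2"),
  type  = c("exon", "utr", "cds", "utr", "exon", "exon"),
  start = c(100, 100, 150, 301, 500, 700),
  end   = c(400, 149, 300, 400, 600, 800),
  stringsAsFactors = FALSE
)

# A transcript is treated as coding if it has at least one CDS feature.
coding_tx <- unique(features$tx_id[features$type == "cds"])

# Exon rectangles are redundant only for coding transcripts, where the
# UTR and CDS features already cover them; keep them for non-coding ones.
redundant <- features$type == "exon" & features$tx_id %in% coding_tx
to_draw <- features[!redundant, ]

print(to_draw)
```

Here `tx1` has a CDS, so its exon row is dropped and only its UTR and CDS rows remain, while `tx2` has no CDS and keeps both exons.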