griffithlab / GenVisR

Genome data visualizations
Creative Commons Zero v1.0 Universal

How to cut down biomaRt query time #346

Open gadepallivs opened 6 years ago

gadepallivs commented 6 years ago

I have a simple R Shiny interface that lets a user select from the top 10 genes with mutations in their data list and draw a lollipop plot. However, for every gene the user selects, the lollipop function runs through the following steps. Is there a way to avoid querying for every gene and speed up plot loading?

Querying biomaRt for transcript sequence
Querying biomaRt for protein domains
Constructing gene track
Detected p. notation for amino_acid_change
applying force field to observed mutations for top track. 
This will take time if n is large, see vignette for tips

zlskidmore commented 6 years ago

With some minor modifications to the existing code base, it should be possible to let a user supply a mart object to query against; this would only save 4-5 seconds per gene/transcript, though. By far the most CPU-intensive step is the force field model, which attracts and repels the points.

I've actually completely rewritten this function to get rid of the force field model. It is much more efficient, but the look has changed as well. Unfortunately I've not had a chance to fully document it yet, but if you want to give it a try you could run the following (note the uppercase L in Lolliplot()):

# load libraries and annotation data
library(GenVisR)
library(ggplot2)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(BSgenome.Hsapiens.UCSC.hg38)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
BSgenome <- BSgenome.Hsapiens.UCSC.hg38

# use the PIK3CA internal GenVisR dataset
keep <- c("Chromosome", "Start_Position", "End_Position", "Reference_Allele",
          "Tumor_Seq_Allele2", "Tumor_Sample_Barcode", "Gene", "Variant_Classification")
dfObject.mode1 <- PIK3CA[,keep]
colnames(dfObject.mode1) <- c("chromosome", "start", "stop", "reference", "variant",
                        "sample", "gene", "consequence")

# run Lolliplot
Lolliplot.out <- suppressWarnings(Lolliplot(dfObject.mode1, transcript="ENST00000263967",
                                                species="hsapiens", host="www.ensembl.org",
                                                txdb=txdb, BSgenome=BSgenome, emphasize=NULL,
                                                DomainPalette=NULL, MutationPalette=NULL,
                                                labelAA=TRUE, plotALayers=NULL, plotBLayers=NULL,
                                                sectionHeights=NULL, verbose=FALSE))
# draw Lolliplot
drawPlot(Lolliplot.out)
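
For the original question of avoiding repeated work when a user re-selects a gene, one option is to cache the finished plot object per transcript so the slow steps run only once per gene. The following is a minimal base R sketch, not part of GenVisR: `make_cached()` and `slow_build()` are hypothetical names, and `slow_build()` is only a stand-in for whatever call builds the plot (for example the `Lolliplot()` call above).

```r
# Hypothetical helper (not a GenVisR function): wrap an expensive function so
# that each distinct key is computed once and then served from memory.
make_cached <- function(build_fn) {
  cache <- new.env(parent = emptyenv())
  function(key, ...) {
    # key must be a single non-empty string, e.g. a transcript ID;
    # extra arguments in ... are passed through but are not part of the key
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, build_fn(key, ...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Stand-in for the slow step; it counts how often it actually runs.
n_calls <- 0
slow_build <- function(transcript) {
  n_calls <<- n_calls + 1
  paste("plot object for", transcript)
}

cached_build <- make_cached(slow_build)
first  <- cached_build("ENST00000263967")  # runs slow_build
second <- cached_build("ENST00000263967")  # served from the cache
```

In a Shiny app, the cached function would typically be created once, outside the server function, so that the cache survives across inputs. The memoise package offers a more complete implementation of the same idea.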

karenlawwc commented 2 years ago

@gadepallivs Hi, I have a similar problem building an R Shiny app that lets a user select genes with mutations in their data list and draw a lollipop plot. May I ask how you did it, or could you share some of the code with me? I keep encountering the error "Detecting more than 1 transcript input in x", while the same command works fine in RStudio. Any help is appreciated, and thank you!