griffithlab / GenVisR

Genome data visualizations
Creative Commons Zero v1.0 Universal

Ordering Mutational Data by mutburden from high to low within each clinical data (subplot) cohort #379

Open shmashitup opened 2 years ago

shmashitup commented 2 years ago

Is there a way to order all mutation data by tumor mutation burden from high to low? I have divided my data into four cohorts which segregate my mutational data in the clinical data subplot. I would like to order my mutational data within each of these cohorts by tumor mutational burden from high to low. I am not sure how to do this or if it is possible with this package.

zlskidmore commented 2 years ago

Hi @shmashitup

Which function are you using? You could probably re-factor the input data.frame to get what you want.

shmashitup commented 2 years ago

I can share my code here:

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

BiocManager::install("GenVisR")
install.packages("reshape2")

# mutational data

library(GenVisR)
library(reshape2)

mutationData <- read.delim("EC_Waterfall Plot_Mutation Data.txt")
mutationData
mutationData <- mutationData[,c("patient", "gene.name", "trv.type", "amino.acid.change")]
colnames(mutationData) <- c("sample", "gene", "variant_class", "amino.acid.change")
mutation_priority <- as.character(unique(mutationData$variant_class))
mutationColours <- c("nonsense"='#4f00A8', "frame_shift_del"='#A80100', "frame_shift_ins"='#CF5A59',
                     "in_frame_del"='#ff9b34', "duplication"='#750054', "delins"='#A80079',
                     "missense"='#009933', "splice_region"='#ca66ae', "deletion"='#888811')

# Create an initial plot

mutationHeirarchy <- c("missense", "nonsense", "frame_shift_ins", "frame_shift_del", "delins", "deletion", "duplication", "splice_region")
waterfall(mutationData, fileType = "Custom", variant_class_order=mutationHeirarchy, mainPalette=mutationColours)

# tumor mutation burden

mutationBurden <- read.delim("EC_mutationburden.txt")

# First, let's look at the sample names in the mutationData and mutationBurden

mutationData$sample
mutationBurden$sample

# Create the waterfall plot

waterfall(mutationData, fileType = "Custom", variant_class_order=mutationHeirarchy, mainPalette=mutationColours, mutBurden=mutationBurden)

# reformat clinical data to long format

clinicalData <- read.delim("EC_Clinical Data.txt")
clinicalData_2 <- clinicalData[,c(1,2,3,4,5)]
colnames(clinicalData_2) <- c("sample", "Cohort", "MSI Comprehensive", "Sex", "Age")
clinicalData_2 <- melt(data=clinicalData_2, id.vars=c("sample"))
new_samp_order <- as.character(unique(clinicalData_2[order(clinicalData_2$variable, clinicalData_2$value), ]$sample))

# create the waterfall plot

waterfall(mutationData, fileType = "Custom",
          variant_class_order=c("missense", "nonsense", "frame_shift_ins", "frame_shift_del", "delins", "deletion", "duplication", "splice_region"),
          mainPalette=mutationColours,
          mutBurden=mutationBurden,
          clinData=clinicalData_2,
          clinLegCol=4,
          clinVarCol=c('POLE Drivers and Secondary Variant'='#ccbadc', 'POLE Drivers Only'='#9975b9', 'POLE Variants Only'='#7a5d94', 'POLE Potential New Drivers'='#5E5161',
                       '0'='#c2ed67', '1'='#e63a27',
                       'Male'='#90ddee', 'Female'='#649aa6',
                       '21-30'='#E5E8FF', '31-40'='#878cfb', '41-50'='#0022ff', '51-60'='#2d41b9', '61-70'='#3d4780', '71-80'='#3a4061', '81-90'='#000000'),
          clinVarOrder=c('POLE Drivers and Secondary Variant', 'POLE Drivers Only', 'POLE Potential New Drivers', 'POLE Variants Only',
                         '0', '1', 'Male', 'Female',
                         '21-30', '31-40', '41-50', '51-60', '61-70', '71-80', '81-90'),
          section_heights=c(1, 3, 1),
          sampOrder = new_samp_order)
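The code above prints both sample columns so they can be compared by eye. A quicker check might use setdiff() to list the names that appear in one table but not the other. This is only a sketch in base R: the sample IDs and the two vector names below are made up for illustration and stand in for mutationData$sample and mutationBurden$sample.

```r
# Hypothetical sample IDs standing in for mutationData$sample and
# mutationBurden$sample; the real files are not available here.
mutation_samples <- c("s1", "s2", "s3", "s3")
burden_samples   <- c("s1", "s2", "s4")

# Samples with mutations but no burden value.
missing_burden <- setdiff(unique(mutation_samples), burden_samples)

# Samples with a burden value but no mutations.
extra_burden <- setdiff(burden_samples, unique(mutation_samples))

print(missing_burden)
print(extra_burden)
```

If both results are empty, the two tables refer to the same set of samples.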

shmashitup commented 2 years ago

I don't know much about R (I'm a beginner). When you say refactor the input data.frame, do you mean the one for the mutation burden? I thought waterfall() automatically assigned the tumor mutation burden based on the order of the mutational data. How can I manually correct the order? @zlskidmore

zlskidmore commented 2 years ago

If you're using a waterfall plot, there should be a parameter called sampOrder where you can give it your samples, e.g. c("samp1", "samp2").
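For the within-cohort ordering asked about in this thread, one way to build that vector is to sort by cohort first and by mutation burden second. Below is a minimal base-R sketch on toy data. The sample IDs, the burden values, and the `mut_burden` column name are assumptions for illustration and are not taken from the original files. The cohort labels are borrowed from the code posted earlier.

```r
# Toy stand-ins for the clinical and mutation burden tables (assumed layout).
clinical <- data.frame(
  sample = c("s1", "s2", "s3", "s4", "s5"),
  Cohort = c("POLE Drivers Only", "POLE Variants Only", "POLE Drivers Only",
             "POLE Variants Only", "POLE Drivers Only"),
  stringsAsFactors = FALSE
)
burden <- data.frame(
  sample     = c("s1", "s2", "s3", "s4", "s5"),
  mut_burden = c(12.5, 3.1, 40.2, 7.8, 1.4),  # assumed column name
  stringsAsFactors = FALSE
)

# Fix the cohort order explicitly so it does not fall back to alphabetical.
cohort_levels <- c("POLE Drivers Only", "POLE Variants Only")

merged <- merge(clinical, burden, by = "sample")
merged$Cohort <- factor(merged$Cohort, levels = cohort_levels)

# Sort by cohort, then by burden from high to low within each cohort.
merged <- merged[order(merged$Cohort, -merged$mut_burden), ]
new_samp_order <- as.character(merged$sample)

print(new_samp_order)
```

The resulting character vector could then be passed to waterfall() as sampOrder = new_samp_order. Sorting on the wide clinical table, before any melt(), keeps one row per sample and so avoids the need for unique().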

shmashitup commented 2 years ago

I see! Thank you! This makes sense. I can do that!
