ncborcherding / scRepertoire

A toolkit for single-cell immune profiling
https://www.borch.dev/uploads/screpertoire/
MIT License
306 stars 52 forks source link

circlize chordDiagram for compareClonotypes? and control color in Alluvial plot? #39

Closed gt7901b closed 3 years ago

gt7901b commented 3 years ago

Hi, Nick:

I wonder if there's a way to instead of Alluvial plot in compareClonotypes, but use circlize chordDiagram? For example, I have 4 different tissues and clones may overlap. Alluvial can only do tissue 1 through 4 sequentially. Maybe circlize can show overlap from 1 to 4 to 1?

Also, is there a way to set color in Alluvial plot from compareClonotypes? I have a top clone(by proportion) in patient 1 tumor tissue, when I use compareClonotypes to compare 4 different tissues. Then, I compared 4 different T cell clusters, the top clone showed a different color. Is there a way to set the color for top clones? How the order of the clones in Alluvial plot was set? top clone is not always on the top.

thanks for your time

ncborcherding commented 3 years ago

Hey gt7901b,

Have you checked out the new function getCircilize()? This allows for the comparison of unique clonotypes between clusters. As long as the tissues are in you meta data, you should be able to call using getCircilize(seurat, groupBy = "tissue")

Screen Shot 2020-10-27 at 3 31 27 PM

Also, is there a way to set color in Alluvial plot from compareClonotypes? Yes, you can add an additional line after compareClonotype() to format whatever you'd like. This is the case for whatever you'd like to do with ggplot parameters, like change the theme. So if I wanted to change to specific colors I would:

compareClonotype() +
scale_color_manual(values = c(vector of colors))

How the order of the clones in Alluvial plot was set? top clone is not always on the top.

When I designed the function the package ggalluvial was automatically ordering by number/proportion. It looks like that is no longer the case, here is the current call in the vignette. I will do some investigating and get that fixed.

download

Let me know if you have any other questions.

Thanks, Nick

gt7901b commented 3 years ago

Hi, Nick:

Does it have to be a seurat object in getCircilize()? Is it possible to use combined from the combineTCR function with samples parameter as tissues ?

gt7901b commented 3 years ago

I tried to control color, but this does not work compareClonotypes(combined, numbers = 5, cloneCall="aa", graph = "alluvial")+ scale_color_manual(values = c("#FF4B20", "#FFB433", "#C6FDEC", "#7AC5FF", "#0348A6"))

ncborcherding commented 3 years ago

Does it have to be a seurat object in getCircilize()?

Not as of yet in the function itself, but below is a small modification on the parameters and first line so you can perform the analysis with the combined object. You will have to track down a couple of internal functions, like theCall() and save them as well to your global environment. But you can just search the repository for those and copy-paste

getCirclize.mod <- function(combined, cloneCall = "gene+nt", 
                        groupBy = NULL, proportion = FALSE) {
     meta <- bind_rows(combined)
    cloneCall <- theCall(cloneCall)
    test <- meta[, c(cloneCall, groupBy)]
    dTest <- dcast(test, test[,cloneCall] ~ test[,groupBy])
    dTest <- dTest[apply(dTest[,-1], 1, function(x) !all(x==0)),]
    dTest <- dTest[-1]
    total <- nrow(dTest)
    matrix_out <- matrix(ncol = ncol(dTest), nrow = ncol(dTest))
    for (x in seq_len(ncol(dTest))) {
        for (y in seq_len(ncol(dTest)) ){
            matrix_out[y,x] <- length(which(dTest[,x] >= 1 & dTest[,y] >= 1))
        }
    }
    colnames(matrix_out) <- colnames(dTest)
    rownames(matrix_out) <- colnames(dTest)

    #Need to subtract extra cells - will take the difference of the sum of the 
    #column minus and the respective cell and subtract that from the respective cell
    for (y in seq_len(ncol(matrix_out))) {
        matrix_out[y,y] <- matrix_out[y,y] - (sum(matrix_out[,y])-matrix_out[y,y])
        if (matrix_out[y,y] < 0) {
            matrix_out[y,y] <- 0
        }
    }
    output <- data.frame(from = rep(rownames(matrix_out), 
                        times = ncol(matrix_out)),
                        to = rep(colnames(matrix_out), each = nrow(matrix_out)),
                        value = as.vector(matrix_out),
                        stringsAsFactors = FALSE)
    # Reorder columns to eliminate redundant comparisons
    for (k in seq_len(nrow(output))) {
        max <- order(output[k,1:2])[1] #which is first alphabetically
        max <- output[k,max]
        min <- order(output[k,1:2])[2] #which is second alphabetically
        min <- output[k,min]
        output[k,1] <- max
        output[k,2] <- min
    }
    unique <- rownames(unique(output[,1:2])) #removing redundant comparisons
    output <- output[rownames(output) %in% unique, ]
    if (proportion == TRUE) {
        output$value <- output$value/total
    } 
    return(output)
}

I tried to control color, but this does not work

I think I mistyped, have you tried scale_fill_manual() instead of scale_color_manual()?

compareClonotypes(combined, numbers = 5, cloneCall="aa", graph = "alluvial")+
scale_fill_manual(values = c("#FF4B20", "#FFB433", "#C6FDEC", "#7AC5FF", "#0348A6"))
gt7901b commented 3 years ago

Thanks, Nick. The color control worked. As for circlize, I run circles<-getCirclize.mod(combined, groupBy='sample'), have the following out put

Using sample as value column: use value.var to override. Aggregation function missing: defaulting to length A data.frame: 10 × 3 from to value

1 Normal Normal 1399 2 Normal Tumor 399 3 Normal LymphNode 115 4 Normal PBMC 150 6 Tumor Tumor 457 7 Tumor LymphNode 92 8 Tumor PBMC 100 11 LymphNode LymphNode 8 12 LymphNode PBMC 44 16 PBMC PBMC 0 when I tried grid.cols <- hue_pal()(length(unique(circles))) names(grid.cols) <- levels(circles) chordDiagram(circles, self.link = 1, grid.col = grid.cols) I got the following errors: Error: Since you set ``grid.col``, the length should be either 1 or number of sectors, or set your ``grid.col`` as vector with names.
ncborcherding commented 3 years ago

Hey gt7901b,

I got the following errors: Error: Since you set grid.col, the length should be either 1 or number of sectors, or set your grid.col as vector with names.

So the hue_pal() call in the vignette is to match the default colors with the clusters. In this instance, you do not need to define grid.col, you can just call, chordDiagram(circles, self.link = 1) But if you want the default color palette from ggplots, you can define your grid colors with:

grid.cols <- hue_pal()(4) #number of tissues
names(grid.cols) <- unique(circles[,1]) #assign the color to the specific tissue

Hope that helps, Nick

gt7901b commented 3 years ago

great, thanks, Nick. It works. But this is showing the total numbers of shared clones among different tissues, right? Not the top 10 clones flows in different tissues/samples as in compareClonotypes. Do you think it is possible to make the x axis in alluvial plot a circle and connect the first and last bar ?

ncborcherding commented 3 years ago

So the Cord Diagram is showing the number of unique clonotypes shared/unique by you group.by variable. So not total number of shared clones or the top 10 clones.

To "circlize" an alluvial diagram using the ggalluvial package is not possible. You could use the circlize R package for individual clones like the alluvial plot, but that would take a lot of coding/refining.

gt7901b commented 3 years ago

thanks for your advice

ncborcherding commented 3 years ago

Just one last follow up on this thread for anyone interested, compareClonotype() in v1.0.0 has been fixed on github and bioconductor to ensure the stratum and alluvial portions match. From now on, the order and color for the compareClonotype() will be based alphabetically. Here is an example of the new results for the vignette data:

Rplot

gt7901b commented 3 years ago

Hi, Nick:

For the above compareClonotypes() function plot, numbers=10. It supposed to show the top10 clones, why only 9 listed?

thanks for your time

ncborcherding commented 3 years ago

This is a product of the dplyr fucntion top_n() if there is a tie in terms of total proportion of reads, it will drop the 10th value to prevent having extracting 11, 12, etc values. So it will not call the top 10 clonotypes if there are not 10 true top clonotypes.

gt7901b commented 3 years ago

thanks for the reply. So if there's very few clones shown then it means lots of them have the same proportion, only the first one of that proportion got selected? I could use unique but then I need to sort unique values in descending order to get additional ones.

ncborcherding commented 3 years ago

If you want to visualize more clonotypes, just set your numbers = 11 or 12, etc you might not be able to get 10 clonotypes exactly, but it will output the top 10 proportions.

gt7901b commented 3 years ago

Hi, Nick:

A related question on how the proportion in calculated on the compareClonotype() plot. in the code, tbl[,2]/sum(tbl[,2]), is it calculated by each sample, i.e., each column in the alluvial plot? or by all samples?

for (i in seq_along(df)) { tbl <- as.data.frame(table(df[[i]][,cloneCall])) tbl[,2] <- tbl[,2]/sum(tbl[,2]) colnames(tbl) <- c("Clonotypes", "Proportion") tbl$Sample <- names(df[i]) Con.df <- rbind.data.frame(Con.df, tbl) }

ncborcherding commented 3 years ago

Proportion is going to be scaled by how the combined object is set up - if it is the product of combineTCR(), there will be a difference compared to expression2List(). But yes, each column in the alluvial graph is scaled by the total number of cells with recovered clonotypes.

So I think the code is a good example:

for (i in seq_along(df)) {

#this creates a table of clonotypes and number of copies of that clonotype
tbl <- as.data.frame(table(df[[i]][,cloneCall])) 

#we then convert the number of copy numbers to relative proportion based on the total number of cells with clonotypes or "sum(tbl[,2])"
tbl[,2] <- tbl[,2]/sum(tbl[,2]) 

colnames(tbl) <- c("Clonotypes", "Proportion")
tbl$Sample <- names(df[i])
Con.df <- rbind.data.frame(Con.df, tbl)
}

Hope that helps, let me know if you have any other questions, Nick

gt7901b commented 3 years ago

@ncborcherding , hi, Nick: if I have multiple tissues (e.g., tumor, normal, PBMC from patient1) to do Alluvial plot in compareClonotypes(), is there a way to select just the top10 clonotypes in tumor tissue and see how those clones connect in other tissues? thanks for your time

ncborcherding commented 3 years ago

Hey gt7901b,

It depends on how you would like to set up the visulizations, but should be do-able. I would suggest if you want to control the specific top10 clonotypes by tissue, identify the top 10 clonotypes in your tumor samples and then use the clonotypes argument in the compareClonotypes() function. If you want to compare across tissues from a specific sample, just indicate the which sequecning run with the samples argument. The sample in the vignette is actually showing peripheral blood vs tumor sample in a single patient. You could do the sample thing with blod vs normal vs tumor.

Let me know if you run into any issues or have further questions.

Nick

gt7901b commented 3 years ago

thanks, the clonotypes option works great.