Open splaisan opened 4 years ago
I have create a function to do it based on what I found in circlize doc and the vis_circos buitl-in (not very documented!)
I am not able to submit this code as a pull request but if someone could do or add this in the tutorial, you are welcome to take my code (when it is correct of course)
data(immdata)
mydat <- immdata$data$`A2-i129`
circos.clear()
circos.par("canvas.xlim" = c(-1.25, 1.25), "canvas.ylim" = c(-1.25, 1.25))
vis_genus(mydat, title="A2-i129_sample")
vis_genus <- function(x, title="", ...) {
# require("stringr", "circlize")
V <- sort(unique(unlist(strsplit(paste(x$V.name, collapse=","), ","))))
J <- sort(unique(unlist(strsplit(paste(x$J.name, collapse=","), ","))))
# create and fill matrix of pairwise with sum(Clones)
mat <- matrix(nrow = length(J), ncol = length(V))
colnames(mat) <- V
rownames(mat) <- J
for (j in seq(1:length(J))) {
for (v in seq(1:length(V))) {
sub <- x[(str_detect(x$V.name, V[[v]], negate = FALSE) &
str_detect(x$J.name, J[[j]], negate = FALSE)),]
mat[j,v] <- sum(sub$Clones)
}
}
# convert to proportions / fraction of total
mat <- mat/sum(mat)
# reorder by V and J decreasing order (left to right)
vmax <- colSums(mat)
jmax <- rowSums(mat)
mat2 <- mat[order(jmax, decreasing = FALSE), order(vmax, decreasing = TRUE)]
# use circlize
chordDiagram(mat2, annotationTrack = "grid", ...)
# add legends
circos.track(track.index = 1, panel.fun = function(x, y) {
circos.text(CELL_META$xcenter,
CELL_META$ylim[1],
CELL_META$sector.index,
facing = "clockwise",
niceFacing = TRUE,
adj = c(-0.25, 0.5))
}, bg.border = NA)
# add title
title(title)
}
Hi @splaisan - Hope you're well!
Thanks for making this, its a fantastic way to represent TCR data. I'm currently working out also how to modify this for data with paired alpha-beta sequences. If I'm successful Ill make sure to share.
I'm a Bioinformatics/R newbie and I was hoping you might further explain your function to me. My interpretation is that to appropriately represent the data, at some point you would need to multiply each V-J pairing (or row of data) by the "Clones" column in the sample data. I note that there is a calculation involving sub$Clones, but being unfamiliar with R I'm not 100% sure if this is doing what I think.
The reason I ask is that when I apply this to my own data, it seems to over represent rare clones and under represent common clones and I haven't been able to work out why. Looking at the graph produced on the A2-i129_sample - You would expect a striking feature to be the TRBV4-1 TRBJ2-2 and TRBV4-1 TRBJ2-3 pairings, which have the most clones for this sample. Whilst you can see a clear link between these, the circos suggests a higher representation of the TRBV2 and TRBJ2-5 pairing. But, unless im mistaken, it appears only 103 clones have this pairing. Any light you could shed on this would be great.
Thanks!
J
🚀 Feature
a function producing matrices or circos plot for a given sample with the different V-J combinations as edges and number of clones/reads as weight as done in vdjtools. (+ option for filtering low-representation combinations in the process)
Or a function reporting a matrix of V x J with frequencies/counts which would be enough as the circos plotting is apparently already there.
Motivation
provide this missing plot to complete the toolbox when compared to vdjtools