jokergoo / circlize

Circular visualization in R
http://jokergoo.github.io/circlize_book/book/

What is the right way of adding labels to the genomic links? #234

Closed peranti closed 3 years ago

peranti commented 3 years ago

Here is the example of the dataset used:

dataLinks 

#   LinkStartLabel LinkStartChrom LinkStartFrom LinkStartEnd LinkEndLabel LinkEndChrom LinkEndFrom LinkEndEnd
# 1 DIMT1L         chr5                61684351     61699728 FBL          chr19           40325093   40337054
# 2 FBL            chr19               40325093     40337054 PRPF19       chr11           60658019   60674061
# 3 ARSA           chr22               51063449     51066607 NEU2         chr2           233897382  233899767
# 4 PRPF19         chr11               60658019     60674061 SNRPA        chr19           41256779   41271294
# 5 ASF1A          chr6               119215241    119230336 HIST1H1A     chr6            26017260   26018040

The code used to create the plot:

library(circlize)
circos.initializeWithIdeogram(plotType = NULL, chromosome.index=paste0("chr", c(1:22)))
circos.track(ylim = c(0, 1), panel.fun = function(x, y) {
  chr = CELL_META$sector.index
  xlim = CELL_META$xlim
  ylim = CELL_META$ylim
  circos.rect(xlim[1], 0, xlim[2], 1, col = rand_color(1))
  circos.text(mean(xlim), mean(ylim), chr, cex = 0.7, col = "white",
              facing = "inside", niceFacing = TRUE)
}, track.height = 0.15, bg.border = NA)

circos.genomicLink(dataLinks[,c("LinkStartChrom", "LinkStartFrom")], 
                   dataLinks[,c("LinkEndChrom", "LinkEndFrom")], 
                   col = "red")

Rplot

However, when I try to add the labels to the links, it doesn't work. The code used is as below:

circos.genomicLabels(dataLinks[,c("LinkStartChrom", "LinkStartFrom", "LinkStartLabel")], 
                     labels = dataLinks$LinkStartLabel, col = "red")

Also, is there any possibility to choose the colours for different chromosomes instead of random? I prefer to use colour-blind friendly colours.

Thanks a lot for creating and supporting this amazing package.

Edit: I am trying to create a plot similar to this one: Rplot

jokergoo commented 3 years ago

The order should be:

  1. initialize the plot by circos.initializeWithIdeogram(..., plotType = NULL). This means to initialize the layout but without drawing anything.
  2. add the first track by circos.genomicLabels().
  3. add the second track by circos.track() and simply set background colors and add chromosome names.
  4. add the links by circos.genomicLink().

jokergoo commented 3 years ago

So with your code, it is something like:

circos.initializeWithIdeogram(plotType = NULL, chromosome.index=paste0("chr", c(1:22)))
circos.genomicLabels(dataLinks[,c("LinkStartChrom", "LinkStartFrom", "LinkStartLabel")], 
                     labels = dataLinks$LinkStartLabel, col = "red")
circos.track(ylim = c(0, 1), panel.fun = function(x, y) {
  chr = CELL_META$sector.index
  xlim = CELL_META$xlim
  ylim = CELL_META$ylim
  circos.rect(xlim[1], 0, xlim[2], 1, col = rand_color(1))
  circos.text(mean(xlim), mean(ylim), chr, cex = 0.7, col = "white",
              facing = "inside", niceFacing = TRUE)
}, track.height = 0.15, bg.border = NA)
circos.genomicLink(dataLinks[,c("LinkStartChrom", "LinkStartFrom")], 
                   dataLinks[,c("LinkEndChrom", "LinkEndFrom")], 
                   col = "red")

peranti commented 3 years ago

Thanks for the reply.

I have tried the same too but there is no difference in the output whether I include the following line or not.

circos.genomicLabels(dataLinks[,c("LinkStartChrom", "LinkStartFrom", "LinkStartLabel")], 
                     labels = dataLinks$LinkStartLabel, col = "red")

What do you think might be the issue?

jokergoo commented 3 years ago

The first argument of circos.genomicLabels() should be a data frame with chromosome, start position and end position columns. In your code, dataLinks[,c("LinkStartChrom", "LinkStartFrom", "LinkStartLabel")], the third column is the label, not the end position.
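
A minimal sketch of that requirement in base R, using two toy rows copied from the table at the top of the thread. The helper name is_bed_like is hypothetical and only illustrates the expected column layout; it is not part of circlize.

```r
# Toy stand-in for dataLinks (two rows taken from the example table)
dataLinks <- data.frame(
  LinkStartLabel = c("DIMT1L", "FBL"),
  LinkStartChrom = c("chr5", "chr19"),
  LinkStartFrom  = c(61684351, 40325093),
  LinkStartEnd   = c(61699728, 40337054),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when the first three columns look like
# chromosome, start, end (numeric start/end with start <= end)
is_bed_like <- function(df) {
  is.data.frame(df) && ncol(df) >= 3 &&
    is.numeric(df[[2]]) && is.numeric(df[[3]]) &&
    all(df[[2]] <= df[[3]])
}

# Chromosome, start, end: the layout described above
regions <- dataLinks[, c("LinkStartChrom", "LinkStartFrom", "LinkStartEnd")]

# Chromosome, start, label: the layout from the original call
wrong <- dataLinks[, c("LinkStartChrom", "LinkStartFrom", "LinkStartLabel")]

is_bed_like(regions)
is_bed_like(wrong)
```

The labels themselves can then be passed separately through the labels argument, as in the calls shown elsewhere in this thread.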

peranti commented 3 years ago

Thanks for the suggestion. Since I want each link drawn as a line, I am using the same column, LinkStartFrom, for both the start and end positions.

I get an error when I use the following line of code:

circos.genomicLabels(dataLinks[,c("LinkStartChrom", "LinkStartFrom", "LinkStartFrom")], 
                     labels = dataLinks$LinkStartLabel)

Error: The length of col (1021) should be equal to 1 or the number of your regions (47).
In addition: Warning messages:
1: In if (cr != chr) { : the condition has length > 1 and only the first element will be used
2: In if (side == "inside") { : the condition has length > 1 and only the first element will be used

Am I using the parameters incorrectly?

jokergoo commented 3 years ago

Then please send me the data.

peranti commented 3 years ago

Hi,

Please find the sample data attached here.

Kindly let me know if you need more information.

Best Regards, Pradeep

On Tue, Nov 17, 2020 at 9:25 PM Zuguang Gu notifications@github.com wrote:

Then please send me the data.


jokergoo commented 3 years ago

Hi, I cannot see the file. Can you attach it directly to this GitHub issue?

peranti commented 3 years ago

Here it is, sampleInteractions.txt.

Sorry for the trouble. I was under the impression that the attachment would go through by email. Thanks for the follow-up!

jokergoo commented 3 years ago

Yes, I can guess that :)

jokergoo commented 3 years ago

I tried with the newest version and it works fine:

circos.initializeWithIdeogram(plotType = NULL, chromosome.index=paste0("chr", c(1:22)))
circos.genomicLabels(dataLinks[,c("LinkStartChrom", "LinkStartFrom", "LinkStartEnd")], 
                     labels = dataLinks$LinkStartLabel, col = "red", side = "outside")
circos.track(ylim = c(0, 1), panel.fun = function(x, y) {
  chr = CELL_META$sector.index
  xlim = CELL_META$xlim
  ylim = CELL_META$ylim
  circos.rect(xlim[1], 0, xlim[2], 1, col = rand_color(1))
  circos.text(mean(xlim), mean(ylim), chr, cex = 0.7, col = "white",
              facing = "inside", niceFacing = TRUE)
}, track.height = 0.15, bg.border = NA)
circos.genomicLink(dataLinks[,c("LinkStartChrom", "LinkStartFrom")], 
                   dataLinks[,c("LinkEndChrom", "LinkEndFrom")], 
                   col = "red")

image

You can install it from GitHub.

peranti commented 3 years ago

Hello Zuguang, thanks for the fix/suggestions. I could replicate the above plot.

As there are duplicates in the labels (TERF2, VHL), I took only the unique values for labelling using the line below:

dataLinksUnique <- unique(dataLinks[,c("LinkStartChrom", "LinkStartFrom", "LinkStartEnd", "LinkStartLabel")])

It worked fine too!

However, when I applied the same steps from the sample data to the complete files, I ran into the following issue with the labelling:

circos.genomicLabels(dataLinksUnique[,c("LinkStartChrom", "LinkStartFrom", "LinkStartEnd")], 
                     labels = dataLinksUnique$LinkStartLabel, col = "red", side = "outside")

Error in names(sector.data) <- colnames(.SECTOR.DATA)[-1] :
  'names' attribute [6] must be the same length as the vector [0]

I am currently looking into the root cause of this issue.

jokergoo commented 3 years ago

OK, then can you also send me the complete dataset?

peranti commented 3 years ago

I have shared the file with you using the contact email of the package.

May I know why it initially did not work with the live version and worked with the Github version?

ps: I will continue looking into the root cause of the second issue.

Edit: Looks like, the issue exists with the sample dataset too. What versions of the dependent packages should be used?

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] igraph_1.2.6 circlize_0.4.12 forcats_0.5.0 stringr_1.4.0 dplyr_1.0.2 purrr_0.3.4
[7] readr_1.4.0 tidyr_1.1.2 tibble_3.0.4 ggplot2_3.3.2 tidyverse_1.3.0 here_0.1

loaded via a namespace (and not attached): [1] httr_1.4.2 pkgload_1.1.0 jsonlite_1.7.1 modelr_0.1.8 assertthat_0.2.1
[6] cellranger_1.1.0 yaml_2.2.1 remotes_2.2.0 sessioninfo_1.1.1 pillar_1.4.6
[11] backports_1.2.0 glue_1.4.2 digest_0.6.27 rvest_0.3.6 colorspace_2.0-0
[16] htmltools_0.5.0 pkgconfig_2.0.3 devtools_2.3.2 broom_0.7.2 haven_2.3.1
[21] scales_1.1.1 processx_3.4.4 generics_0.1.0 usethis_1.6.3 ellipsis_0.3.1
[26] withr_2.3.0 cli_2.1.0 magrittr_1.5 crayon_1.3.4 readxl_1.3.1
[31] memoise_1.1.0 evaluate_0.14 ps_1.4.0 fs_1.5.0 fansi_0.4.1
[36] xml2_1.3.2 pkgbuild_1.1.0 tools_4.0.3 prettyunits_1.1.1 hms_0.5.3
[41] GlobalOptions_0.1.2 lifecycle_0.2.0 munsell_0.5.0 reprex_0.3.0 callr_3.5.1
[46] compiler_4.0.3 rlang_0.4.8 grid_4.0.3 rstudioapi_0.11 rmarkdown_2.5
[51] testthat_3.0.0 gtable_0.3.0 DBI_1.1.0 curl_4.3 R6_2.5.0
[56] lubridate_1.7.9 knitr_1.30 utf8_1.1.4 rprojroot_1.3-2 shape_1.4.5
[61] desc_1.2.0 stringi_1.5.3 parallel_4.0.3 Rcpp_1.0.5 vctrs_0.3.4
[66] dbplyr_2.0.0 tidyselect_1.1.0 xfun_0.19

jokergoo commented 3 years ago

I ran it with your complete dataset and there is still no error (with the current GitHub version).

I also improved the code that draws the labels. First, you used all the rows, which contain duplicated gene names. Second, the number of unique genes is still too large to visualize. I basically use only the 50 genes that have the most connections to other genes:

circos.initializeWithIdeogram(plotType = NULL, chromosome.index=paste0("chr", c(1:22)))

labels_df = unique(dataLinks[,c("LinkStartChrom", "LinkStartFrom", "LinkStartEnd", "LinkStartLabel")])
labels = names(sort(table(c(dataLinks$LinkStartLabel, dataLinks$LinkEndLabel)), decreasing = TRUE))[1:50]
labels_df = labels_df[labels_df$LinkStartLabel %in% labels, ]

circos.genomicLabels(labels_df, labels.column = 4, col = "red", side = "outside")
circos.track(ylim = c(0, 1), panel.fun = function(x, y) {
  chr = CELL_META$sector.index
  xlim = CELL_META$xlim
  ylim = CELL_META$ylim
  circos.rect(xlim[1], 0, xlim[2], 1, col = rand_color(1))
  circos.text(mean(xlim), mean(ylim), chr, cex = 0.7, col = "white",
              facing = "inside", niceFacing = TRUE)
}, track.height = 0.15, bg.border = NA)
circos.genomicLink(dataLinks[,c("LinkStartChrom", "LinkStartFrom")], 
                   dataLinks[,c("LinkEndChrom", "LinkEndFrom")], 
                   col = "red")

image

peranti commented 3 years ago

Thanks for looking into the issue and updating me.

I have tried to run the command in the same sequence as mentioned. When I executed the following command,

circos.genomicLabels(labels_df, labels.column = 4, col = "red", side = "outside")

I received this error

Error in chr_start[all_chr] : invalid subscript type 'list'
In addition: Warning messages:
1: In .SECTOR.DATA[[1]] == sector.index : longer object length is not a multiple of shorter object length
2: In .SECTOR.DATA[[1]] == sector.index : longer object length is not a multiple of shorter object length

I am currently looking into the root cause of this issue. Meanwhile, is this message familiar to you?

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] circlize_0.4.12 forcats_0.5.0 stringr_1.4.0 dplyr_1.0.2 purrr_0.3.4
[6] readr_1.4.0 tidyr_1.1.2 tibble_3.0.4 ggplot2_3.3.2 tidyverse_1.3.0 [11] here_0.1

loaded via a namespace (and not attached): [1] Rcpp_1.0.5 lubridate_1.7.9 prettyunits_1.1.1 ps_1.4.0
[5] assertthat_0.2.1 rprojroot_1.3-2 digest_0.6.27 R6_2.5.0
[9] cellranger_1.1.0 backports_1.2.0 reprex_0.3.0 evaluate_0.14
[13] httr_1.4.2 pillar_1.4.6 GlobalOptions_0.1.2 rlang_0.4.8
[17] curl_4.3 readxl_1.3.1 rstudioapi_0.11 callr_3.5.1
[21] rmarkdown_2.5 desc_1.2.0 devtools_2.3.2 igraph_1.2.6
[25] munsell_0.5.0 broom_0.7.2 compiler_4.0.3 modelr_0.1.8
[29] xfun_0.19 pkgconfig_2.0.3 pkgbuild_1.1.0 shape_1.4.5
[33] htmltools_0.5.0 tidyselect_1.1.0 fansi_0.4.1 crayon_1.3.4
[37] dbplyr_2.0.0 withr_2.3.0 grid_4.0.3 jsonlite_1.7.1
[41] gtable_0.3.0 lifecycle_0.2.0 DBI_1.1.0 magrittr_1.5
[45] scales_1.1.1 stringi_1.5.3 cli_2.1.0 fs_1.5.0
[49] remotes_2.2.0 testthat_3.0.0 xml2_1.3.2 ellipsis_0.3.1
[53] generics_0.1.0 vctrs_0.3.4 tools_4.0.3 glue_1.4.2
[57] hms_0.5.3 processx_3.4.4 pkgload_1.1.0 parallel_4.0.3
[61] yaml_2.2.1 colorspace_2.0-0 sessioninfo_1.1.1 rvest_0.3.6
[65] memoise_1.1.0 knitr_1.30 haven_2.3.1 usethis_1.6.3

jokergoo commented 3 years ago

You have installed the newest version of circlize, haven't you? Can you restart the R session and try again?

jokergoo commented 3 years ago

And print head(labels_df), or str(labels_df) for me.

peranti commented 3 years ago

I am using circlize version 0.4.12 and have restarted the R session and have encountered the same issue.

Here is the information you asked for:

head(labels_df)

  LinkStartChrom LinkStartFrom LinkStartEnd LinkStartLabel
1 chr19               40325093     40337054 FBL
2 chr11               60658019     60674061 PRPF19
3 chr3                10183319     10193762 VHL
4 chr13               37005967     37017019 CCNA1
5 chr6                18387594     18468850 RNF144B
6 chr6               170844204    170862417 PSMB1

str(labels_df)

Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 48 obs. of 4 variables:
 $ LinkStartChrom: chr "chr19" "chr11" "chr3" "chr13" ...
 $ LinkStartFrom : num 40325093 60658019 10183319 37005967 18387594 ...
 $ LinkStartEnd  : num 40337054 60674061 10193762 37017019 18468850 ...
 $ LinkStartLabel: chr "FBL" "PRPF19" "VHL" "CCNA1" ...

jokergoo commented 3 years ago

Try converting labels_df, and maybe also dataLinks, to a plain data frame. I am not sure how well circlize supports tibbles.
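
A minimal sketch of that conversion, assuming only base R. To avoid depending on the tibble package, the example imitates a tibble by setting the same class vector that str(labels_df) reported above; the helper name ensure_plain_df is hypothetical. One likely reason the conversion matters is that single-column indexing such as x[, 1] returns a vector for a plain data frame but stays a one-column table for a tibble.

```r
# Toy rows copied from head(labels_df) above
labels_df <- data.frame(
  LinkStartChrom = c("chr19", "chr11"),
  LinkStartFrom  = c(40325093, 60658019),
  LinkStartEnd   = c(40337054, 60674061),
  LinkStartLabel = c("FBL", "PRPF19"),
  stringsAsFactors = FALSE
)

# Imitate the class vector of a tibble (assumption made for illustration only)
class(labels_df) <- c("tbl_df", "tbl", "data.frame")

# Hypothetical helper: drop the tibble classes if they are present
ensure_plain_df <- function(x) {
  if (inherits(x, "tbl_df")) {
    x <- as.data.frame(x)
  }
  x
}

plain_df <- ensure_plain_df(labels_df)
class(plain_df)
```

With a real tibble, calling as.data.frame(labels_df) directly before circos.genomicLabels() should have the same effect.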

peranti commented 3 years ago

When I tried converting it to a dataframe, it worked!

class(dataLinks)

[1] "data.frame"

head(labels_df)

   LinkStartChrom LinkStartFrom LinkStartEnd LinkStartLabel
2  chr19               40325093     40337054 FBL
4  chr11               60658019     60674061 PRPF19
11 chr3                10183319     10193762 VHL
14 chr13               37005967     37017019 CCNA1
30 chr6                18387594     18468850 RNF144B
33 chr6               170844204    170862417 PSMB1

str(labels_df)

'data.frame': 48 obs. of 4 variables:
 $ LinkStartChrom: chr "chr19" "chr11" "chr3" "chr13" ...
 $ LinkStartFrom : int 40325093 60658019 10183319 37005967 18387594 170844204 55912650 37416840 41488614 38137060 ...
 $ LinkStartEnd  : int 40337054 60674061 10193762 37017019 18468850 170862417 55919325 37557876 41576081 38154212 ...
 $ LinkStartLabel: chr "FBL" "PRPF19" "VHL" "CCNA1" ...

The output is as:

Rplot01

jokergoo commented 3 years ago

Good. Since I don't use the tidyverse, I probably never tested these cases.

peranti commented 3 years ago

Thanks, I should have checked it for a dataframe too!

Actually, it is very difficult to make sense of this plot because it is cluttered with links. I will explore the circlize functionality further to see what is possible (increasing the circle radius, the plot size, ...).

Also, is there any possibility to choose the colours for different chromosomes instead of random? I prefer to use colour-blind friendly colours. (I mentioned this in my initial query, if you wish to deal with this as a separate issue, I could close this one and raise another issue.)

What do you mean by "the number of unique genes is still too large for visualization."? Do you mean that the image cannot be rendered, or that the labels would not be legible?

jokergoo commented 3 years ago

  1. You can set transparency on the link colors.

  2. When you set the colors:

circos.track(ylim = c(0, 1), panel.fun = function(x, y) {
...
  circos.rect(xlim[1], 0, xlim[2], 1, col = rand_color(1))
...
}, track.height = 0.15, bg.border = NA)

You can define your own colors for the col argument. CELL_META$sector.index gives the "current chromosome", so you can define a named color vector, say chr_col, with one color per chromosome, and pick the color for each chromosome by:

circos.rect(..., col = chr_col[CELL_META$sector.index])

  3. If you check the number of rows of dataLinks and the number of unique genes:

> dataLinks = read.table("~/Downloads/Interactions.txt", header =TRUE)
> dim(dataLinks)
[1] 1021    8
> labels_df = unique(dataLinks[,c("LinkStartChrom", "LinkStartFrom", "LinkStartEnd", "LinkStartLabel")])
> dim(labels_df)
[1] 235   4

Drawing 235 labels around the circle is probably too many. One way to fit the labels on the circle without overlap is to use a smaller font size. If not all the genes are equally important, I think it might be sufficient to visualize only the top genes.
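
Points 1 and 2 can be sketched in base R as follows. This is only an illustration under stated assumptions: the viridis palette from hcl.colors() (R 3.6 or later) is one colour-blind friendly choice among several, and the alpha value of 0.3 is arbitrary. Because viridis is sequential, neighbouring chromosomes get similar colours, so a different palette may suit better.

```r
# Chromosome names as used in the thread
chromosomes <- paste0("chr", 1:22)

# One colour per chromosome; viridis is an assumed palette choice
chr_col <- hcl.colors(length(chromosomes), palette = "viridis")
names(chr_col) <- chromosomes

# Semi-transparent red for the links; alpha.f = 0.3 is an arbitrary value
link_col <- adjustcolor("red", alpha.f = 0.3)

# Look up the colour of a single chromosome by name
chr_col["chr5"]
```

Inside panel.fun, the lookup then becomes circos.rect(xlim[1], 0, xlim[2], 1, col = chr_col[CELL_META$sector.index]), and link_col can be passed as col to circos.genomicLink().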

peranti commented 3 years ago

Sure, I will look into increasing the transparency of the colours, setting new colours and reducing the font size of the labels.

Additionally, would it be possible to increase the radius of the outer track?

For others' reference, the code mentioned here works fine: https://github.com/jokergoo/circlize/issues/234#issuecomment-730971667

Hence, I am closing this issue.

jokergoo commented 3 years ago

The default circle is with radius one and is drawn in [-1, 1] on both x and y axes, so I think the default should look fine. But you can check the following link if you really want to manually adjust the radius:

https://jokergoo.github.io/circlize_book/book/advanced-layout.html#combine-circular-plots

It is a matter of setting canvas.xlim and canvas.ylim in circos.par().
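
A minimal sketch of that setting, assuming circlize is installed. The limits of 1.2 are arbitrary illustration values: limits wider than the default [-1, 1] make the circle occupy less of the device and leave a margin for outside labels, while narrower limits enlarge the circle but may clip whatever is drawn near the edge.

```r
library(circlize)

# Widen the canvas from the default [-1, 1] to [-1.2, 1.2] (assumed values).
# This must be set before the plot is initialized.
circos.par(canvas.xlim = c(-1.2, 1.2), canvas.ylim = c(-1.2, 1.2))

# ... then call circos.initializeWithIdeogram() and add the tracks as usual.
# Call circos.clear() once the plot is finished to reset these settings.
```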

c11cc commented 3 years ago

@jokergoo Hi, thank you for providing such a convenient tool.

My problem is similar to the one above, but slightly different. Since I have only a few links to show, I want to add labels to both ends of each link and, if possible, give the two labels of a link the same color. What I want is a bit like this: image

Here is the example data:

LinkStartChrom | LinkStartFrom | LinkStartEnd | LinkStartLabel | LinkEndChrom | LinkEndFrom | LinkEndEnd | LinkEndLabel
chr5 | 61684351 | 61699728 | DIMT1L | chr19 | 40325093 | 40337054 | FBL
chr22 | 51063449 | 51066607 | ARSA | chr2 | 23389738 | 33899767 | NEU2
chr11 | 60658019 | 60674061 | RPF19 | chr19 | 41256779 | 41271294 | SNRPA
chr6 | 11921524 | 19230336 | ASF1A | chr6 | 26017260 | 26018040 | HIST1H1A

Using the following commands, I can get one end labelled, but I don't know how to add a label to the other end:

col = rainbow(nrow(dataLinks))
circos.initializeWithIdeogram(species = "hg38", plotType = NULL)
circos.genomicLabels(dataLinks, labels.column = 4, side = "outside", col = col, line_col = "white")
circos.genomicIdeogram(species = "hg38")
circos.genomicLink(dataLinks[, 1:4], dataLinks[, 5:8], col = col, border = col, lwd = 2)

image

jokergoo commented 3 years ago

@c11cc You don't actually need to use circos.genomicLabels(). Since you know the positions of the genes, you can add the labels directly with circos.text().

For example, take the following link, assuming it is in dataLinks[i, ]:

chr6 | 11921524 | 19230336 | ASF1A | chr6 | 26017260 | 26018040 | HIST1H1A

You can add gene names for the two ends by:

circos.text(x = (dataLinks[i, 2] + dataLinks[i, 3])/2, 
            y = 1,  # you can adjust this value to control the position of labels
            dataLinks[i, 4], 
            facing = "clockwise", adj = c(0, 0.5), niceFacing = TRUE,
            sector.index = dataLinks[i, 1],
            track.index = 1)
circos.text(x = (dataLinks[i, 6] + dataLinks[i, 7])/2, 
            y = 1, 
            dataLinks[i, 8],
            facing = "clockwise", adj = c(0, 0.5), niceFacing = TRUE,
            sector.index = dataLinks[i, 5],
            track.index = 1)

Similarly, if you want to add points, use circos.points() in the same way.

c11cc commented 3 years ago

@jokergoo I tried drawing with the following code, but the text always overlaps the genome ideogram unless I set y above 2, and then I get the message "Note: 1 point is out of plotting region in sector 'chr5', track '1'." I tried using a blank track in place of step 2, but that failed. Worse, the gene labels overlap when the genes are at close positions on the genome. Is there a way to separate them?

step1:

circos.initializeWithIdeogram(species="hg38",plotType =NULL)

step2:

circos.genomicIdeogram(species="hg38")

step3:

for (i in seq_len(nrow(dataLinks))) {
  circos.text(x = (dataLinks[i, 2] + dataLinks[i, 3])/2,
              y = 2.5,  # you can adjust this value to control the position of labels
              dataLinks[i, 4],
              facing = "outside", adj = c(0, 0.5), niceFacing = TRUE,
              sector.index = dataLinks[i, 1],
              track.index = 1)
  circos.text(x = (dataLinks[i, 6] + dataLinks[i, 7])/2,
              y = 2.5,
              dataLinks[i, 8],
              facing = "outside", adj = c(0, 0.5), niceFacing = TRUE,
              sector.index = dataLinks[i, 5],
              track.index = 1)
}

Here is the result of the above commands: image

Thanks.

jokergoo commented 3 years ago

Don't worry too much about the message "Note: 1 point is out of plotting region in sector...".

When the labels overlap, you need to use circos.genomicLabels(), but that only works for labels with "clockwise" facing.

c11cc commented 3 years ago

I tried using circos.genomicLabels() and added the color setting to the input file. As I have no more than 20 lines to show, adding colors is not very tedious. In the end, it looks like this, and I think it's OK: image
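
One way to build such an input without editing the file by hand is sketched below in base R, using two toy rows from the example data above. The column names chr, start, end, label and col are made up for illustration. The idea is to stack the start and end regions into one table and repeat the per-link colours so that both ends of a link share a colour.

```r
# Toy rows copied from the example data above
dataLinks <- data.frame(
  LinkStartChrom = c("chr5", "chr22"),
  LinkStartFrom  = c(61684351, 51063449),
  LinkStartEnd   = c(61699728, 51066607),
  LinkStartLabel = c("DIMT1L", "ARSA"),
  LinkEndChrom   = c("chr19", "chr2"),
  LinkEndFrom    = c(40325093, 23389738),
  LinkEndEnd     = c(40337054, 33899767),
  LinkEndLabel   = c("FBL", "NEU2"),
  stringsAsFactors = FALSE
)

# One colour per link
link_col <- rainbow(nrow(dataLinks))

# Give both halves the same (hypothetical) column names so they can be stacked
new_names <- c("chr", "start", "end", "label")
starts <- setNames(dataLinks[, 1:4], new_names)
ends   <- setNames(dataLinks[, 5:8], new_names)

# Attach the link colour to each end, then stack start and end regions
starts$col <- link_col
ends$col   <- link_col
label_df <- rbind(starts, ends)

label_df
```

Under these assumptions, the labels for both ends could then be drawn in a single call such as circos.genomicLabels(label_df, labels.column = 4, col = label_df$col, side = "outside"), with link_col passed as col to circos.genomicLink(). The error message earlier in this thread indicates that col may have either length 1 or one value per region.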