jokergoo / circlize

Circular visualization in R
http://jokergoo.github.io/circlize_book/book/
Other
959 stars 142 forks source link

Zooming in on chromosomes for species without a Cytoband #228

Closed moldach closed 3 years ago

moldach commented 3 years ago

In Chapter 9.4 a method for zooming in on Chromosomes is shown.

However, this method relies on cytoband information; which is only available for hg19 and mm10 with circlize

The function read.cytoband returns an object that looks like:

> cytoband
$df
      V1        V2        V3     V4      V5
1   chr1         0   2300000 p36.33    gneg
2   chr1   2300000   5400000 p36.32  gpos25
3   chr1   5400000   7200000 p36.31    gneg
4   chr1   7200000   9200000 p36.23  gpos25
5   chr1   9200000  12700000 p36.22    gneg
6   chr1  12700000  16200000 p36.21  gpos50
7   chr1  16200000  20400000 p36.13    gneg
8   chr1  20400000  23900000 p36.12  gpos25
9   chr1  23900000  28000000 p36.11    gneg
10  chr1  28000000  30200000  p35.3  gpos25
11  chr1  30200000  32400000  p35.2    gneg
12  chr1  32400000  34600000  p35.1  gpos25
13  chr1  34600000  40100000  p34.3    gneg
14  chr1  40100000  44100000  p34.2  gpos25
15  chr1  44100000  46800000  p34.1    gneg
16  chr1  46800000  50700000    p33  gpos75
17  chr1  50700000  56100000  p32.3    gneg
18  chr1  56100000  59000000  p32.2  gpos50
19  chr1  59000000  61300000  p32.1    gneg
20  chr1  61300000  68900000  p31.3  gpos50
21  chr1  68900000  69700000  p31.2    gneg
22  chr1  69700000  84900000  p31.1 gpos100
23  chr1  84900000  88400000  p22.3    gneg
24  chr1  88400000  92000000  p22.2  gpos75
25  chr1  92000000  94700000  p22.1    gneg
26  chr1  94700000  99700000  p21.3  gpos75
27  chr1  99700000 102200000  p21.2    gneg
28  chr1 102200000 107200000  p21.1 gpos100
29  chr1 107200000 111800000  p13.3    gneg
30  chr1 111800000 116100000  p13.2  gpos50
31  chr1 116100000 117800000  p13.1    gneg
32  chr1 117800000 120600000    p12  gpos50
33  chr1 120600000 121500000  p11.2    gneg
34  chr1 121500000 125000000  p11.1    acen
35  chr1 125000000 128900000    q11    acen
36  chr1 128900000 142600000    q12    gvar
37  chr1 142600000 147000000  q21.1    gneg
38  chr1 147000000 150300000  q21.2  gpos50
39  chr1 150300000 155000000  q21.3    gneg
40  chr1 155000000 156500000    q22  gpos50
41  chr1 156500000 159100000  q23.1    gneg
42  chr1 159100000 160500000  q23.2  gpos50
43  chr1 160500000 165500000  q23.3    gneg
44  chr1 165500000 167200000  q24.1  gpos50
45  chr1 167200000 170900000  q24.2    gneg
46  chr1 170900000 172900000  q24.3  gpos75
47  chr1 172900000 176000000  q25.1    gneg
48  chr1 176000000 180300000  q25.2  gpos50
49  chr1 180300000 185800000  q25.3    gneg
50  chr1 185800000 190800000  q31.1 gpos100
51  chr1 190800000 193800000  q31.2    gneg
52  chr1 193800000 198700000  q31.3 gpos100
53  chr1 198700000 207200000  q32.1    gneg
54  chr1 207200000 211500000  q32.2  gpos25
55  chr1 211500000 214500000  q32.3    gneg
56  chr1 214500000 224100000    q41 gpos100
57  chr1 224100000 224600000 q42.11    gneg
58  chr1 224600000 227000000 q42.12  gpos25
59  chr1 227000000 230700000 q42.13    gneg
60  chr1 230700000 234700000  q42.2  gpos50
61  chr1 234700000 236600000  q42.3    gneg
62  chr1 236600000 243700000    q43  gpos75
63  chr1 243700000 249250621    q44    gneg
371 chr2         0   4400000  p25.3    gneg
372 chr2   4400000   7100000  p25.2  gpos50
373 chr2   7100000  12200000  p25.1    gneg
374 chr2  12200000  16700000  p24.3  gpos75
375 chr2  16700000  19200000  p24.2    gneg
376 chr2  19200000  24000000  p24.1  gpos75
377 chr2  24000000  27900000  p23.3    gneg
378 chr2  27900000  30000000  p23.2  gpos25
379 chr2  30000000  32100000  p23.1    gneg
380 chr2  32100000  36600000  p22.3  gpos75
381 chr2  36600000  38600000  p22.2    gneg
382 chr2  38600000  41800000  p22.1  gpos50
383 chr2  41800000  47800000    p21    gneg
384 chr2  47800000  52900000  p16.3 gpos100
385 chr2  52900000  55000000  p16.2    gneg
386 chr2  55000000  61300000  p16.1 gpos100
387 chr2  61300000  64100000    p15    gneg
388 chr2  64100000  68600000    p14  gpos50
389 chr2  68600000  71500000  p13.3    gneg
390 chr2  71500000  73500000  p13.2  gpos50
391 chr2  73500000  75000000  p13.1    gneg
392 chr2  75000000  83300000    p12 gpos100
393 chr2  83300000  90500000  p11.2    gneg
394 chr2  90500000  93300000  p11.1    acen
395 chr2  93300000  96800000  q11.1    acen
396 chr2  96800000 102700000  q11.2    gneg
397 chr2 102700000 106000000  q12.1  gpos50
398 chr2 106000000 107500000  q12.2    gneg
399 chr2 107500000 110200000  q12.3  gpos25
400 chr2 110200000 114400000    q13    gneg
401 chr2 114400000 118800000  q14.1  gpos50
402 chr2 118800000 122400000  q14.2    gneg
403 chr2 122400000 129900000  q14.3  gpos50
404 chr2 129900000 132500000  q21.1    gneg
405 chr2 132500000 135100000  q21.2  gpos25
406 chr2 135100000 136800000  q21.3    gneg
407 chr2 136800000 142200000  q22.1 gpos100
408 chr2 142200000 144100000  q22.2    gneg
409 chr2 144100000 148700000  q22.3 gpos100
410 chr2 148700000 149900000  q23.1    gneg
411 chr2 149900000 150500000  q23.2  gpos25
412 chr2 150500000 154900000  q23.3    gneg
413 chr2 154900000 159800000  q24.1  gpos75
414 chr2 159800000 163700000  q24.2    gneg
415 chr2 163700000 169700000  q24.3  gpos75
416 chr2 169700000 178000000  q31.1    gneg
417 chr2 178000000 180600000  q31.2  gpos50
418 chr2 180600000 183000000  q31.3    gneg
419 chr2 183000000 189400000  q32.1  gpos75
420 chr2 189400000 191900000  q32.2    gneg
421 chr2 191900000 197400000  q32.3  gpos75
422 chr2 197400000 203300000  q33.1    gneg
423 chr2 203300000 204900000  q33.2  gpos50
424 chr2 204900000 209000000  q33.3    gneg
425 chr2 209000000 215300000    q34 gpos100
426 chr2 215300000 221500000    q35    gneg
427 chr2 221500000 225200000  q36.1  gpos75
428 chr2 225200000 226100000  q36.2    gneg
429 chr2 226100000 231000000  q36.3 gpos100
430 chr2 231000000 235600000  q37.1    gneg
431 chr2 235600000 237300000  q37.2  gpos50
432 chr2 237300000 243199373  q37.3    gneg
483 chr3         0   2800000  p26.3  gpos50
484 chr3   2800000   4000000  p26.2    gneg
485 chr3   4000000   8700000  p26.1  gpos50
486 chr3   8700000  11800000  p25.3    gneg
487 chr3  11800000  13300000  p25.2  gpos25
488 chr3  13300000  16400000  p25.1    gneg
489 chr3  16400000  23900000  p24.3 gpos100
490 chr3  23900000  26400000  p24.2    gneg
491 chr3  26400000  30900000  p24.1  gpos75
492 chr3  30900000  32100000    p23    gneg
493 chr3  32100000  36500000  p22.3  gpos50
494 chr3  36500000  39400000  p22.2    gneg
495 chr3  39400000  43700000  p22.1  gpos75
496 chr3  43700000  44100000 p21.33    gneg
497 chr3  44100000  44200000 p21.32  gpos50
498 chr3  44200000  50600000 p21.31    gneg
499 chr3  50600000  52300000  p21.2  gpos25
500 chr3  52300000  54400000  p21.1    gneg
501 chr3  54400000  58600000  p14.3  gpos50
502 chr3  58600000  63700000  p14.2    gneg
503 chr3  63700000  69800000  p14.1  gpos50
504 chr3  69800000  74200000    p13    gneg
505 chr3  74200000  79800000  p12.3  gpos75
506 chr3  79800000  83500000  p12.2    gneg
507 chr3  83500000  87200000  p12.1  gpos75
508 chr3  87200000  87900000  p11.2    gneg
509 chr3  87900000  91000000  p11.1    acen
510 chr3  91000000  93900000  q11.1    acen
511 chr3  93900000  98300000  q11.2    gvar
512 chr3  98300000 100000000  q12.1    gneg
513 chr3 100000000 100900000  q12.2  gpos25
514 chr3 100900000 102800000  q12.3    gneg
515 chr3 102800000 106200000 q13.11  gpos75
516 chr3 106200000 107900000 q13.12    gneg
517 chr3 107900000 111300000 q13.13  gpos50
518 chr3 111300000 113500000  q13.2    gneg
519 chr3 113500000 117300000 q13.31  gpos75
520 chr3 117300000 119000000 q13.32    gneg
521 chr3 119000000 121900000 q13.33  gpos75
522 chr3 121900000 123800000  q21.1    gneg
523 chr3 123800000 125800000  q21.2  gpos25
524 chr3 125800000 129200000  q21.3    gneg
525 chr3 129200000 133700000  q22.1  gpos25
526 chr3 133700000 135700000  q22.2    gneg
527 chr3 135700000 138700000  q22.3  gpos25
528 chr3 138700000 142800000    q23    gneg
529 chr3 142800000 148900000    q24 gpos100
530 chr3 148900000 152100000  q25.1    gneg
531 chr3 152100000 155000000  q25.2  gpos50
532 chr3 155000000 157000000 q25.31    gneg
533 chr3 157000000 159000000 q25.32  gpos50
534 chr3 159000000 160700000 q25.33    gneg
535 chr3 160700000 167600000  q26.1 gpos100
536 chr3 167600000 170900000  q26.2    gneg
537 chr3 170900000 175700000 q26.31  gpos75
538 chr3 175700000 179000000 q26.32    gneg
539 chr3 179000000 182700000 q26.33  gpos75
540 chr3 182700000 184500000  q27.1    gneg
541 chr3 184500000 186000000  q27.2  gpos25
542 chr3 186000000 187900000  q27.3    gneg
543 chr3 187900000 192300000    q28  gpos75
544 chr3 192300000 198022430    q29    gneg
545 chr4         0   4500000  p16.3    gneg
546 chr4   4500000   6000000  p16.2  gpos25
547 chr4   6000000  11300000  p16.1    gneg
548 chr4  11300000  15200000 p15.33  gpos50
549 chr4  15200000  17800000 p15.32    gneg
550 chr4  17800000  21300000 p15.31  gpos75
551 chr4  21300000  27700000  p15.2    gneg
552 chr4  27700000  35800000  p15.1 gpos100
553 chr4  35800000  41200000    p14    gneg
554 chr4  41200000  44600000    p13  gpos50
555 chr4  44600000  48200000    p12    gneg
556 chr4  48200000  50400000    p11    acen
557 chr4  50400000  52700000    q11    acen
 [ reached 'max' / getOption("max.print") -- omitted 662 rows ]

$chromosome
 [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9" 
[10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
[19] "chr19" "chr20" "chr21" "chr22" "chrX"  "chrY" 

$chr.len
     chr1      chr2      chr3      chr4      chr5      chr6      chr7      chr8 
249250621 243199373 198022430 191154276 180915260 171115067 159138663 146364022 
     chr9     chr10     chr11     chr12     chr13     chr14     chr15     chr16 
141213431 135534747 135006516 133851895 115169878 107349540 102531392  90354753 
    chr17     chr18     chr19     chr20     chr21     chr22      chrX      chrY 
 81195210  78077248  59128983  63025520  48129895  51304566 155270560  59373566 

How is the $chr.len column generated such that it has two values?

chr1
249250621

I'd like to make my own dummy cytoband so be able to zoom in

moldach commented 3 years ago

Okay I've figure out that these are just "named vectors" so I was able to create my own.

However, I'm having trouble understanding how to use the circos.link() function properly.

cytoband = list(
  df = data.frame(
    V1 = c("I", "II", "III", "IV", "V", "X"),
    V2 = c(0, 0, 0, 0, 0, 0),
    V3 = c(15072434, 15279421, 13783801, 17493829, 20924180, 17718942),
    V4 = c("I",  "II", "III", "IV", "V", "X"),
    V5 = c("I",  "II", "III", "IV", "V", "X")
  ),
  chromosome = c("I", "II", "III", "IV", "V", "X"),
  chr.len=c(I=15072434, II=15279421, III=13783801, IV=17493829, V=20924180, X=17718942)
)
cytoband_df = cytoband$df
chromosome = cytoband$chromosome

xrange = c(cytoband$chr.len, cytoband$chr.len[c("I", "IV")])
normal_chr_index = 1:6
zoomed_chr_index = c(1,4)

# normalize in normal chromsomes and zoomed chromosomes separately
sector.width = c(xrange[normal_chr_index] / sum(xrange[normal_chr_index]), 
                 xrange[zoomed_chr_index] / sum(xrange[zoomed_chr_index])) 

circos.par(start.degree = 90)
circos.initializeWithIdeogram(extend_chromosomes(cytoband_df, c("I", "IV")), sector.width = sector.width)
circos.track(ylim = c(0, 2), 
    bg.col = c("#ACCCDD", # pale yellow
               "#5896AD", # olive yellow 
               "#005C91", 
               "#003C69", 
               "#00254A", 
               "#96AAB5",
               "#ACCCDD",
               "#003C69"), 
    bg.border = NA, track.height = 0.05)

In your example you have used the following:

circos.link("chr1", get.cell.meta.data("cell.xlim", sector.index = "chr1"),
    "zoom_chr1", get.cell.meta.data("cell.xlim", sector.index = "zoom_chr1"),
    col = "#00000020", border = NA)
circos.clear()

Where the get.cell.meta.data() function returns 0.249250621 so I tried the following:

circos.link("Zoom_I", 0.249250621, 
            "Zoom_IV", 0.249250621, 
            col = "#00000020", border = NA)

circos.clear()

But I got the error:

Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : 'data' must be of a vector type, was 'NULL'

Ultimately, however, I'd like to be able to use information from the circos.genomicLink function. As I had been using this successfully before however the region we are interested in was showing up as too-fine-of-a-line before.

suppressPackageStartupMessages(require(StructuralVariantAnnotation))
suppressPackageStartupMessages(require(VariantAnnotation))
suppressPackageStartupMessages(require(circlize))

vcf <- VariantAnnotation::readVcf("data/BC986_subset.vcf", "ce11")
gr <- breakpointRanges(vcf)
gr <- partner(gr)
seqlevelsStyle(gr) <- "UCSC"
pairs <- breakpointgr2pairs(gr)

df = data.frame(
    name  = c("I",  "II", "III", "IV", "V", "X"),
    start = c(0, 0, 0, 0, 0, 0),
    end   = c(15072434, 15279421, 13783801, 17493829, 20924180, 17718942))
circos.initializeWithIdeogram(species = "ce11")
circos.track(ylim = c(0, 1), 
    bg.col = c("#ACCCDD", # pale yellow
               "#5896AD", # olive yellow 
               "#005C91", 
               "#003C69", 
               "#00254A", 
               "#96AAB5"), 
    bg.border = NA, track.height = 0.05)

circos.genomicLink(as.data.frame(S4Vectors::first(pairs)), as.data.frame(S4Vectors::second(pairs)), col = "#5F4B8BFF")

How can I use circos.genomicLink() with this?

moldach commented 3 years ago

I would like to have it shown like this if possible.

The zoomed out arc is seen as a fine-line and the zoomed in portion would look larger (like a ribbon)

what-i-want

Your help is greatly appreciated

jokergoo commented 3 years ago

It supports more genomes as long as it is avaiable on UCSC database, e.g. rn6 (rat genome?, I cannot remember correctly). To let circos.initializeWithIdeogram() to be more flexible, if the genome has no cytoband data, circos.initializeWithIdeogram() actually still works but without drawing the ideogram track.

More generally, if the genome does not exist on UCSC database, you can still provide a data frame which are the ranges of chromosomes of that genome and send it to circos.genomeInitialize().

jokergoo commented 3 years ago

Check the following code:

df = data.frame(
    chr = c("I", "II", "III", "IV", "V", "X", "zoom_I", "zoom_IV"),
    start = c(0, 0, 0, 0, 0, 0, 0, 0),
    end = c(15072434, 15279421, 13783801, 17493829, 20924180, 17718942, 15072434, 17493829)
)

# normalize in normal chromsomes and zoomed chromosomes separately
sector.width = c(df$end[1:6] / sum(df$end[1:6]) * 0.2, 
                 df$end[7:8] / sum(df$end[7:8]))

circos.par(start.degree = 90, gap.after = c(1, 1, 1, 1, 1, 5, 1, 5))
circos.genomicInitialize(df, sector.width = sector.width)
circos.track(ylim = c(0, 2), 
    bg.col = c("#ACCCDD", # pale yellow
               "#5896AD", # olive yellow 
               "#005C91", 
               "#003C69", 
               "#00254A", 
               "#96AAB5",
               "#ACCCDD",
               "#003C69"), 
    bg.border = NA, track.height = 0.05)

circos.link("I", c(5e6, 5e6 + 2e5), "IV", c(5e6, 5e6 + 2e5), col = "#5F4B8BFF")
circos.link("zoom_I", c(5e6, 5e6 + 2e5), "zoom_IV", c(5e6, 5e6 + 2e5), col = "#5F4B8BFF")

circos.clear()

image

jokergoo commented 3 years ago

And by the way, your genome is c.elegans, right? The genome code for c.elegans is ce11 and you can directly use it in circos.initializeWithIdeogram():

circos.initializeWithIdeogram(species = "ce11")

image

moldach commented 3 years ago

Hi @jokergoo thanks for the prompt response! Your manual is absolutely phenomenal and this is the first time I've had to ask a question.

Based on the code you've provided it's also clear I can then zoom in further to the chromosome if the event is smaller than the 200kb event you simulated above. Awesome!

Thanks again!