robinweide / GENOVA

GENome Organisation Visual Analytics
GNU General Public License v3.0

how to show/print the APA value #305

Open BenxiaHu opened 2 years ago

BenxiaHu commented 2 years ago

Hello, is it possible to print or show the APA score in the heatmap? Another question: what is 'mu Contacts'? Is it the normalized contact value? Best,

teunbrand commented 2 years ago

Hey there,

Yes, you can put a number in the plot, but I'm afraid it is not very straightforward. In essence, visualise() returns a ggplot2 object that you can decorate further with layers, for example a text layer. The tricky bit is probably getting the facetting variables to match up. Other than that, the quantify() function returns various metrics about the APA, so you can use one of these to annotate the plot (e.g. the fold change of the central pixels over background in the example below).

The 'mu Contacts' metric is the average contacts for that pixel/bin relative to the loop center across all loops. Whether it is the normalised average depends on whether you've loaded raw (no) or normalised (yes) data.

```r
library(GENOVA)

exp <- get_test_data("40k")
explist <- list(exp, exp) # Copy and rename 1 sample
expnames(explist[[1]]) <- "Sample X"

bedpe <- data.frame(
  chr1   = "chr21",
  start1 = 17440000,
  end1   = 17480000,
  chr2   = "chr21",
  start2 = 21200000,
  end2   = 21240000
)

apa <- APA(explist, bedpe = bedpe)

qapa <- quantify(apa, size = 3) # size is the number of pixels/bins considered to still be the loop

visualise(apa) +
  ggplot2::geom_text(
    data = data.frame(
      x = 3e5, y = 3e5,  # adapt to your resolution
      name = qapa$per_sample$sample,
      mode = "Individual",
      label = qapa$per_sample$foldchange
    ),
    ggplot2::aes(x, y, label = scales::number(label, 0.01)),
    inherit.aes = FALSE
  )
```

Created on 2022-09-26 by the reprex package (v2.0.1)

BenxiaHu commented 2 years ago

Thanks a lot, that looks good. I am still confused about this part: "The 'mu Contacts' metric is the average contacts for that pixel/bin relative to the loop center across all loops."

I thought the 'mu Contacts' value should be observed/expected, so I don't understand this definition. Could you explain it in more detail? Best

teunbrand commented 2 years ago

The 'mu' is just shorthand notation for the mean. For every loop, you have a small piece of Hi-C map surrounding the loop, each of the same size, let's say an n x n matrix. For every position [i, j] in that matrix, you take the [i, j] bin/pixel in that matrix for all the loops, and calculate the average for that position. Now you do the same for all possible [i, j] positions and you get the APA result, which is a grand average across all loops. Of course, what a 'contact' means depends on your input data. If you throw in balanced data, it just means the average of normalised loops, but if you throw in obs/exp data, it is the average obs/exp.
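To make the per-position averaging concrete, here is a minimal base-R sketch. It is not GENOVA's implementation: the three 3 x 3 matrices are made-up values standing in for the Hi-C submatrices around three loops, and the names `loops` and `mu_contacts` are hypothetical.

```r
# Toy illustration only: three made-up 3 x 3 submatrices, one per loop.
# In a real APA each would be the n x n Hi-C window centred on a loop.
loops <- list(
  matrix(c(1, 2, 1,  2,  8, 2,  1, 2, 1), nrow = 3, byrow = TRUE),
  matrix(c(0, 1, 0,  1,  6, 1,  0, 1, 0), nrow = 3, byrow = TRUE),
  matrix(c(2, 3, 2,  3, 10, 3,  2, 3, 2), nrow = 3, byrow = TRUE)
)

# Sum the matrices position by position, then divide by the number of loops.
# Entry [i, j] of the result is the mean of entry [i, j] across all loops.
mu_contacts <- Reduce(`+`, loops) / length(loops)

mu_contacts[2, 2]  # central pixel: mean of 8, 6 and 10
mu_contacts[1, 1]  # corner pixel: mean of 1, 0 and 2
```

Whatever unit the input matrices are in (raw counts, balanced contacts, obs/exp) is the unit of the resulting mean, which is why the meaning of 'mu Contacts' follows the loaded data.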

BenxiaHu commented 2 years ago

I see. Is it possible to show normalized or Observed/Expected values instead of 'mu Contacts'? The 'mu Contacts' label looks odd and may confuse others.

teunbrand commented 2 years ago

I don't think there is a way to show obs/exp directly, but you can load the data as a z-score matrix, which should get rid of the diagonal effect. As for 'normalised' contacts, how would you expect this to work when every pixel represents a part of multiple loops?
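As a rough illustration of why z-scores can remove the diagonal (distance-decay) effect, the base-R sketch below standardises a toy contact matrix separately at each genomic distance. This is a generic technique shown under assumptions: the matrix values are made up, the helper name `zscore_by_distance` is hypothetical, and GENOVA's own z-score normalisation may be computed differently.

```r
# Hypothetical helper: z-score every entry against the other entries that
# lie at the same distance from the diagonal.
zscore_by_distance <- function(mat) {
  n <- nrow(mat)
  d <- abs(outer(seq_len(n), seq_len(n), "-"))  # distance in bins
  z <- matrix(0, n, n)
  for (k in unique(as.vector(d))) {
    idx  <- d == k
    vals <- mat[idx]
    s    <- sd(vals)
    # Leave z at 0 when a distance has no spread (or a single value).
    if (!is.na(s) && s > 0) z[idx] <- (vals - mean(vals)) / s
  }
  z
}

# Toy 5 x 5 matrix: contacts decay with distance, plus one enriched pixel pair.
n   <- 5
d   <- abs(outer(seq_len(n), seq_len(n), "-"))
mat <- 100 / (d + 1)
mat[1, 4] <- mat[1, 4] + 20
mat[4, 1] <- mat[4, 1] + 20

z <- zscore_by_distance(mat)
z[1, 4]  # positive: enriched relative to other pixels at the same distance
z[2, 5]  # negative: same distance, but without the enrichment
```

After this transformation the main diagonal no longer dominates, because each pixel is only compared with pixels at the same distance.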