robinweide / GENOVA

GENome Organisation Visual Analytics
GNU General Public License v3.0
68 stars 15 forks source link

Predicting A/B from GC content? #274

Open TerezaClarence opened 2 years ago

TerezaClarence commented 2 years ago

Dear GENOVA developers,

I was wondering if it is possible to allow A/B compartment prediction and saddle plot analysis by using the GC content (which could be in a *bw format)? If that is true, can I kindly ask for suggestion how to do that or what other format of GC content should be loaded (I assume *bed format should be ok)?

Thank you very much in advance for your help.

Best regards, Tereza

TerezaClarence commented 2 years ago

Dear GENOVA developers,

I was wondering whether you plan to add a possibility of GC-content for creating saddle_plots (re my previous post)? When comparing for instance H27K3ac and GC content for generating saddle_plots in cooltools, we do observe a difference and we may prefer to use GC-content.

Thank you very much in advance for answer.

Best wishes, Tereza

teunbrand commented 2 years ago

Hello there,

Yes it is possible to use GC content for signing the compartment scores, it is alluded to in the documentation of ?sign_compartmentscore. The format of the GC bias should be in that of a bedgraph. An example of how one could use the GC content is below.

library(GENOVA)
library(BSgenome.Hsapiens.UCSC.hg19)
library(Biostrings)

# Get test data
exp <- get_test_data("150k")

# Get relevant chromosome lengths
chroms <- seqinfo(BSgenome.Hsapiens.UCSC.hg19)[exp$CHRS]

# Cut chromosomes into bins
bins <- tileGenome(seqlengths(chroms), tilewidth = 10e3, cut.last.tile.in.chrom = TRUE)

# Retrieve sequence and calculate GC bias
seq <- getSeq(BSgenome.Hsapiens.UCSC.hg19, bins)
seq <- alphabetFrequency(seq, as.prob = TRUE)
gc_bias <- seq[, "G"] + seq[, "C"] 

# Reformat as bedgraph
bedgraph <- data.frame(
  chrom = decode(seqnames(bins)),
  start = start(bins),
  end   = end(bins),
  value = gc_bias
)

# Compute compartment score
cs <- compartment_score(exp, bedgraph = bedgraph)
visualise(cs, chr = 'chr21')
#> Warning: Removed 62 row(s) containing missing values (geom_path).

Created on 2021-10-19 by the reprex package (v2.0.0)

hiroyukikato911 commented 2 years ago

Hi,

Thanks for this wonderful tool!

I'm also looking for GC content sorting for AB compartments. I realize that the aforementioned command cannot apply to .hic files that have chromosome name "1", "2" and so on. (without chr) Is there a way to overcome this?

Best regards, Hiroyuki

teunbrand commented 2 years ago

Hi there,

If your HiC data chromosomes are denoted with "1", "2" etc, doesn't it simply solve the problem by gsub()ing out the "chr" part of the chromosome names?

Best, Teun