robinweide / GENOVA

GENome Organisation Visual Analytics
GNU General Public License v3.0
69 stars 15 forks source link

Use comparment scores from other software to use `saddle` and `compartment_matrixplot`? #273

Closed YiweiNiu closed 2 years ago

YiweiNiu commented 3 years ago

Hi,

I have generated AB compartment scores by other software and I want to import them into GENOVA to use functions such as saddle and compartment_matrixplot.

I used the following code to replace the original data frame in the GENOVA CS_discovery.

# compartment scores from dcHiC
x = readRDS(here::here("output", "HiC.dcHiC.vis", "compart_scores.100k.rds"))
x = x %>%
  unite("id", c("chrom", "start", "end")) %>%
  dplyr::select(id, M0, M1, M2)

# CS_discovery obj
CS_out = readRDS(file.path(genova_res_dir, "CS_out.100k.rds"))
y = CS_out$compart_scores %>%
  unite("id", c("chrom", "start", "end")) %>%
  dplyr::select(id, bin)

# join
z = y %>%
  left_join(x, by = "id") %>%
  separate(id, c("chrom", "start", "end"), "_")

# replace
CS_out$compart_scores = z

The function saddle() ran successfully with no errors, while the function compartment_matrixplot() got weird result.

compartment_matrixplot(
    exp1 = hic_100k[[1]],
    #exp2 = hic_100k[[2]],
    CS_discovery = CS_out,
    chrom = "chr1", arm = "q", # actually whole chrom
    colour_lim = c(0, 100)
  )

图片

I am worried about the compatibility problems:

  1. Could I use the compartment scores from other software in GENOVA?

  2. If the answer is YES, then how can I import the compartment scores into GENOVA?

Thanks in advance!

Bests, Yiwei

teunbrand commented 3 years ago

Hi there,

It's hard to tell what is happening without having a glance at the data. However, it should be possible to replace the data in a CS_discovery object by an arbitrary bedgraph-like data.frame, as long as the names match.

library(GENOVA)

exp <- get_test_data("150k")
cs <- compartment_score(exp)

# Computed compartment score
compartment_matrixplot(exp, chrom = "chr22", arm = "q", CS_discovery = cs)
#> Limits set to [0,4543]


# Bedgraph like format
set.seed(0)
bdg <- data.frame(
  chrom = "chr22",
  start = seq(0, 51.3e6, by = 100e3), # resolution not the same as experiment
  end   = seq(0, 51.3e6, by = 100e3) + 100e3,
  value = cumsum(rnorm(length(seq(0, 51.3e6, by = 100e3))))
)

# Set value column to expname
colnames(bdg)[4] <- expnames(exp)

# Replace compartment score
cs$compart_scores <- bdg

compartment_matrixplot(exp, chrom = "chr22", arm = "q", CS_discovery = cs)
#> Limits set to [0,4543]

Created on 2021-09-23 by the reprex package (v2.0.1)

Best, Teun

YiweiNiu commented 2 years ago

Hi,

Sorry for the late reply.

I found there were some NA containing rows in the external compartment score, after I replaced the original scores in CS_discovery object. The following code shows how I replaced the original scores.

# the external compartment scores
x = readRDS(here::here("output/HiC.dcHiC.vis", "compart_scores.100k.rds"))
x = x %>%
  unite("id", c("chrom", "start", "end")) %>%
  dplyr::select(id, M0, M1, M2)
dim(x)
[1] 24044     4
# CS_discovery obj by GENOVA
CS_out = readRDS(file.path(genova_res_dir, "CS_out.100k.rds"))
y = CS_out$compart_scores %>%
  unite("id", c("chrom", "start", "end")) %>%
  dplyr::select(id, bin)

dim(y)
[1] 27269     2
# join
z = y %>%
  left_join(x, by = "id") %>%
  separate(id, c("chrom", "start", "end"), "_")

# replace
CS_out$compart_scores = z
# check NAs
head(z[rowSums(is.na(z)) > 0,])
     chrom     start       end  bin M0 M1 M2
1955  chr1 195400000 195471971 1955 NA NA NA
1956 chr10         0    100000 1956 NA NA NA
1957 chr10    100000    200000 1957 NA NA NA
1958 chr10    200000    300000 1958 NA NA NA
1959 chr10    300000    400000 1959 NA NA NA
1960 chr10    400000    500000 1960 NA NA NA

Could this be the reason for the error above?

Bests, Yiwei

teunbrand commented 2 years ago

Hi there,

It's hard to debug this from just glancing at the code without having the data to follow along myself. Could you turn this into a reproducible example using e.g. the test data and some mockup of the external scores?

YiweiNiu commented 2 years ago

Thank you for your reply.

Ah, I found the error 😂

The code I used above would transform the start and end columns in compart_scores to character, not int.

This would induce error in Line 119 of matrixplot_compartment.R.

comp_dat <- comp_dat[chrom == chr$V1 & start >= chr$V2 & end <= chr$V3]

Perhaps it would be robust if adding some code to check the column types of compart_scores.


I found another potential bug when tracing this error, line 161 would return NA NA if the scores contain NA values.

I changed this line to the following, and it works well now.

complim <- range(na.omit(do.call(c, complim)))