Closed linfanxiao closed 6 months ago
Hi I think those are examples of translocation event which mostly happen in cancer settings. If you don't know then it might not be what you want. C.Origami currently don't have the capacity to predict interactions beyond 2Mb regions.
If you do want to predict results of translocation, you will extract sequence and ATAC/CTCF bigwig tracks and append the two junction points from different chromosomes. Then you can run prediction on the fused inputs.
Thanks, Jimin, I will try it as your suggestion!
Hi, Jimin,
I have a question regarding the comparison of real and predicted HiC data. To resize the image, I used the preserve_range=True parameter. Is this the correct approach? The reason I used it is that the values in the resized image were very small, around 2e-18, whereas my original data ranged from 1 to 50. After setting preserve_range=True, the resized image now has a similar range to the original data.
This is my code below
cooler dump cell0041GLUT1.mcool::/resolutions/10000 --fill-lower -t pixels -r chr2:500,000-2,597,152 -b --no-balance --join --one-based-starts| awk '{print $2-1, $5-1, $7}' > test/cell0041GLUT1.txt
Rscript -e 'library(reshape2); df <- data.table::fread("test/cell0041GLUT1.txt"); df1 <- dcast(df, V1 ~ V2, value.var = "V3", fill = 0);df1=df1[,-1] ;write.table(df1, "test/cell0041GLUT1_wide.txt", sep = "\t",col.names=F,row.names = F, quote = F)'
The next part is processed in Python,plot_utils.py is the developer's script
import numpy as np
import os
import pandas as pd
from skimage.transform import resize
from plot_utils import MatrixPlot
# Open the text file in read mode
file_path = 'cell0041GLUT1_wide.txt' # Replace with the actual file path
with open(file_path, 'r') as file:
# Read the contents of the file
lines = file.readlines()
# Remove any leading or trailing whitespace from each line
lines = [line.strip() for line in lines]
# Convert the lines to a nested list of integers
matrix = [[int(num) for num in line.split()] for line in lines]
# Convert the nested list to a numpy array
mat = np.array(matrix)
# Print the contents of the file
print(mat)
# Resize the matrix to the desired resolution
resized_mat = resize(mat, (256, 256), anti_aliasing=True,preserve_range=True)
print(resized_mat)
bin_size = 8192
# Generate interaction lines and write to file
with open('realcell0041GLUT1_chr2_500000-2597152.txt', 'w') as f:
for i in range(256):
for j in range(i, 256):
start_coord = (i) * bin_size + 500000
reads = resized_mat[i, j]
end_coord = (j + 1) * bin_size + 500000
interaction_line = f"{start_coord} {end_coord} {reads:.8f}"
f.write(interaction_line + '\n')
# Instantiate MatrixPlot object
image=mat = np.log(resized_mat + 1)
output_path = "./"
prefix = "matrix_plot"
celltype = "realDGLUT1"
chr_name = "chr2"
start_pos = 500000
matrix_plot = MatrixPlot(output_path, image, prefix, celltype, chr_name, start_pos)
# Plot and save the matrix
matrix_plot.plot()
Hi @linfanxiao yes you should use preserve_range because skimage is usually used for image processing which is uint8 (0-255). Here these are float tensors.
Also when you compare real and predicted Hi-C data keep in mind that the intensity of predicted matrix is always in the log space (range around 0-10). To get the read counts (range 0 - around 2000), you need to do the exp(x) - 1 transformation.
Thanks, Jimin!
Dear developers,
Thanks for your work to develop the state-of-art tool.I have tried the example and my own data for intra chromosomal contacts, looking OK. However, how to predict intra chromosomal like you paper Fig. 4. If there are some example or tutorial to make the inter chromosomal contacts?
Thanks ! https://media.springernature.com/lw685/springer-static/image/art%3A10.1038%2Fs41587-022-01612-8/MediaObjects/41587_2022_1612_Fig4_HTML.png?as=webp