js2264 / HiCExperiment

Importing and manipulating Hi-C data in R
http://js2264.github.io/HiCExperiment/

Loading Speed #8

Closed LucasMcNU closed 5 months ago

LucasMcNU commented 6 months ago

Hi, I'm trying to load some merged .hic files that I generated with Juicer's pipeline. Loading a ~2 GB .hic file at 5000 bp resolution on an HPC with 150 GB of RAM seems to take well over an hour. Is there any way to speed up the loading step?

Example code below:


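```r
library(HiContactsData)
library(HiCExperiment)

# Base directory of the Juicer output (note the trailing slash)
dir.2 <- "/projects/p32171/juicer/work/112123_HiC/juicer_analysis/juicer_analysis/"
hic <- "inter_30.hic"

exps <- c("cond1", "cond2", "cond3", "cond4")

# Point to the merged .hic file of the fourth condition
wt <- HicFile(paste0(dir.2, exps[4], "/mega/aligned/", hic))

# Import the full contact map at 5 kb resolution (this is the slow step)
wt.hic <- import(wt, resolution = 5000)
```

js2264 commented 5 months ago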

Hi @LucasMcNU, importing from .hic files should work fine, but I haven't tested it on files that large. For cool files, importing a 2 GB file at 5000 bp resolution shouldn't take more than 10 minutes. To speed up loading, you can restrict the import to a region of the Hi-C map with the `focus` argument, e.g. `import(wt, resolution = 5000, focus = "chr1:1-100000")`.
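For anyone who wants to try the region-restricted import, here is a minimal sketch. `make_focus()` is a hypothetical helper (not part of HiCExperiment) that builds the `"chr:start-end"` string; it mainly guards against R printing large coordinates in scientific notation (`1e+05`). The file path is a placeholder, and the chromosome name (`"chr1"` vs `"1"`) is an assumption that depends on the reference used to build the .hic file.

```r
# Hypothetical helper (not part of HiCExperiment):
# build a "chr:start-end" focus string from a chromosome name and coordinates.
make_focus <- function(chrom, start, end) {
  stopifnot(start >= 1, end >= start)
  # scientific = FALSE keeps e.g. 100000 from being written as "1e+05"
  paste0(
    chrom, ":",
    format(start, scientific = FALSE, trim = TRUE), "-",
    format(end, scientific = FALSE, trim = TRUE)
  )
}

focus <- make_focus("chr1", 1, 100000)

# Placeholder path: replace with the actual .hic file.
hic_path <- "inter_30.hic"

# Only attempt the import when the package and the file are both available.
if (requireNamespace("HiCExperiment", quietly = TRUE) && file.exists(hic_path)) {
  library(HiCExperiment)
  hf <- HicFile(hic_path)
  # Import only the contacts within the focus region at 5 kb resolution
  sub <- import(hf, resolution = 5000, focus = focus)
}
```

Note that restricting the import to one region only loads the contacts within that region, so this fits analyses that can be run locus by locus rather than genome-wide.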

LucasMcNU commented 5 months ago

By contrast, importing a Hi-C file of similar size with GENOVA takes around 10 minutes of wall-clock time. These files are on the medium side for Hi-C data; the Micro-C data generated in the last few years are quite a bit larger. Maybe parallelizing the import function with BiocParallel would help?
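A rough sketch of what a user-side version of this could look like, assuming the import can be split per chromosome with `focus`. `import_by_chrom()` is a hypothetical wrapper, not an existing HiCExperiment function. It falls back to a plain `lapply()` when BiocParallel is not installed. Whether this actually speeds things up is untested and depends on whether the import is CPU-bound or I/O-bound. It also keeps only intra-chromosomal contacts, since each worker imports a single chromosome.

```r
# Hypothetical wrapper (not part of HiCExperiment or BiocParallel):
# run `import_one` on each chromosome name, in parallel when possible.
import_by_chrom <- function(chroms, import_one, workers = 2) {
  if (requireNamespace("BiocParallel", quietly = TRUE)) {
    res <- BiocParallel::bplapply(
      chroms, import_one,
      BPPARAM = BiocParallel::MulticoreParam(workers = workers)
    )
  } else {
    # Serial fallback when BiocParallel is not available
    res <- lapply(chroms, import_one)
  }
  # Name each result after its chromosome
  stats::setNames(res, chroms)
}

# Stand-in task to show the call shape without needing a .hic file
demo <- import_by_chrom(c("chr1", "chr2"), function(chrom) nchar(chrom))

# Placeholder path: replace with the actual .hic file.
hic_path <- "inter_30.hic"
if (requireNamespace("HiCExperiment", quietly = TRUE) && file.exists(hic_path)) {
  library(HiCExperiment)
  hf <- HicFile(hic_path)
  # One whole-chromosome import per worker (chromosome names are assumptions)
  maps <- import_by_chrom(
    c("chr1", "chr2"),
    function(chrom) import(hf, resolution = 5000, focus = chrom)
  )
}
```

`MulticoreParam` relies on forking, so on Windows a different backend such as `SnowParam` would be needed.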

js2264 commented 5 months ago

Thanks for the suggestion, I could look into this. Just to make sure I test comparable files: is your 2 GB Hi-C file at 5 kb resolution from a mammalian genome (a genome of a few Gb)? Thanks!

LucasMcNU commented 5 months ago

hg38! Human!

js2264 commented 5 months ago

thx, I'll look into it!