robinweide / GENOVA

GENome Organisation Visual Analytics
GNU General Public License v3.0

Memory footprint in Rstudio #343

Open LucasMcNU opened 7 months ago

LucasMcNU commented 7 months ago

After a .hic file has finished loading, a lot of memory is not cleaned up. My .hic file after loading is ~3-4 GB, but the memory footprint of my R session is upwards of 25-40 GB. This is causing my R session to crash. This is on an institutional HPC.

R version: R/4.2.3 (also seen on R/4.0.0)

Using the most recent version of GENOVA (remotes::install_github("robinweide/GENOVA"))

RStudio 2023.06.0+421 "Mountain Hydrangea" Release (583b465ecc45e60ee9de085148cd2f9741cc5214, 2023-06-06) for CentOS 7 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:122.0) Gecko/20100101 Firefox/122.0

teunbrand commented 7 months ago

This remains the case after you call gc()?

LucasMcNU commented 7 months ago

Yes, after I call gc() there is a decrease in the memory footprint. I am throwing quite a bit of RAM on the HPC at this issue. To open and dump the contacts from a 1.5 GB .hic file, around 40-50 GB are used by RStudio. After calling gc(), this drops to around 25-40 GB and stays static.
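For anyone trying to reproduce numbers like these, here is a minimal base-R sketch of comparing the size of one object with what gc() reports for the whole session. It uses only base R (object.size(), gc(), rm()). The vector `x` is a hypothetical stand-in for a loaded contacts object, not anything GENOVA returns.

```r
# Hypothetical stand-in for a loaded contacts object (about 8 MB of doubles).
x <- numeric(1e6)

# Size of this one object as R accounts for it.
obj_bytes <- as.numeric(object.size(x))
print(format(object.size(x), units = "MB"))

# gc() returns a matrix; column 2 is the memory currently in use, in Mb.
# The "Vcells" row covers vector data such as large numeric vectors.
used_before <- gc()["Vcells", 2]

# Drop the object, then collect again to see how much is actually released.
rm(x)
used_after <- gc()["Vcells", 2]

cat("Vcells in use before rm():", used_before, "Mb\n")
cat("Vcells in use after rm(): ", used_after, "Mb\n")
```

If the figure reported after rm() and gc() stays far above the summed object.size() of everything in the workspace, the remaining memory is probably held outside ordinary R objects (for example by the allocator or by the RStudio session itself), though that is only a guess without profiling.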

LucasMcNU commented 6 months ago

Hi, I wanted to follow up on this. I'm trying to load micro-C data using GENOVA. The data is large (~25-35 GB for the .hic file). I'm using an HPC with ~700 GB of available RAM. GENOVA seemingly can't handle something this large, or freezes up. Is there anything you can suggest that would allow loading these contacts, short of more memory?

teunbrand commented 6 months ago

The dev version has load_contacts_subset() to only load ROIs. GENOVA preceded micro-C and wasn't really built with data of that resolution in mind.

LucasMcNU commented 6 months ago

Thank you! Where can I find usage guidelines for load_contacts_subset()?

LucasMcNU commented 6 months ago

It looks like support for Juicer .hic files hasn't been added yet?