tariks / peakachu

Genome-wide contact analysis using sklearn
MIT License

MemoryError on CentOS 7.6 Cluster #11

Closed by CriticalSci 4 years ago

CriticalSci commented 4 years ago

Hi there,

I'm trying to test Peakachu on an HPC cluster with the following specs:

- Processors: 32 cores of Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
- RAM: 384GB
- OS: CentOS 7.6

However, running peakachu depth -p contactmatrix.mcool::/resolutions/1000 with what should be ample memory (NCPUs Requested: 32, NCPUs Used: 32, Memory Requested: 264gb, Memory Used: 212190396kb) fails with a MemoryError (see below). Could this be an issue with overcommit handling?

Has anyone ever encountered this problem?

Traceback (most recent call last):
  File "/usr/local/anaconda3-2020/envs/3dgenome/bin/peakachu", line 4, in <module>
    __import__('pkg_resources').run_script('peakachu==1.1.4', 'peakachu')
  File "/usr/local/anaconda3-2020/envs/3dgenome/lib/python3.6/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/anaconda3-2020/envs/3dgenome/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1464, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/anaconda3-2020/envs/3dgenome/lib/python3.6/site-packages/peakachu-1.1.4-py3.6.egg/EGG-INFO/scripts/peakachu", line 80, in <module>
    run()
  File "/usr/local/anaconda3-2020/envs/3dgenome/lib/python3.6/site-packages/peakachu-1.1.4-py3.6.egg/EGG-INFO/scripts/peakachu", line 76, in run
    args.func(args)
  File "/usr/local/anaconda3-2020/envs/3dgenome/lib/python3.6/site-packages/peakachu-1.1.4-py3.6.egg/peakachu/calculate_depth.py", line 27, in main
    intra = np.triu(Lib.matrix(balance=False, sparse=False).fetch(k), k=mindis)
  File "<__array_function__ internals>", line 6, in triu
  File "/usr/local/anaconda3-2020/envs/3dgenome/lib/python3.6/site-packages/numpy/lib/twodim_base.py", line 467, in triu
    return where(mask, zeros(1, m.dtype), m)
  File "<__array_function__ internals>", line 6, in where
MemoryError: Unable to allocate 231. GiB for an array with shape (248957, 248957) and data type int32
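
For reference, the figure in the error message can be checked by hand. The sketch below reproduces the 231 GiB size from the shape and dtype in the traceback. It then estimates the peak footprint of np.triu, assuming that the input matrix, a boolean mask of the same shape, and the output array all coexist during the where call shown above. The helper name dense_gib is illustrative and not part of Peakachu.

```python
GIB = 2 ** 30

def dense_gib(n_bins, itemsize):
    """Size in GiB of a dense n_bins x n_bins array with the given item size."""
    return n_bins * n_bins * itemsize / GIB

n_bins = 248957                      # shape reported in the MemoryError
matrix_gib = dense_gib(n_bins, 4)    # int32 contact matrix (4 bytes per cell)
mask_gib = dense_gib(n_bins, 1)      # boolean mask (1 byte per cell)

# Assumption: input, mask, and output are all alive at the same time.
peak_gib = 2 * matrix_gib + mask_gib

print(f"one int32 matrix: {matrix_gib:.1f} GiB")
print(f"estimated peak:   {peak_gib:.1f} GiB")
```

If that assumption holds, the estimated peak is larger than both the 264 GB requested and the 384 GB installed on the node, which would point to a genuine memory shortage rather than an overcommit setting.
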
tariks commented 4 years ago

Hello!

I don't have a definite answer yet: I have not tested on CentOS, nor have I used 1 kb resolution matrices with that function. What happens when you try the overcommit workaround described in the Stack Overflow answers? Does it work at 5 kb or 10 kb resolution? I'm also curious whether cooltools works on the same file.
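
Before changing any overcommit setting, it may help to see which policy the node currently uses. A minimal sketch, assuming a Linux system that exposes /proc/sys/vm/overcommit_memory; the function name and the short mode descriptions are illustrative summaries, not official wording.

```python
from pathlib import Path

# Short summaries of the three Linux overcommit policies (paraphrased).
OVERCOMMIT_MODES = {
    0: "heuristic: obviously excessive allocations are refused",
    1: "always overcommit: allocations are not refused up front",
    2: "strict: total commit is capped by swap plus a fraction of RAM",
}

def overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return (mode, description), or (None, reason) if the file is missing."""
    p = Path(path)
    if not p.exists():
        return None, "not available (non-Linux system?)"
    mode = int(p.read_text().strip())
    return mode, OVERCOMMIT_MODES.get(mode, "unknown mode")

print(overcommit_mode())
```

Note that a permissive policy only changes whether the allocation request is accepted; it does not add physical memory, so a job that really touches more pages than the node has can still be killed later.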

XiaoTaoWang commented 4 years ago

Hi, the highest resolution I've tested for training so far is 5 kb. If you are only trying to calculate the total number of intra-chromosomal reads, though, I suggest using a lower-resolution matrix. In your case, the command would be peakachu depth -p contactmatrix.mcool::/resolutions/10000, or even peakachu depth -p contactmatrix.mcool::/resolutions/100000.
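
To see why a lower resolution helps, here is a rough sketch of how the dense matrix for the largest chromosome shrinks as the bin size grows. It assumes the chromosome is hg38 chr1 (248,956,422 bp), which is consistent with the 248957 bins in the traceback, and that cells are stored as 4-byte int32 values. The function name is hypothetical.

```python
import math

CHR1_BP = 248_956_422  # assumed: hg38 chr1 length in base pairs

def chrom_dense_gib(chrom_bp, resolution_bp, itemsize=4):
    """Return (n_bins, GiB) for a dense square matrix of one chromosome."""
    n_bins = math.ceil(chrom_bp / resolution_bp)
    return n_bins, n_bins * n_bins * itemsize / 2 ** 30

for res in (1_000, 5_000, 10_000, 100_000):
    n_bins, gib = chrom_dense_gib(CHR1_BP, res)
    print(f"{res:>7} bp: {n_bins:>6} bins, {gib:8.2f} GiB")
```

Because the matrix is square, memory falls with the square of the bin size: going from 1 kb to 10 kb bins cuts the dense matrix by roughly a factor of 100.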

CriticalSci commented 4 years ago

Thanks guys, I was able to run the tool at 16 kb resolution with peakachu depth -p H3K27Ac.mcool::/resolutions/16000 :) I only got about 12.5M intra-chromosomal reads, so I guess I will use the 30M models!