shao-lab / MotifScan

A motif discovery tool to detect the occurrences of known motifs
BSD 3-Clause "New" or "Revised" License

Memory consumption issue #4

Open Mr-Milk opened 3 years ago

Mr-Milk commented 3 years ago

I tried to scan motifs on genome regions with the hg38 build, using -t 18 to match my CPU count,

but it raised:

OSError: [Errno 12] Cannot allocate memory

Then I tried -t 8, and the program used up to around 50 GB of my RAM. I ran it on WSL2 with Ubuntu 20.04 LTS.

hongduosun commented 3 years ago

Sorry, but how many regions were scanned?

Mr-Milk commented 3 years ago

More than 200K

hongduosun commented 3 years ago

I'm afraid this is a current limitation of MotifScan: only a small part of the code has been refactored in C to speed up the motif score calculation. Every single motif score is therefore stored and passed back to Python, which requires O(n_region × length_per_region × n_motif) memory. I'll improve this in the next update.
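For a rough sense of scale, the bound above can be turned into a back-of-the-envelope estimate. This is only a sketch: the function name is made up, and the region length (1000 bp), motif count (500) and 8 bytes per score are assumed values for illustration, not figures reported in this thread. Only the 200K region count comes from the report.

```python
def estimate_score_memory_bytes(n_region, length_per_region, n_motif,
                                bytes_per_score=8):
    """Estimate memory needed to keep every motif score in RAM.

    Follows O(n_region * length_per_region * n_motif), assuming one
    score per position per motif and `bytes_per_score` bytes each
    (8 bytes assumes a double-precision float).
    """
    return n_region * length_per_region * n_motif * bytes_per_score


# Hypothetical inputs: 200K regions (as reported), 1000 bp per region
# and 500 motifs (both assumed for illustration).
estimate = estimate_score_memory_bytes(200_000, 1000, 500)
print(f"{estimate / 1e9:.0f} GB")  # prints "800 GB"
```

Under these assumed inputs, holding all scores at once would far exceed typical workstation RAM, which is consistent with the allocation failure reported above.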

Mr-Milk commented 3 years ago

Thanks for your answer. Just a small suggestion: I looked at your code, and the parallelism uses Python's multiprocessing, which might be the reason for such huge memory consumption, since it basically copies the whole current Python process for each worker. It might help to implement the parallelism on the C side.

hongduosun commented 3 years ago

Thanks a lot for your advice!

hongduosun commented 3 years ago

This has been fixed in v1.3.0 by using pthread in the C extension. Thanks again!

Mr-Milk commented 3 years ago

I tried it with the same dataset. At some point the programme still ate up all of my RAM and caused a system exit 😥, but it's definitely better than before 🥰. Would it be possible to free unused memory and save the results to disk during the run?
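To make the request concrete, here is a minimal sketch of the pattern I have in mind: scan the regions in batches and write each batch's results to disk before moving on, so only one batch of scores is in memory at a time. This is not MotifScan's API. `scan_to_disk`, `batched` and the `score_batch` callback are hypothetical names, and the tab-separated output format is an assumption.

```python
import csv
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def scan_to_disk(regions, score_batch, out_path, batch_size=1000):
    """Score regions batch by batch, writing rows to a TSV file.

    `score_batch` is a hypothetical callback that takes a list of
    regions and returns an iterable of result rows (tuples). Each
    batch's rows are written out before the next batch is scored, so
    earlier results no longer need to be held in memory.
    Returns the number of rows written.
    """
    n_rows = 0
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        for batch in batched(regions, batch_size):
            for row in score_batch(batch):
                writer.writerow(row)
                n_rows += 1
    return n_rows
```

With this shape, peak memory depends on `batch_size` rather than on the total number of regions, at the cost of some extra disk I/O.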