Open Mr-Milk opened 3 years ago
Sorry, but how many regions were scanned?
More than 200K
I'm afraid this is a temporary limitation of MotifScan: only a small part of the code has been refactored in C to speed up motif score calculation. Every single motif score is therefore stored and passed back to Python, which requires O(n_region × length_per_region × n_motif) memory. I'll improve this in the next update.
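To make that estimate concrete, here is a minimal back-of-the-envelope sketch. The function names, the 8 bytes per score (one C double), and the region length and motif count in the usage lines are assumptions for illustration, not MotifScan's actual values; only the 200K region count comes from this thread.

```python
def estimate_score_bytes(n_regions, length_per_region, n_motifs, bytes_per_score=8):
    """Bytes needed if one score is kept per (region, position, motif).

    bytes_per_score=8 assumes each score is a C double (an assumption).
    """
    return n_regions * length_per_region * n_motifs * bytes_per_score


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


# Hypothetical inputs: 200K regions (from this thread), 200 bp per region
# and 100 motifs (both made up for the example).
total = estimate_score_bytes(200_000, 200, 100)
print(f"{to_gib(total):.1f} GiB")
```

The point is only that the product grows quickly: doubling any one of the three factors doubles the memory needed if all scores are held at once.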
Thanks for your answer. Just a small suggestion: I looked at your code, and the parallelism uses Python's multiprocessing, which might be the reason for such huge memory consumption, since it basically copies the whole current Python process. It might help to implement the parallelism on the C side.
Thanks a lot for your advice!
This has been fixed in v1.3.0 by using pthreads in the C extension. Thanks again!
I tried it with the same dataset. At some point the program still ate up all of my RAM and caused a system exit 😥, but it's certainly better than before 🥰. Would it be possible to free unused memory, or save the results to disk, while it runs?
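One way the "save to disk during the process" idea could look, as a rough sketch only: scan the regions in batches, append each batch's scores to a file, and let the batch go before starting the next one. This is not MotifScan's code; `iter_batches`, `scan_to_disk`, and the `score_batch` callback (standing in for whatever the C extension returns per batch) are hypothetical names.

```python
import csv
from itertools import islice


def iter_batches(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def scan_to_disk(regions, score_batch, out_path, batch_size=1000):
    """Score regions batch by batch, writing rows to a TSV file as we go.

    `score_batch` is a hypothetical callback taking a list of regions and
    returning (region, motif, score) tuples. Only one batch of results is
    held in memory at a time. Returns the number of rows written.
    """
    n_rows = 0
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["region", "motif", "score"])
        for batch in iter_batches(regions, batch_size):
            rows = score_batch(batch)
            writer.writerows(rows)
            n_rows += len(rows)
            handle.flush()  # push this batch to disk before scoring the next
    return n_rows
```

With this shape, peak memory depends on `batch_size` rather than on the total number of regions, at the cost of extra I/O.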
I tried to scan motifs on a genome region with the hg38 build using `-t 18`, which corresponds to my CPU count, but it raised:
Then I tried `-t 8`, and the program ate up around 50G of my RAM. I ran it on WSL2 Ubuntu 20.04 LTS.