lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

Biostar154220/CapBam memory usage #31

Closed: dakl closed 9 years ago

dakl commented 9 years ago

Hi Pierre,

Thanks a lot for Biostar154220, it works nicely (most of the time, hence this issue).

The problem is related to memory. I'm running my analyses on a cluster with 16-core, 128 GB machines (8 GB per core). I've increased the Java heap to -Xmx16g (from the original -Xmx512m, IIRC) and it still sometimes fails. The most common scenario is that it fails on contig 1, which is the largest, but in the example below it actually fails on contig 17.

I'm currently working around it by requesting 4-core jobs, which gives me 32 GB of memory, but since both the sorting and the capping in Biostar154220 are single-threaded, this costs me more CPU hours than necessary and leaves CPU-hour usage efficiency at around 25%.

Do you have any ideas on how memory usage could be improved?

[INFO/Biostar154220] 2015-08-27 10:36:53 "Alloc memory for contig 5 N=180915260*sizeof(int)"
[INFO/Biostar154220] 2015-08-27 10:37:02 " Count:2384664 Elapsed: 2 minutes Last: 5:9042344 Speed: 19.740597 record/millisec."
[INFO/Biostar154220] 2015-08-27 10:37:12 " Count:2597941 Elapsed: 2 minutes Last: 5:36424840 Speed: 19.861782 record/millisec."
[INFO/Biostar154220] 2015-08-27 10:37:13 "Alloc memory for contig 6 N=171115067*sizeof(int)"
[INFO/Biostar154220] 2015-08-27 10:37:22 " Count:2797449 Elapsed: 2 minutes Last: 6:115669821 Speed: 19.867117 record/millisec."
[INFO/Biostar154220] 2015-08-27 10:37:27 "Alloc memory for contig 7 N=159138663*sizeof(int)"
[INFO/Biostar154220] 2015-08-27 10:37:32 " Count:2992101 Elapsed: 2 minutes Last: 7:62464387 Speed: 19.83902 record/millisec."
[INFO/Biostar154220] 2015-08-27 10:37:48 " Count:3141130 Elapsed: 2 minutes Last: 7:109067668 Speed: 18.819529 record/millisec."
[INFO/Biostar154220] 2015-08-27 10:37:49 "Alloc memory for contig 8 N=146364022*sizeof(int)"
[INFO/Biostar154220] 2015-08-27 10:37:58 " Count:3335406 Elapsed: 2 minutes Last: 8:58173107 Speed: 18.853582 record/millisec."
[INFO/Biostar154220] 2015-08-27 10:38:00 "Alloc memory for contig 9 N=141213431*sizeof(int)"
[INFO/Biostar154220] 2015-08-27 10:38:08 " Count:3534333 Elapsed: 3 minutes Last: 9:115290284 Speed: 18.908772 record/millisec."
[INFO/Biostar154220] 2015-08-27 10:38:10 "Alloc memory for contig 10 N=135534747*sizeof(int)"
[INFO/Biostar154220] 2015-08-27 10:38:18 " Count:3740221 Elapsed: 3 minutes Last: 10:126980879 Speed: 18.992739 record/millisec."
[INFO/Biostar154220] 2015-08-27 10:38:20 "Alloc memory for contig 11 N=135006516*sizeof(int)"
[INFO/Biostar154220] 2015-08-27 10:38:28 " Count:3941682 Elapsed: 3 minutes Last: 11:91988264 Speed: 19.048384 record/millisec."
[INFO/Biostar154220] 2015-08-27 10:38:31 "Alloc memory for contig 12 N=133851895*sizeof(int)"
[INFO/Biostar154220] 2015-08-27 10:38:38 " Count:4151694 Elapsed: 3 minutes Last: 12:116147402 Speed: 19.137875 record/millisec."
[INFO/Biostar154220] 2015-08-27 10:38:42 "Alloc memory for contig 13 N=115169878*sizeof(int)"
[INFO/Biostar154220] 2015-08-27 10:38:51 " Count:4351132 Elapsed: 3 minutes Last: 13:104121043 Speed: 18.967363 record/millisec."
[INFO/Biostar154220] 2015-08-27 10:38:53 "Alloc memory for contig 14 N=107349540*sizeof(int)"
[INFO/Biostar154220] 2015-08-27 10:39:02 " Count:4405922 Elapsed: 4 minutes Last: 14:71376418 Speed: 18.3357 record/millisec."
[INFO/Biostar154220] 2015-08-27 10:39:08 "Alloc memory for contig 15 N=102531392*sizeof(int)"
[INFO/Biostar154220] 2015-08-27 10:39:12 " Count:4615042 Elapsed: 4 minutes Last: 15:100038088 Speed: 18.438631 record/millisec."
[INFO/Biostar154220] 2015-08-27 10:39:15 "Alloc memory for contig 16 N=90354753*sizeof(int)"
[INFO/Biostar154220] 2015-08-27 10:39:21 "Alloc memory for contig 17 N=81195210*sizeof(int)"
[INFO/Biostar154220] 2015-08-27 10:39:22 " Count:4820694 Elapsed: 4 minutes Last: 17:65811570 Speed: 18.520117 record/millisec."
[INFO/Biostar154220] 2015-08-27 10:39:32 " Count:4910689 Elapsed: 4 minutes Last: 17:67025169 Speed: 18.164728 record/millisec."
slurmstepd: Job 5848089 exceeded memory limit (16402356 > 16384000), being killed
slurmstepd: *** JOB 5848089 CANCELLED AT 2015-08-27T10:39:35 *** on m77

Thanks.

Daniel
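
For scale, the "Alloc memory for contig ... N=...*sizeof(int)" lines in the log above report one int per position of the contig. A rough sketch of what that means in bytes, assuming 4 bytes per Java int and ignoring the small array header (the class and method names here are illustrative, not taken from jvarkit):

```java
public class ContigArraySize {
    /** Bytes needed for an int[] of the given length, ignoring the JVM array header. */
    public static long intArrayBytes(long length) {
        return length * Integer.BYTES; // Integer.BYTES == 4
    }

    public static void main(String[] args) {
        // Lengths copied from the log: contig 5 and contig 17.
        long[] lengths = {180915260L, 81195210L};
        for (long n : lengths) {
            long bytes = intArrayBytes(n);
            double mib = bytes / (1024.0 * 1024.0);
            System.out.println("N=" + n + " -> " + bytes + " bytes (~" + Math.round(mib) + " MiB)");
        }
    }
}
```

The array for contig 5 comes to about 690 MiB, and each single array is far below the 16 GB job limit, so the size of one array alone does not account for the job being killed.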

lindenb commented 9 years ago

Hi Daniel, I'm away from my sources/code for now. Using -Xmx was a good idea.

Maybe we could invoke the garbage collector before/after allocating memory. As a test, you could try adding

depth_array=null;
System.gc();

just before https://github.com/lindenb/jvarkit/blob/master/src/main/java/com/github/lindenb/jvarkit/tools/biostar/Biostar154220.java#L152 ?
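
A minimal, self-contained sketch of that suggestion, assuming the tool keeps a single per-contig depth array in a field. The class `DepthBuffer` and its methods are hypothetical stand-ins, not the actual Biostar154220 code. Clearing the reference first lets the previous contig's array be collected before the next one is allocated; `System.gc()` is only a hint that the JVM is free to ignore.

```java
public class DepthBuffer {
    // Hypothetical stand-in for the per-contig depth array.
    private int[] depthArray;

    /** Drops the previous contig's array, then allocates one for the next contig. */
    public void startContig(int contigLength) {
        depthArray = null;                  // make the old array unreachable first
        System.gc();                        // request a collection (a hint, not a guarantee)
        depthArray = new int[contigLength]; // allocate for the new contig
    }

    /** Length of the current array, or 0 if none has been allocated yet. */
    public int length() {
        return depthArray == null ? 0 : depthArray.length;
    }

    public static void main(String[] args) {
        DepthBuffer buffer = new DepthBuffer();
        // Small illustrative sizes, not real contig lengths.
        for (int n : new int[] {1_000_000, 500_000}) {
            buffer.startContig(n);
            System.out.println("allocated " + buffer.length() + " ints");
        }
    }
}
```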

dakl commented 9 years ago

Yup - that does the trick. Submitted PR #32.