popgenmethods / LDhelmet

Software package for estimating fine-scale recombination rate.
GNU General Public License v3.0

Required memory estimation #4

ChenJuiYANG closed this issue 7 months ago

ChenJuiYANG commented 8 months ago

Hi, is there a rough estimate of the memory required for the table_gen and pade steps? Which parameters affect memory usage?
I ran the commands below on our institute's server, but the jobs crashed because they ran out of memory. The maximum resources in my test runs were 16 cores and about 100 GB of memory, and the jobs crashed after 1 hour (table_gen) and 2 hours (pade).

Information for my test run:

Version: 1.10
Sequence length: ~205 kb
Sample size: 37 (74 sequences)

Commands used:

ldhelmet find_confs --num_threads 10 -w 50 -o output.find_confs ${input}
ldhelmet table_gen --num_threads 16 -c ${input} -t 0.01 -r 0.0 0.1 10.0 1.0 100.0 -o output.table_gen
ldhelmet pade --num_threads 16 -c ${input} -t 0.01 -x 11 -o output.pade

ChenJuiYANG commented 7 months ago

Answering my own question. According to the GitHub homepage of this project: "LDhelmet can handle sample sizes of up to 50 individuals (haplotypes), and is suitable for whole-genome sequence analysis."

I had misunderstood this. The limit refers to haplotypes, so the input file should contain no more than 50 haplotypes (my test run had 74 sequences). However, in my experience, running with 50 haplotypes is still very slow. The computing time is more reasonable with around 40 haplotypes.
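
For anyone hitting the same problem, here is a minimal sketch of how the haplotype count could be checked and reduced before running LDhelmet. It assumes the input is a FASTA file in which every sequence starts with a `>` header line. The function names `count_haplotypes` and `keep_first_haplotypes` are made up for this example and are not part of LDhelmet. Keeping the first N sequences is only the simplest choice; a random or otherwise deliberate subset may suit your data better.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of LDhelmet.
# Assumption: the input is a FASTA file where each sequence begins with a '>' header.

# Print the number of sequences (haplotypes) in a FASTA file.
count_haplotypes() {
    grep -c '^>' "$1"
}

# Print the first N sequences of a FASTA file to stdout.
# $1 = FASTA file, $2 = number of sequences to keep.
keep_first_haplotypes() {
    # 'seen' counts header lines; a line is printed while seen <= n.
    awk -v n="$2" '/^>/ { seen++ } seen <= n' "$1"
}
```

For example, `count_haplotypes "${input}"` would report 74 for the run described above, and `keep_first_haplotypes "${input}" 40 > subset.fa` (where `subset.fa` is a placeholder name) would write a 40-haplotype file to use as input to find_confs and the later steps.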