mehrdadbakhtiari / adVNTR

A tool for genotyping Variable Number Tandem Repeats (VNTR) from sequence data
http://advntr.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Slow addmodel #71

Closed muyao11 closed 5 months ago

muyao11 commented 5 months ago

Hi, I tried to build a database for my VNTRs, but after running addmodel for 97 CPU hours, only about 1000 loci had been added. Any suggestions for speeding this up?

Jong-hun-Park commented 5 months ago

How many VNTRs are in your target set, and how long are they? This can happen when the VNTRs are very long, but it usually takes less time than that. It also depends on the CPU spec.

One simple solution is to split the loci into multiple batches and build a separate database for each. For example, split the 1000 loci into 10 groups of 100 loci each. It would take ~9 hours to build the 10 DBs for 1000 loci, assuming the builds run in parallel. Then run adVNTR against each database with the same samples and merge the results.
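A minimal sketch of the batching idea, in case it helps. It only covers the bookkeeping: splitting a list of loci into fixed-size groups and merging per-batch results back together. The names `split_into_batches` and `merge_results`, the `locus_N` IDs, and the dict-of-genotypes result format are all hypothetical and not part of adVNTR; building each database and genotyping against it would still be done with adVNTR itself.

```python
from typing import Dict, List, Sequence


def split_into_batches(loci: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split loci into consecutive groups of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(loci[i:i + batch_size]) for i in range(0, len(loci), batch_size)]


def merge_results(per_batch: Sequence[Dict[str, str]]) -> Dict[str, str]:
    """Combine per-batch {locus_id: genotype} results into one dict.

    Raises if the same locus shows up in two batches, since that would
    mean the split was not a clean partition.
    """
    merged: Dict[str, str] = {}
    for results in per_batch:
        overlap = merged.keys() & results.keys()
        if overlap:
            raise ValueError(f"loci present in more than one batch: {sorted(overlap)}")
        merged.update(results)
    return merged


# Hypothetical locus IDs standing in for the real target list.
loci = [f"locus_{i}" for i in range(1000)]
batches = split_into_batches(loci, 100)
print(len(batches), "batches,", len(batches[0]), "loci each")

# Placeholder genotypes; in practice these would come from one adVNTR run per database.
fake_results = [{locus: "NA" for locus in batch} for batch in batches]
merged = merge_results(fake_results)
print(len(merged), "loci after merging")
```

Each batch can then be built on its own core or node, so the wall-clock time is roughly that of the slowest batch rather than the sum of all of them.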

muyao11 commented 5 months ago

Thank you for your advice. I will split my loci into groups of 500, so building my DBs should take less than one day. Many thanks.