popgenmethods / smcpp

SMC++ infers population history from whole-genome sequence data.
GNU General Public License v3.0

most runs not finishing but program acts like they did #211

Open selasphoruskershaw opened 2 years ago

selasphoruskershaw commented 2 years ago

I am running several populations separately in smc++. For about 70% of the bootstraps (and often for runs on the original data), the log line "smcpp.optimize.plugins.progress_printer INFO EM iteration xxx of 20" either never appears or stops after about 3 or 4 of the 20 iterations. However, the program does not report an error and behaves as if the analysis had completed successfully. Sometimes I get "RuntimeError: erroneous average coalescence time" instead. I can still produce a plot, but it is usually not comparable to plots from runs that actually finish all 20 iterations, so these runs end up looking like outliers when I compare all of my bootstrapped runs. I've tried drastically varying the "--timepoints" option, and it doesn't help. Thanks for any help you can give.

Here's the code for running one of the four distinguished individuals to generate the input data:

    for i in {50..88}
    do
        docker run --rm -v $PWD:/mnt terhorst/smcpp:latest vcf2smc \
            -d S19686_merge_sort.bam S19686_merge_sort.bam \
            --mask ref_chrNC_0534$i.1.mask.bed.gz \
            HQ.noscaffolds.vcf.gz S19686.$i.smc.gz NC_0534$i.1 \
            pop1:S19686_merge_sort.bam,S19687_merge_sort.bam,S19689_merge_sort.bam,S19690_merge_sort.bam,S19694_merge_sort.bam
    done

Here's an example of the code for running "estimate":

    for bs in {1..25}
    do
        cd bs_$bs
        docker run --rm -v $PWD:/mnt terhorst/smcpp:latest estimate \
            --cores 6 --timepoints 1 100000 --knots 12 0.2e-8 \
            -o HQ.bs$bs.12knots *smc.gz
        cd ..
    done
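
Since the program gives no error when a run stops early, one way to spot the affected bootstraps is to scan each run's log for the last "EM iteration N of M" line. The following is only a rough sketch. It assumes the output of each "estimate" run was saved to a file such as bs_1/estimate.log (a hypothetical name, for example by appending "> estimate.log 2>&1" to the docker command) and that the log lines look like the one quoted above. The function names are made up for illustration and are not part of smc++.

```shell
# Print the highest "EM iteration N of M" found in a log file as "N M",
# or "none" if the file contains no such line.
# Assumption: log lines match the progress_printer message quoted in the post.
last_em_iteration() {
    grep -o 'EM iteration [0-9]* of [0-9]*' "$1" |
        awk '{ if (!seen || $3 + 0 > n) { n = $3 + 0; m = $5 + 0; seen = 1 } }
             END { if (seen) print n, m; else print "none" }'
}

# Print one line for every log that stopped before its final iteration.
# Assumption: logs were saved under hypothetical names like bs_N/estimate.log.
report_incomplete_runs() {
    for log in "$@"; do
        [ -f "$log" ] || continue
        status=$(last_em_iteration "$log")
        if [ "$status" = "none" ]; then
            echo "$log: no EM iterations logged"
            continue
        fi
        reached=${status% *}   # text before the space: last iteration seen
        total=${status#* }     # text after the space: planned iterations
        if [ "$reached" -lt "$total" ]; then
            echo "$log: stopped at iteration $reached of $total"
        fi
    done
}
```

With logs saved that way, running "report_incomplete_runs bs_*/estimate.log" from the parent directory would list the bootstraps to rerun or exclude before plotting. It only reports what the log shows and does not explain why a run stopped.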