I am running several populations separately in smc++. For about 70% of the bootstraps (and often for runs on the original data), the progress messages "smcpp.optimize.plugins.progress_printer INFO EM iteration xxx of 20" either never appear or stop after 3 or 4 of the 20 iterations. However, the program reports no error and behaves as if the analysis completed successfully. Sometimes I get "RuntimeError: erroneous average coalescence time" instead. I can still produce a plot from these runs, but it usually doesn't match the runs in which the model completes all 20 iterations, so they show up as outliers among my bootstrap replicates. I've tried drastically varying the --timepoints option, but it doesn't help. Thanks for any help you can give.
Here's the code I use to generate the input data for one of the four distinguished individuals:
for i in {50..88}   # contigs NC_053450.1 to NC_053488.1
do
    # -d gives the distinguished individual's sample ID twice (both haplotypes of S19686)
    docker run --rm -v "$PWD":/mnt terhorst/smcpp:latest vcf2smc -d S19686_merge_sort.bam S19686_merge_sort.bam --mask ref_chrNC_0534$i.1.mask.bed.gz HQ.noscaffolds.vcf.gz S19686.$i.smc.gz NC_0534$i.1 pop1:S19686_merge_sort.bam,S19687_merge_sort.bam,S19689_merge_sort.bam,S19690_merge_sort.bam,S19694_merge_sort.bam
done
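Because the loop writes one file per contig (NC_053450.1 through NC_053488.1, 39 files in total), a quick check that none of them are missing, empty, or truncated can help rule out the inputs before looking at estimate itself. This is a minimal sketch; check_smc_inputs is a hypothetical helper, not part of smc++, and it only tests that each file exists and is a valid gzip stream, not that its contents are correct.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of smc++): confirm that the vcf2smc loop produced
# one non-empty, readable <sample>.<i>.smc.gz file for each contig index i.
check_smc_inputs() {
    local sample=$1 first=$2 last=$3
    local i f bad=0
    for ((i = first; i <= last; i++)); do
        f="$sample.$i.smc.gz"
        if [ ! -s "$f" ]; then
            # File absent or zero bytes
            echo "missing or empty: $f"
            bad=$((bad + 1))
        elif ! gzip -t "$f" 2>/dev/null; then
            # File present but not a complete gzip stream
            echo "not a valid gzip file: $f"
            bad=$((bad + 1))
        fi
    done
    echo "$bad problem file(s) for $sample"
    # Exit status is 0 only if every expected file passed
    [ "$bad" -eq 0 ]
}
```

For the loop above, this would be run as `check_smc_inputs S19686 50 88` in the directory that holds the vcf2smc output.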
Here's an example of how I run "estimate" on each bootstrap replicate:
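for bs in {1..25}
do
    cd "bs_$bs" || continue   # skip a missing directory instead of running (and cd ..) from the wrong place
    # 0.2e-8 is the per-bp, per-generation mutation rate; *smc.gz expands to the inputs inside bs_$bs
    docker run --rm -v "$PWD":/mnt terhorst/smcpp:latest estimate --cores 6 --timepoints 1 100000 --knots 12 0.2e-8 -o HQ.bs$bs.12knots *smc.gz
    cd ..
done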
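Since the early-stopping runs exit without reporting an error, one way to separate them from complete runs is to save each run's console output and compare the last "EM iteration N of M" line against the announced total. This is a sketch under two assumptions: the output of each docker run was captured to a file (estimate.log is a hypothetical name), and iterations are numbered from 1. Both should be checked against a run known to have finished all 20 iterations.

```bash
#!/usr/bin/env bash
# Sketch: summarize how far the EM loop got in captured smc++ estimate logs.
# Assumption: each run's output was saved by appending
#   2>&1 | tee estimate.log
# to the docker run line inside bs_$bs (estimate.log is a hypothetical name).

# Assumption: numbering is 1-based, so a finished run ends at "20 of 20".
# Set EM_FIRST=0 if a finished run instead ends at "19 of 20".
EM_FIRST=${EM_FIRST:-1}

# Highest EM iteration number seen in the log (0 if none).
last_em_iteration() {
    local n
    n=$(grep -oE 'EM iteration [0-9]+ of [0-9]+' "$1" | awk '{print $3}' | sort -n | tail -n 1)
    echo "${n:-0}"
}

# Total number of EM iterations announced in the log (0 if none).
total_em_iterations() {
    local n
    n=$(grep -oE 'EM iteration [0-9]+ of [0-9]+' "$1" | awk '{print $5}' | tail -n 1)
    echo "${n:-0}"
}

# Print one status line per log: FAILED, INCOMPLETE, or OK.
check_run() {
    local log=$1 last total expected
    last=$(last_em_iteration "$log")
    total=$(total_em_iterations "$log")
    expected=$((total - 1 + EM_FIRST))
    if grep -q 'erroneous average coalescence time' "$log"; then
        echo "$log: FAILED (erroneous average coalescence time)"
    elif [ "$total" -eq 0 ]; then
        echo "$log: INCOMPLETE (no EM iterations logged)"
    elif [ "$last" -lt "$expected" ]; then
        echo "$log: INCOMPLETE (reached $last of $total)"
    else
        echo "$log: OK (reached $last of $total)"
    fi
}

# Check every bootstrap directory that has a captured log.
for log in bs_*/estimate.log; do
    [ -e "$log" ] || continue
    check_run "$log"
done
```

Listing the INCOMPLETE and FAILED replicates this way makes it easier to see whether the outlier plots come only from runs that stopped early.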