Open QianXiaobo opened 3 years ago
@QianXiaobo were you ever able to figure this out?
yeah this is a good question. I was wondering about this too... did anyone ever get a solution? right now I have been running split as
smc++ split -o split/ pop1/model.final.json pop2/model.final.json data/pop1_pop2*.smc.gz
smc++ split -o split/ pop2/model.final.json pop1/model.final.json data/pop2_pop1.*.smc.gz
as an additional comment, I would expect the resulting plots to be mirror images of each other but there are differences. how are others structuring their split command?
@Callithrix-omics, were you able to reach a solution? I am just now looping back to this analysis with a rebuilt dataset.
@terhorst, could you provide some insight on this? The smc++ writeup suggests 4 SFS inputs (x #chrom). 1dSFS for each population, and 2dSFS for each population-order-combination. But in the issues section, it seems hazy still
@pstokespmb I ran it both ways and essentially got the same results
@Callithrix-omics Thanks for the quick reply! Much appreciated. I updated my question since your response here. Did you end up inputting all 4 sources of vcf2smc output? Or just the input of the estimate models and one set of 2dSFS vcf2smc output?
@pstokespmb I tried it both ways with the one run with the set of 2dSFS vcf2smc and then another separate run with the 4 sources and the results came out the same for me. so I guess same difference. I did run each version of split 3 times and there was slight variability between runs but nothing major
@Callithrix-omics I REALLY appreciate you for this! Thank you :).
I do think this should be more explicitly stated in the readme if possible, @terhorst
yeah I agree with you @pstokespmb. good luck with your analyses.
Hi all,
I may have to follow up on your question here as I am also confused.
I initially ran the analyses as suggested in the documentation (with the exception that I only looped over the chromosomes) such as:
##########################
##########################
mkdir results_divergence myvcf='myfile.vcf.gz'
Pop1='Pop1:Pop1_ind1, Pop1_ind2, Pop1_ind3' LINES=$(cat 20_chrom_list.txt) for i in $LINES do smc++ vcf2smc "$myvcf" resultsdivergence/"Pop1.smc"$i".gz" "$i" "$Pop1" --ignore-missing -c 1000000 done
Pop2='Pop2:Pop2_ind1, Pop2_ind2, Pop2_ind3' LINES=$(cat 20_chrom_list.txt) for i in $LINES do smc++ vcf2smc "$myvcf" resultsdivergence/"Pop2.smc"$i".gz" "$i" "$Pop2" --ignore-missing -c 1000000 done
####################################
####################################
LINES=$(cat 20_chrom_list.txt) for i in $LINES do smc++ vcf2smc "$myvcf" results_divergence/"Pop1Pop2.smc"$i".gz" "$i" "$Pop1" "$Pop2" --ignore-missing -c 1000000 done
LINES=$(cat 20_chrom_list.txt) for i in $LINES do smc++ vcf2smc "$myvcf" results_divergence/"Pop2Pop1.smc"$i".gz" "$i" "$Pop2" "$Pop1" --ignore-missing -c 1000000 done
####################################
####################################
smc++ estimate --base Pop1 -o Pop1_out/ 1.06e-8 Pop1. -c 100000 --polarization-error 0.5 --ftol 1e-7 --xtol 1e-7 --thinning 1300 --em-iterations 1000 smc++ estimate --base Pop2 -o Pop2_out/ 1.06e-8 Pop2. -c 1000000 --polarization-error 0.5 --ftol 1e-7 --xtol 1e-7 --thinning 1300 --em-iterations 1000
smc++ split -o Pop1_Pop2_out Pop1_out/Pop1.final.json Pop2_out/Pop2.final.json .smcgz -c 1000000 --thinning 1300 --em-iterations 10000 --regularization-penalty 6 --polarization-error 0.5 --ftol 1e-7 --xtol 1e-7 smc++ plot Pop1_Pop2_joint_years.pdf Pop1_Pop2_out/model.final.json -g 6 --xlim 1 500000 -c
Howver, now I wonder if I should also loop over my samples, such as:
##########################
##########################
Pop1='Pop1:Pop1_ind1, Pop1_ind2, Pop1_ind3'
for c in {Chrom1, Chrom2} do for s in {Pop1_ind1, Pop1_ind2, Pop1_ind3}; do smc++ vcf2smc "$myvcf" resultsdivergence/"Pop1.smc"$c"_"$s".gz" $c -d $s $s "$Pop1" --ignore-missing -c 1000000 done done
Pop2='Pop2:Pop2_ind1, Pop2_ind2, Pop2_ind3' for c in {Chrom1, Chrom2} do for s in {Pop2_ind1, Pop2_ind2, Pop2_ind3}; do smc++ vcf2smc "$myvcf" resultsdivergence/"Pop2.smc"$c"_"$s".gz" $c -d $s $s "$Pop2" --ignore-missing -c 1000000 done done
####################################
####################################
for c in {Chrom1, Chrom2} do for s in {Pop1_ind1, Pop1_ind2, Pop1_ind3, Pop2_ind1, Pop2_ind2, Pop2_ind3}; ## not sure I need the samples for pop2 or not do smc++ vcf2smc "$myvcf" results_divergence/"Pop1Pop2.smc"$c"_"$s".gz" $c -d $s $s "$Pop1" "$Pop2" --ignore-missing -c 1000000 done done
for c in {Chrom1, Chrom2} do ffor s in {Pop1_ind1, Pop1_ind2, Pop1_ind3, Pop2_ind1, Pop2_ind2, Pop2_ind3}; ## not sure I need the samples for pop1 or not do smc++ vcf2smc "$myvcf" results_divergence/"Pop2Pop1.smc"$c"_"$s".gz" $c -d $s $s "$Pop2" "$Pop1" --ignore-missing -c 1000000 done done
####################################
####################################
smc++ estimate --base Pop1 -o Pop1_out/ 1.06e-8 Pop1. -c 100000 --polarization-error 0.5 --ftol 1e-7 --xtol 1e-7 --thinning 1300 --em-iterations 1000 smc++ estimate --base Pop2 -o Pop2_out/ 1.06e-8 Pop2. -c 1000000 --polarization-error 0.5 --ftol 1e-7 --xtol 1e-7 --thinning 1300 --em-iterations 1000
smc++ split -o Pop1_Pop2_out Pop1_out/Pop1.final.json Pop2_out/Pop2.final.json .smcgz -c 1000000 --thinning 1300 --em-iterations 10000 --regularization-penalty 6 --polarization-error 0.5 --ftol 1e-7 --xtol 1e-7 smc++ plot Pop1_Pop2_joint_years.pdf Pop1_Pop2_out/model.final.json -g 6 --xlim 1 500000 -c
your help would be greatly appreciated! :)
Hi terhorst,
I am a little confused whether
data/*.smc.gz
consisting of tag1, 2, 3, and 4 in the picture, total four (multiply by the numble of chromosomes) files or not? Ordata/*.smc.gz
just includesdata/pop12.chr*.smc.gz
anddata/pop21.chr*.smc.gz
?Hope for your reply.
Best wishes, Xb