Closed: muffato closed this 2 months ago
Requirements for input files.

vcf_input.csv:

    chrom,vcf,vcf_idx
    chr1,chrom1.vcf.gz,chrom1.vcf.gz.tbi
    chr2,chrom2.vcf.gz,chrom2.vcf.gz.tbi

sample.map:

    ind1 pop1
    ind2 pop1
    ind3 pop2
    ind4 pop2
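Before launching a run, it can help to sanity-check the sample map. The sketch below is only an illustration: the function name `summarise_sample_map` is made up, and it assumes the two columns (individual, population) are separated by whitespace as in the example above. It prints the number of individuals per population and returns a non-zero status if a line is malformed or an individual is listed twice.

```shell
# Summarise a two-column sample map (individual, population).
# Assumption: columns are whitespace-separated, as in the example above.
summarise_sample_map() {
    awk '
        NF == 0 { next }    # skip blank lines
        NF != 2 {
            printf "line %d: expected 2 columns, got %d\n", NR, NF > "/dev/stderr"
            bad = 1
            next
        }
        seen[$1]++ {
            printf "line %d: duplicate individual %s\n", NR, $1 > "/dev/stderr"
            bad = 1
            next
        }
        { count[$2]++ }     # count individuals per population
        END {
            for (pop in count) print pop, count[pop] | "sort"
            exit bad        # non-zero if any problem was reported
        }
    ' "$1"
}
```

Calling `summarise_sample_map sample.map` on the example above would list each population with its individual count, which makes label mix-ups easier to spot early.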
Splitting the VCF file by chromosomes:

    bcftools index -s mLutLut_renamed_autosomes_bisnps.vcf.gz | cut -f 1 | while read -r C; do bcftools view -O z -o "split.${C}.vcf.gz" mLutLut_renamed_autosomes_bisnps.vcf.gz "${C}"; done
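Once the per-chromosome files exist, the vcf_input.csv described earlier can be generated rather than typed by hand. This is a minimal sketch, not part of scalepopgen: the function name `make_vcf_input_csv` is hypothetical, and it assumes the files follow the `split.<chrom>.vcf.gz` naming from the loop above and that each one already has a `.tbi` index next to it (for example, one created with `bcftools index -t`).

```shell
# Write vcf_input.csv rows (header: chrom,vcf,vcf_idx) for every
# split.<chrom>.vcf.gz found in a directory.
# Assumptions: file naming follows the splitting loop above, and the tabix
# index sits next to each file as <file>.tbi.
make_vcf_input_csv() {
    dir=$1
    echo "chrom,vcf,vcf_idx"
    for vcf in "$dir"/split.*.vcf.gz; do
        [ -e "$vcf" ] || continue       # no matches: the glob stays literal
        chrom=${vcf##*/split.}          # strip directory and "split." prefix
        chrom=${chrom%.vcf.gz}          # strip the extension
        if [ ! -e "$vcf.tbi" ]; then
            echo "missing index for $vcf, skipping" >&2
            continue
        fi
        echo "$chrom,$vcf,$vcf.tbi"
    done
}
```

For example, `make_vcf_input_csv . > vcf_input.csv` would list the split files in the current directory; files without an index are reported on stderr and left out.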
Downloaded the supplementary data from https://doi.org/10.1093/molbev/msad207 and followed EurasianOtter_PopGen.html to obtain the vcf.gz files, rename the samples, and keep only autosomes and biallelic SNPs for the analyses. Split the VCF file by chromosome using bcftools. Ran "nextflow run scalepopgen -profile singularity -params-file /global/scratch/users/hangxue/otter/vcf_publication/jul4_parameters.yml -qs 10". See output graphs at https://docs.google.com/presentation/d/1O8vFmYImrJd6p4pvSLyzwiMsf9fTAZSTaG_FJGLz8t8/edit#slide=id.p
Tested PCA, Admixture, pairwise Fst and Treemix in scalepopgen. These run successfully with minor modifications. Scalepopgen can also compute Tajima's D and search for selective sweeps (Sweepfinder2), but plotting these two results requires the chromosome names to be integers. Of all the analyses, Sweepfinder2 takes the longest (~7 hr for the otter data), followed by Admixture (~1 hr). Additional potential analysis:
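Regarding the integer chromosome names needed for the Tajima's D and Sweepfinder2 plots, one possible workaround is to renumber the chromosomes before running the pipeline. The sketch below is an assumption about how this could be done, not something tested in this issue: `make_chrom_rename_map` is a made-up helper that numbers chromosomes 1..N in input order, and the resulting map is in the "old_name new_name" format accepted by `bcftools annotate --rename-chrs`. The file names in the usage comments are placeholders.

```shell
# Print an "old_name new_name" map that numbers chromosomes 1..N in input order.
# Input: one chromosome name per line on stdin; blank lines and repeated
# names are ignored.
make_chrom_rename_map() {
    awk 'NF && !($1 in seen) { seen[$1] = 1; print $1, ++n }'
}

# Hypothetical usage (in.vcf.gz, chr_map.txt and renamed.vcf.gz are placeholders):
#   bcftools index -s in.vcf.gz | cut -f 1 | make_chrom_rename_map > chr_map.txt
#   bcftools annotate --rename-chrs chr_map.txt -O z -o renamed.vcf.gz in.vcf.gz
```

Keeping the map file around makes it possible to translate the plotted integer labels back to the original chromosome names afterwards.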
Regarding the otter data, here is more information about the sample confusion that occurred during that project.
The label swaps were very visible on the admixture plots: see left (labels corrected) vs right (wrong labels). In your pipeline run, only k=2 is a bit messy; all the other k are clean. I think you may have the correct labels, and the differences are due to different methods / parameters?
I have double-checked the labels. I think the ones I am working with are labeled correctly. Yeah, I think the difference might be due to different software / parameters.
We need to review how many of our population genomics ideas Popgen48/scalepopgen can cover, to determine:
Links: poster
Summary
Next developments
Based on the tests above, to use scalepopgen, we would want to: