This is an implementation of various haplotype frequency spectrum statistics useful for detecting hard and soft selective sweeps in genomes. This program implements the following statistics:
- saltiLASSI: DeGiorgio and Szpiech (2022) PLoS Genetics 18: e1010134.
- LASSI: Harris and DeGiorgio (2020) MBE doi.org/10.1093/molbev/msaa115.
- H12: Garud et al. (2015) PLoS Genetics 11:e1005004.
- H2/H1: Garud et al. (2015) PLoS Genetics 11:e1005004.
- G123: Harris et al. (2018) Genetics 210:1419-1452.
- G2/G1: Harris et al. (2018) Genetics 210:1419-1452.
- Number of unique haplotypes at a locus
lassip accepts VCF files, either phased (default, "hap" output files) or unphased (set --unphased, "mlg" output files), with or without missing data. ***SEE CHANGELOG 19JAN2024 for update on how missing data is handled. lassip expects one VCF file per contig, provided one at a time. You must provide a population file that specifies a population ID for each individual ID you wish to analyze. Only IDs listed in the population file will be analyzed, and if multiple populations are present, all statistics will be computed on a per-population basis.
Use --hapstats to compute H/G statistics in sliding windows along the genome. H statistics are computed for phased data and G statistics for unphased data (--unphased).
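For readers unfamiliar with these statistics, the following is a minimal Python sketch of how they are defined for a single window, following the definitions in Garud et al. (2015) and Harris et al. (2018). It is not lassip's code: the function names `pooled_homozygosity` and `hap_stats` are hypothetical, the input is assumed to be one string per sampled haplotype (or per multilocus genotype for unphased data), and missing data is not handled.

```python
from collections import Counter


def pooled_homozygosity(freqs, m):
    """Homozygosity after pooling the m most frequent classes into one.

    m=1 gives H1 (or G1), m=2 gives H12, m=3 gives G123.
    """
    ordered = sorted(freqs, reverse=True)
    pooled = sum(ordered[:m])
    return pooled ** 2 + sum(p * p for p in ordered[m:])


def hap_stats(haplotypes):
    """Summary statistics for one window of phased haplotype strings."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("window contains no haplotypes")
    freqs = sorted((c / n for c in counts.values()), reverse=True)

    h1 = pooled_homozygosity(freqs, 1)
    h12 = pooled_homozygosity(freqs, 2)
    h2 = h1 - freqs[0] ** 2  # homozygosity excluding the most frequent class
    return {
        "H1": h1,
        "H12": h12,
        "H2/H1": h2 / h1,
        "n_unique": len(freqs),
    }


# Toy window: three distinct haplotypes at frequencies 0.5, 0.3 and 0.2.
window = ["0101"] * 5 + ["0110"] * 3 + ["1001"] * 2
stats = hap_stats(window)
```

For unphased data the same functions can be applied to multilocus genotype strings, with `pooled_homozygosity(freqs, 3)` corresponding to G123.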
Use --calc-spec to compute the top K haplotype frequency spectra in sliding windows along a contig. Pass multiple spectra files (e.g. from multiple contigs) with --spectra and --lassi to run the composite likelihood ratio (CLR) computation for detecting sweeps (Harris and DeGiorgio 2020), or with --salti to run the CLR computation from DeGiorgio and Szpiech (2022).
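As a rough illustration of what a "top K haplotype frequency spectrum" is for a single window, here is a minimal Python sketch. It is not lassip's code: the function name `top_k_spectrum` is hypothetical, and the sketch assumes the truncated spectrum is renormalized so that its K entries sum to 1, which may differ from what lassip writes to its spectra files.

```python
from collections import Counter


def top_k_spectrum(haplotypes, k):
    """Frequencies of the k most common haplotypes, in decreasing order.

    Assumption: the truncated spectrum is renormalized to sum to 1.
    Windows with fewer than k distinct haplotypes are padded with zeros.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    counts = Counter(haplotypes)
    top = sorted(counts.values(), reverse=True)[:k]
    total = sum(top)
    if total == 0:
        raise ValueError("window contains no haplotypes")
    spectrum = [c / total for c in top]
    return spectrum + [0.0] * (k - len(spectrum))


# Toy window: three distinct haplotypes with counts 5, 3 and 2.
window = ["0101"] * 5 + ["0110"] * 3 + ["1001"] * 2
spec_k2 = top_k_spectrum(window, 2)  # keeps only the two most common classes
spec_k4 = top_k_spectrum(window, 4)  # padded with a trailing zero
```

Under a sweep, most of the mass of such a spectrum is expected to sit in the first one or few entries, which is the distortion the CLR methods look for.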
If only --hapstats is given, files are named