This lets us set a sparsification factor for the mappings in wfmash, which reduces alignment time. Mappings are sampled based on a hash of the mapping record: we keep the hash-minimizing mapping records at the rate given by the sparsification factor.
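As a rough illustration of hash-based sampling, here is a minimal sketch. It assumes a hypothetical `Mapping` record type and uses a BLAKE2b hash with a threshold test (keep a record when its normalized hash is below the factor). This is a stand-in for the idea, not wfmash's actual hash function, record fields, or selection code.

```python
import hashlib
from typing import Iterable, List, NamedTuple


class Mapping(NamedTuple):
    # Hypothetical minimal mapping record; real wfmash records carry more fields.
    query: str
    q_start: int
    q_end: int
    target: str
    t_start: int
    t_end: int
    strand: str


def record_hash(m: Mapping) -> int:
    """Deterministic 64-bit hash of the whole mapping record (assumed scheme)."""
    key = "\t".join(str(field) for field in m).encode()
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


def sparsify(mappings: Iterable[Mapping], factor: float) -> List[Mapping]:
    """Keep mappings whose normalized hash falls below `factor`.

    Because the hash is effectively uniform, roughly `factor` of the records
    survive, and the same records are chosen on every run.
    """
    if factor >= 1.0:
        return list(mappings)
    threshold = int(factor * 2**64)
    return [m for m in mappings if record_hash(m) < threshold]
```

One property of the threshold form: lowering the factor always yields a subset of the records kept at a higher factor, so results at different sparsification levels are nested rather than independently resampled.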
It's also possible to set `pggb -x auto` to get an automatic sparsification setting based on the number of haplotypes (given by either `-n` or `-H`). This sets the wfmash sparsification factor to `10 * log(n) / n`, which appears to be a decent heuristic for panmictic populations. We might consider making this the default, to head off the user issues that will otherwise become inevitable with large numbers of input genomes.
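To get a feel for how the `auto` heuristic scales, here is a small sketch of the formula. It assumes the natural logarithm and clamps the result to 1.0 so that small pangenomes keep every mapping. Both the log base and the clamp are assumptions made for illustration, not documented pggb behavior.

```python
import math


def auto_sparsification(n_haplotypes: int) -> float:
    """Heuristic factor 10 * log(n) / n, clamped to at most 1.0.

    Assumes natural log; the clamp (an assumption) means "keep everything"
    whenever the raw formula exceeds 1.
    """
    if n_haplotypes < 2:
        raise ValueError("need at least two haplotypes to have mappings")
    return min(1.0, 10.0 * math.log(n_haplotypes) / n_haplotypes)


for n in (10, 35, 36, 100, 1000):
    print(n, round(auto_sparsification(n), 4))
```

Under these assumptions, the factor stays at 1.0 (no sparsification) up to 35 haplotypes. It drops just below 1.0 at 36, to about 0.46 at 100 haplotypes, and to about 0.069 at 1000. The fraction of mappings kept therefore falls roughly as `log(n) / n`, while the total number of pairwise mappings grows roughly as `n^2`.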