Sparsify mappings

This lets us set a sparsification factor for the mappings in wfmash, which reduces the number of mappings that have to be aligned and therefore the alignment time. Mappings are sampled using a hash of each mapping record: we keep the hash-minimizing mapping records, at the rate given by the sparsification factor.
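Here is a minimal Python sketch of hash-based sampling. It assumes each mapping record can be serialized to a string, and it uses SHA-1 as a stand-in for whatever hash wfmash actually uses. The names record_hash and sparsify, and the toy records, are hypothetical. Each record's hash is normalized to [0, 1), and the record is kept only if that value falls below the sparsification factor. Records with the smallest hashes therefore survive, and in expectation the fraction kept equals the factor.

```python
import hashlib


def record_hash(record: str) -> float:
    """Map a mapping record (hypothetical string form) to a value in [0, 1).

    SHA-1 is an illustrative stand-in, not necessarily wfmash's hash.
    """
    digest = hashlib.sha1(record.encode("utf-8")).digest()
    # Keep the top 53 bits so the division is exact and the result stays < 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def sparsify(records: list[str], factor: float) -> list[str]:
    """Keep records whose normalized hash is below `factor` (order preserved)."""
    return [r for r in records if record_hash(r) < factor]


# Hypothetical all-vs-all mapping records between 50 haplotypes.
records = [f"hap{i}#chr1\thap{j}#chr1" for i in range(50) for j in range(50) if i != j]
kept = sparsify(records, 0.1)
print(f"kept {len(kept)} of {len(records)} mappings")
```

Thresholding a hash, rather than drawing random numbers, makes the choice deterministic: the same mapping is kept or dropped regardless of input order. Lowering the factor only ever removes records, so the set kept at 0.05 is a subset of the set kept at 0.1.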

It's also possible to set pggb -x auto to get an automatic sparsification setting based on the number of haplotypes (given with either -n or -H). This sets the wfmash sparsification factor to 10 * log(n) / n, which appears to be a decent heuristic for panmictic populations. We might consider making this the default, to head off the user issues that will become inevitable as the number of input genomes grows.
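The following is a small sketch of the automatic setting. It assumes log is the natural logarithm, which the description above does not specify. The function name auto_sparsification is hypothetical. Capping the result at 1.0 is also an assumption: for small n, 10 * ln(n) / n exceeds 1, and a kept fraction above 1 can only mean "keep every mapping".

```python
import math


def auto_sparsification(n_haplotypes: int) -> float:
    """Fraction of mappings to keep for n haplotypes (hypothetical helper).

    Assumes natural log; caps at 1.0 because 10 * ln(n) / n > 1 for n <= 35.
    """
    if n_haplotypes < 2:
        # ln(1) = 0 would otherwise drop every mapping.
        return 1.0
    return min(1.0, 10 * math.log(n_haplotypes) / n_haplotypes)


for n in (10, 36, 100, 1000):
    print(n, round(auto_sparsification(n), 4))
```

Under these assumptions the factor is 1.0 (no sparsification) up to n = 35, about 0.46 at n = 100, and about 0.069 at n = 1000. Multiplying by n shows that each haplotype keeps mappings to roughly 10 ln(n) others in expectation, assuming all-vs-all mapping with mappings spread evenly across pairs. Total mapping work then grows like n log n rather than n². One possible reading of the constant (our interpretation, not stated here) is that ln(n) / n is the connectivity threshold of an Erdős–Rényi random graph, so the factor of 10 adds a safety margin.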

pangenome / pggb

Sparsify mappings #201