pangenome / pggb

the pangenome graph builder
https://doi.org/10.1038/s41592-024-02430-3
MIT License

Sparsify mappings #201

Closed. ekg closed this 2 years ago.

ekg commented 2 years ago

This lets us set a sparsification factor for the mappings in wfmash, which reduces alignment time. Mappings are sampled based on a hash of the mapping record: we keep the hash-minimizing mapping records, at the rate given by the sparsification factor.
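
To illustrate the idea, here is a minimal Python sketch of hash-based sampling. It is not the wfmash implementation: the function name `keep_mapping`, the use of SHA-256, and the string form of the record are all assumptions made for the example. It only shows how thresholding a hash keeps roughly the requested fraction of records, deterministically.

```python
import hashlib


def keep_mapping(record: str, factor: float) -> bool:
    """Decide whether a mapping record survives sparsification.

    Hypothetical helper: wfmash's real hash function and record
    layout are not shown in this PR, so SHA-256 over the record
    text stands in for them here.
    """
    if factor >= 1.0:
        return True  # a factor of 1 keeps every mapping
    digest = hashlib.sha256(record.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned 64-bit integer.
    h = int.from_bytes(digest[:8], "big")
    # Keep the records whose hash falls in the lowest `factor`
    # fraction of the 64-bit hash space.
    return h < factor * 2**64


# Made-up records, only to exercise the function.
records = [f"query{i}\ttarget{i % 7}\t{i * 100}" for i in range(10000)]
kept = [r for r in records if keep_mapping(r, 0.1)]
print(f"kept {len(kept)} of {len(records)} mappings")
```

Because the decision depends only on the record itself, the same mapping is kept or dropped on every run, with no random seed to manage.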

It's also possible to set pggb -x auto to get an automatic sparsification setting based on the number of haplotypes (given with either -n or -H). This sets the wfmash sparsification factor to 10 * log(n) / n, where n is the number of haplotypes, which appears to be a decent heuristic for panmictic populations. We might consider making this the default, to head off the user issues that will become inevitable with high numbers of input genomes.
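
As a rough worked example of the heuristic, the sketch below evaluates 10 * log(n) / n for a few haplotype counts. Several things here are assumptions rather than statements about pggb itself: the function name `auto_sparsification` is made up, the log is taken to be the natural log, the result is clamped to 1, and a single haplotype is treated as "keep everything".

```python
import math


def auto_sparsification(n_haplotypes: int) -> float:
    """Evaluate the 10 * log(n) / n heuristic for n haplotypes.

    Assumptions (not confirmed by the PR text): natural log,
    clamping to at most 1.0, and returning 1.0 for n == 1, where
    the raw formula would give 0.
    """
    if n_haplotypes < 1:
        raise ValueError("number of haplotypes must be at least 1")
    if n_haplotypes == 1:
        return 1.0
    return min(1.0, 10.0 * math.log(n_haplotypes) / n_haplotypes)


for n in (10, 36, 100, 1000):
    print(n, round(auto_sparsification(n), 4))
```

Under these assumptions the factor stays at 1 (no sparsification) up to 35 haplotypes and first drops below 1 at 36, so the automatic setting would only start to matter for larger collections.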