rust-seq / minimizer-iter

Iterate over minimizers of a DNA sequence
https://crates.io/crates/minimizer-iter
MIT License
minimizer-iter

Iterate over minimizers of a DNA sequence.

Features

If you'd like to use the underlying data structure manually, have a look at the minimizer-queue crate.

Example usage

use minimizer_iter::MinimizerBuilder;

// Build an iterator over minimizers
// of size 3 with a window of size 4
// for the sequence "TGATTGCACAATC"
// (the 13-base sequence must be at least as long as one window)
let min_iter = MinimizerBuilder::<u64>::new()
    .minimizer_size(3)
    .width(4)
    .iter(b"TGATTGCACAATC");

for (minimizer, position) in min_iter {
    // ...
}
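
For readers new to the concept, here is a minimal sketch of what selecting minimizers means, written in plain Rust without the crate. It assumes a 2-bit encoding (A=0, C=1, G=2, T=3) and plain numeric ordering of the packed k-mers. `pack_kmer` and `naive_minimizers` are hypothetical helpers, not part of minimizer-iter, and the crate may order k-mers differently (for example by hash), so the minimizers it yields need not match.

```rust
/// Pack a k-mer into an integer, 2 bits per base (assumed encoding:
/// A=0, C=1, G=2, T=3). Returns None on any non-ACGT byte.
fn pack_kmer(kmer: &[u8]) -> Option<u64> {
    let mut packed = 0u64;
    for &base in kmer {
        let code = match base {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        packed = (packed << 2) | code;
    }
    Some(packed)
}

/// Naive minimizers: for every window of `width` consecutive k-mers,
/// select the smallest packed k-mer (leftmost on ties). A k-mer selected
/// by several consecutive windows is reported only once.
fn naive_minimizers(seq: &[u8], k: usize, width: usize) -> Vec<(u64, usize)> {
    assert!(k >= 1 && k <= 32 && width >= 1);
    let kmers: Vec<u64> = seq
        .windows(k)
        .map(|km| pack_kmer(km).expect("sequence must be ACGT only"))
        .collect();
    let mut selected: Vec<(u64, usize)> = Vec::new();
    if kmers.len() < width {
        // The sequence is shorter than a single window: nothing to select.
        return selected;
    }
    for start in 0..=kmers.len() - width {
        // min_by_key returns the first minimum, i.e. the leftmost k-mer.
        let (offset, &min) = kmers[start..start + width]
            .iter()
            .enumerate()
            .min_by_key(|&(_, &value)| value)
            .unwrap();
        let entry = (min, start + offset);
        if selected.last() != Some(&entry) {
            selected.push(entry);
        }
    }
    selected
}
```

Under these assumptions, `naive_minimizers(b"TGATTGCACAATC", 3, 4)` selects the k-mers starting at positions 2, 6, 7 and 9 (ATT, CAC, ACA and AAT).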

If you'd like to use mod-minimizers instead, just change new() to new_mod():

use minimizer_iter::MinimizerBuilder;

// Build an iterator over mod-minimizers
// of size 3 with a window of size 4
// for the sequence "TGATTGCACAATC"
// (the 13-base sequence must be at least as long as one window)
let min_iter = MinimizerBuilder::<u64, _>::new_mod()
    .minimizer_size(3)
    .width(4)
    .iter(b"TGATTGCACAATC");

for (minimizer, position) in min_iter {
    // ...
}
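
To give an idea of how mod-minimizers differ, the sketch below follows my reading of the published mod-minimizer scheme: inside each window, find the smallest t-mer for a shorter length t, then take the k-mer at that position modulo the window width. The README does not describe the crate's internals, so the lower bound `r`, the helper names and the numeric ordering of 2-bit packed t-mers are all assumptions made for illustration.

```rust
/// Pack a short DNA string, 2 bits per base (assumed: A=0, C=1, G=2, T=3).
fn pack(tmer: &[u8]) -> u64 {
    tmer.iter().fold(0u64, |acc, &base| {
        let code = match base {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => panic!("sequence must be ACGT only"),
        };
        (acc << 2) | code
    })
}

/// Positions of the k-mers selected by a naive mod-minimizer sketch.
/// `w` is the window width (number of k-mers per window) and `r` is a
/// hypothetical lower bound on the t-mer length.
fn mod_minimizer_positions(seq: &[u8], k: usize, w: usize, r: usize) -> Vec<usize> {
    assert!(w >= 1 && r >= 1 && r <= k && k <= 32);
    // t is congruent to k modulo w and satisfies r <= t <= k.
    let t = r + (k - r) % w;
    // A window of w k-mers spans w + k - 1 bases.
    let window_len = w + k - 1;
    let mut positions = Vec::new();
    if seq.len() < window_len {
        return positions;
    }
    for start in 0..=seq.len() - window_len {
        let window = &seq[start..start + window_len];
        // Offset x of the smallest t-mer among the w + k - t t-mers.
        let (x, _) = window
            .windows(t)
            .map(pack)
            .enumerate()
            .min_by_key(|&(_, value)| value)
            .unwrap();
        // Select the k-mer at offset x mod w inside the window.
        let pos = start + x % w;
        if positions.last() != Some(&pos) {
            positions.push(pos);
        }
    }
    positions
}
```

When t equals k (for instance k = 3, w = 4, r = 1), the offset x is already smaller than w, so this sketch selects the same positions as a plain minimizer scheme with the same ordering.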

Additionally, the iterator can produce canonical minimizers so that a sequence and its reverse complement will select the same minimizers. To do so, just add .canonical() to the builder:

MinimizerBuilder::<u64>::new()
    .canonical()
    .minimizer_size(...)
    .width(...)
    .iter(...)
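
The property behind canonical minimizers can be checked with a small standalone sketch. It assumes the usual definition of a canonical k-mer (the smaller of a k-mer and its reverse complement, here compared as byte strings) and uses hypothetical helpers that are not part of the crate; the crate's own ordering may differ.

```rust
use std::collections::BTreeSet;

/// Complement of a single base; other bytes are left unchanged.
fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        other => other,
    }
}

/// Reverse complement of a DNA sequence.
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter().rev().map(|&base| complement(base)).collect()
}

/// Canonical form: the lexicographically smaller of a k-mer
/// and its reverse complement.
fn canonical_kmer(kmer: &[u8]) -> Vec<u8> {
    let rc = reverse_complement(kmer);
    if rc.as_slice() < kmer {
        rc
    } else {
        kmer.to_vec()
    }
}

/// Set of canonical k-mers selected as the minimum of at least one
/// window of `width` consecutive k-mers.
fn canonical_minimizer_set(seq: &[u8], k: usize, width: usize) -> BTreeSet<Vec<u8>> {
    assert!(k >= 1 && width >= 1);
    let kmers: Vec<Vec<u8>> = seq.windows(k).map(canonical_kmer).collect();
    kmers
        .windows(width)
        .filter_map(|window| window.iter().min().cloned())
        .collect()
}
```

Each window of the reverse complement contains the same canonical k-mers as the mirrored window of the original sequence, so `canonical_minimizer_set` returns the same set for a sequence and for its reverse complement.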

If you need longer minimizers (> 32 bases), you can specify a bigger integer type such as u128:

MinimizerBuilder::<u128>::new()
    .minimizer_size(...)
    .width(...)
    .iter(...)
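
The 32-base limit follows from packing each base into 2 bits, which is what the limit above implies: a u64 has room for 64 / 2 = 32 bases and a u128 for 64. The sketch below illustrates this with hypothetical helpers (`max_kmer_len`, `pack_kmer_u128`) and an assumed encoding of A=0, C=1, G=2, T=3; it is not the crate's code.

```rust
/// Number of bases that fit in an integer of `bits` bits
/// when each base takes 2 bits.
const fn max_kmer_len(bits: u32) -> usize {
    (bits / 2) as usize
}

/// Pack up to 64 bases into a u128 (assumed encoding: A=0, C=1, G=2, T=3).
/// Returns None if the k-mer is too long or contains a non-ACGT byte.
fn pack_kmer_u128(kmer: &[u8]) -> Option<u128> {
    if kmer.len() > max_kmer_len(u128::BITS) {
        return None;
    }
    let mut packed = 0u128;
    for &base in kmer {
        let code: u128 = match base {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        packed = (packed << 2) | code;
    }
    Some(packed)
}
```

A 33-base k-mer that starts with C packs to a value above `u64::MAX`, which is why the wider integer type is needed.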

See the documentation for more details.

Benchmarks

To run benchmarks against other implementations of minimizers, clone this repository and run:

cargo bench

Contributors