maickrau / MBG

MIT License
MBG

Minimizer-based sparse de Bruijn graph constructor. MBG homopolymer-compresses the input sequences, picks syncmers from the compressed sequences, connects two syncmers with an edge if they are adjacent in a read, builds unitigs from the resulting graph, and homopolymer-decompresses the result. Suggested input is PacBio HiFi/CCS reads or ONT duplex reads. It may or may not work with Illumina reads, and it is not suggested for PacBio CLR or regular ONT reads. Algorithmic details and citation: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btab004/6104877
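The first three steps above can be illustrated with a small Python sketch. This is not MBG's implementation (MBG is a compiled tool and also handles reverse complements, abundances, unitig construction and decompression). It assumes closed syncmers with lexicographic ordering of s-mers, and the function names and the parameter s are hypothetical, chosen only for this illustration.

```python
from collections import Counter
from itertools import groupby


def hpc(seq):
    """Homopolymer-compress: collapse each run of identical bases to one base."""
    return "".join(base for base, _ in groupby(seq))


def closed_syncmers(seq, k, s):
    """Start positions of k-mers whose smallest s-mer is the first or last s-mer.

    Assumption: lexicographic order, leftmost s-mer wins ties.
    """
    positions = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        smers = [kmer[j:j + s] for j in range(k - s + 1)]
        smallest = smers.index(min(smers))
        if smallest == 0 or smallest == k - s:
            positions.append(i)
    return positions


def syncmer_edges(reads, k, s):
    """Count edges between syncmers that are adjacent within a read."""
    edges = Counter()
    for read in reads:
        compressed = hpc(read)
        pos = closed_syncmers(compressed, k, s)
        for a, b in zip(pos, pos[1:]):
            edges[(compressed[a:a + k], compressed[b:b + k])] += 1
    return edges


# Toy read with unrealistically small k and s, for illustration only.
edges = syncmer_edges(["GGAACTTGA"], k=4, s=2)
```

Real runs use k in the hundreds or thousands, so the toy values here only show the mechanics: the read is compressed to GACTGA, two of its 4-mers are selected, and one edge connects them.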

Installation

Bioconda: conda install -c bioconda mbg

Compilation

Usage

MBG -i input_reads.fa -o output_graph.gfa -k kmer_size -w window_size -a kmer_min_abundance -u unitig_min_abundance

e.g. MBG -i reads.fa -o graph.gfa -k 1501 -w 1450 -a 1 -u 3

Multiple read files can be given by repeating the flag, as in "-i file1.fa -i file2.fa" etc. Input reads can be in .fa / .fq / .fa.gz / .fq.gz format.
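When MBG is called from a script, the repeated -i flag can be assembled programmatically. The following is a minimal sketch using only the flags shown above; the helper name build_mbg_command is hypothetical, and the file names are placeholders.

```python
import shlex


def build_mbg_command(read_files, output_gfa, k, w, a, u):
    """Build an MBG argument list with one -i flag per input read file."""
    cmd = ["MBG"]
    for path in read_files:
        cmd += ["-i", path]
    cmd += ["-o", output_gfa, "-k", str(k), "-w", str(w), "-a", str(a), "-u", str(u)]
    return cmd


cmd = build_mbg_command(["file1.fa", "file2.fq.gz"], "graph.gfa", k=1501, w=1450, a=1, u=3)
print(shlex.join(cmd))
# prints: MBG -i file1.fa -i file2.fq.gz -o graph.gfa -k 1501 -w 1450 -a 1 -u 3
```

The list can be passed to subprocess.run(cmd, check=True), assuming the MBG binary is on the PATH.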

Parameters

Other options:

k and w can be arbitrarily large, but at some point the error rate and limited read length will cause the graph to become fragmented. Runtime stays approximately the same if the ratio k/w is kept constant. All repeats shorter than k are separated, all repeats longer than k+w are collapsed, and repeats in between may be separated or collapsed depending on whether a k-mer was selected from within the repeat.

When using --blunt, you should clean the graph afterwards with vg. --blunt uses an extension of an algorithm invented by Hassan Nikaein (personal communication).
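The repeat-resolution rule stated above can be written as a small helper for choosing k and w. This is only a restatement of the rule as a sketch; the function name repeat_fate is hypothetical and not part of MBG.

```python
def repeat_fate(repeat_length, k, w):
    """Classify a repeat by length using the rule from the text.

    Shorter than k: separated. Longer than k + w: collapsed.
    Otherwise the outcome depends on whether a k-mer was selected
    from within the repeat.
    """
    if repeat_length < k:
        return "separated"
    if repeat_length > k + w:
        return "collapsed"
    return "separated or collapsed"


# With the example parameters -k 1501 -w 1450, k + w = 2951.
for length in (1000, 2000, 3000):
    print(length, repeat_fate(length, k=1501, w=1450))
# prints:
# 1000 separated
# 2000 separated or collapsed
# 3000 collapsed
```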