isovic / graphmap

GraphMap - A highly sensitive and accurate mapper for long, error-prone reads http://www.nature.com/ncomms/2016/160415/ncomms11307/full/ncomms11307.html Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/graphmap2
MIT License

multiple matches #10

Closed lfaino closed 8 years ago

lfaino commented 8 years ago

Hello, I have a question about GraphMap output. We are trying to align a file against itself and identify secondary alignments; that is, we use a single FASTA file as both the database and the query. We expect one best hit of each read to itself, plus secondary alignments.

How can we get the software to report all alignments, irrespective of identity? Right now, we only get each read's self-alignment and no secondary alignments.

Thanks, Luigi

isovic commented 8 years ago

Hi Luigi!

Thank you for your inquiry! Even though we discussed this elsewhere, I wanted to make a post to help other people with the same questions.

If I understood your question correctly, you basically want to find overlaps between reads? Running with the default parameters won't work in this case: the self-alignment of a read is by far the best one, so it is the one chosen for output, and if the reads are erroneous, chances are that alignments to other reads will not come close to its quality. Instead, I would suggest using one of the overlap modes.

GraphMap has two overlap modes:

  1. A slower mode that runs the full GraphMap pipeline, produces alignments, and outputs a SAM file. To use it, run: ./graphmap -w overlapper -r reads.fa -d reads.fa -o overlaps.sam Note that "-w overlapper" is a composite parameter which changes the values of several other parameters, namely "-a anchor -Z -F 0.50 -z 1e0", so use it with care. This mode should be as sensitive as GraphMap.
  2. And, more interesting, "owler" (Overlap With Long Erroneous Reads): a trimmed-down version of GraphMap which skips the graph step, uses only one gapped spaced seed (the 6-1-6 one), and omits the alignment step. This mode is fast, sensitive, and very specific in the tests we have run so far, but more testing is still in order. It can be used to overlap nanopore 2D and PacBio reads. The output is generated in the MHAP format (http://mhap.readthedocs.org/en/latest/quickstart.html#output); however, the fields that MHAP calls "Jaccard score" and "# shared min-mers" instead hold the fraction of covered bases in the overlap and the number of k-mers in the overlapping region, respectively. To use it, run: ./graphmap -w owler -r reads.fa -d reads.fa -o overlaps.mhap
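For anyone post-processing the owler output, here is a minimal Python sketch (not part of GraphMap) that reads MHAP-style lines and drops the self-overlaps that come from using the same file as both reads and reference. It assumes the 12-column, whitespace-separated layout described in the MHAP quickstart linked above (A ID, B ID, score, shared min-mers, then strand/start/end/length for A and for B). The names `Overlap` and `parse_overlaps` are hypothetical, and the two score columns are interpreted the way owler fills them, as described in item 2.

```python
from typing import Iterable, Iterator, NamedTuple


class Overlap(NamedTuple):
    """One overlap record; field names are illustrative, not GraphMap's."""
    a_id: str
    b_id: str
    covered_fraction: float  # MHAP "Jaccard score" slot, as filled by owler
    num_kmers: int           # MHAP "# shared min-mers" slot, as filled by owler
    a_rev: bool              # True if read A is reverse-complemented
    a_start: int
    a_end: int
    a_len: int
    b_rev: bool              # True if read B is reverse-complemented
    b_start: int
    b_end: int
    b_len: int


def parse_overlaps(lines: Iterable[str], skip_self: bool = True) -> Iterator[Overlap]:
    """Yield Overlap records from MHAP-style lines (assumed 12 columns)."""
    for line in lines:
        f = line.split()
        if len(f) != 12:
            continue  # skip blank or malformed lines
        if skip_self and f[0] == f[1]:
            continue  # a read overlapping itself carries no information
        yield Overlap(
            f[0], f[1],
            float(f[2]),
            int(float(f[3])),  # tolerate values such as "120" or "120.0"
            f[4] == "1", int(f[5]), int(f[6]), int(f[7]),
            f[8] == "1", int(f[9]), int(f[10]), int(f[11]),
        )


if __name__ == "__main__":
    # "overlaps.mhap" is the output file name used in the command above.
    with open("overlaps.mhap") as handle:
        for ovl in parse_overlaps(handle):
            print(ovl.a_id, ovl.b_id, ovl.covered_fraction, ovl.num_kmers)
```

Passing `skip_self=False` keeps the self-overlaps if you want to inspect them. If the IDs in the file are numeric indices rather than read names, comparing them as strings still works for filtering.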

Currently these options are available only on the dev branch. I will leave this issue open until I merge to master, as a reminder.

Best regards, Ivan.