lbcb-sci / graphmap2

GraphMap - A highly sensitive and accurate mapper for long, error-prone reads http://www.nature.com/ncomms/2016/160415/ncomms11307/full/ncomms11307.html https://www.biorxiv.org/content/10.1101/720458v1
MIT License

comparison with overlap alignment ground truth usearch -global? #26

Open jianshu93 opened 3 months ago

jianshu93 commented 3 months ago

Hi Graphmap2 team,

Although both graphmap2 and Minimap2 are widely used for overlap detection, I have not seen either benchmarked against a ground truth, only against other approximate tools (e.g., the most recent loadFast was compared with Minimap2 and graphmap). By ground truth I mean semi-global alignment, which is essentially what overlap detection computes, as implemented in usearch/vsearch for example (open sourced recently). As I understand it, usearch/vsearch also penalizes gaps that extend to either end of the alignment, but with a much smaller penalty, so the best alignment can still be found when searching a database (an all-versus-all comparison without heuristics is possible).

My question concerns overlap detection. Suppose a long read has many matches in the same FASTA file, with semi-global identity ranging from 80% to 100% and a varying alignment ratio (overlap length divided by the length of this read, e.g., >50%). Are the best overlapping hits found by graphmap2/minimap2, ranked by identity to this read, consistent with those from usearch/vsearch, which I believe are the standards for semi-global alignment? What are the Pearson r and Spearman rank rho between the two for many such long reads in a sample (say, a natural metagenomic long-read sequencing experiment with 100 million long reads)?
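To make the comparison concrete, here is a minimal sketch of the statistic being asked about, for a single query read. It assumes the hits from each tool have already been parsed into a dictionary mapping target read ID to identity; parsing the mapper and usearch/vsearch output is left out. The function names (`pearson`, `average_ranks`, `spearman`, `compare_hits`) and the identity values in the demo are hypothetical and only for illustration, not part of any of these tools.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def compare_hits(mapper_hits, reference_hits):
    """Correlate identities over targets reported by both tools for one query.

    Both arguments are dicts {target_id: identity}. Returns
    (pearson_r, spearman_rho, number_of_shared_targets).
    """
    shared = sorted(set(mapper_hits) & set(reference_hits))
    a = [mapper_hits[t] for t in shared]
    b = [reference_hits[t] for t in shared]
    return pearson(a, b), spearman(a, b), len(shared)


if __name__ == "__main__":
    # Made-up identities for one query read, for illustration only.
    mapper = {"read_2": 0.97, "read_3": 0.88, "read_4": 0.92, "read_5": 0.81}
    reference = {"read_2": 0.99, "read_3": 0.90, "read_4": 0.89, "read_6": 0.85}
    r, rho, n = compare_hits(mapper, reference)
    print(f"shared targets: {n}, Pearson r: {r:.3f}, Spearman rho: {rho:.3f}")
```

Only targets reported by both tools enter the correlation, so hits missed by the mapper would need to be counted separately (for example as recall against the semi-global hits). For many query reads, the per-read coefficients could then be summarized across the sample.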

Thanks,

Jianshu