lh3 / minimap2

A versatile pairwise aligner for genomic and spliced nucleotide sequences
https://lh3.github.io/minimap2

different mapping results based on index composition #1218

Closed tijyojwad closed 4 months ago

tijyojwad commented 4 months ago

Hi,

I'm running the following minimap2 (mm2) command with two different indices:

minimap2 -t 96 -K8g -cx ava-ont -k25 -w17 -e200 -r150 -m4000 -z200 --dual=yes index.fastq reads.fastq -Y 

Case 1: index.fastq contains only one read from reads.fastq (say read_1).

Case 2: index.fastq is the same as reads.fastq (so read_1 plus many other reads).

I would expect the generated PAF file to contain the same sequences mapped to read_1 in both scenarios. However, in case 1 there are ~140k mappings to read_1, whereas in case 2 there are none. I count the mappings by looking at the 6th column (the target sequence name) of the PAF.
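For reference, the counting step can be sketched roughly as below. This is only an illustration, not the exact script used: the helper name count_hits_per_target and the two sample PAF records are made up, and the only thing it relies on is that PAF is tab-separated with the target sequence name in column 6.

```python
from collections import Counter

def count_hits_per_target(paf_lines):
    """Count PAF records per target name (column 6, i.e. index 5)."""
    counts = Counter()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # PAF has 12 mandatory columns; skip anything shorter
            continue
        counts[fields[5]] += 1
    return counts

# Made-up records for illustration only (not real minimap2 output).
sample_paf = [
    "\t".join(["read_7", "9000", "10", "8000", "+", "read_1", "12000",
               "500", "8500", "7000", "8000", "60"]),
    "\t".join(["read_9", "7000", "0", "6500", "-", "read_1", "12000",
               "3000", "9500", "6000", "6500", "60"]),
    "\t".join(["read_9", "7000", "0", "6500", "+", "read_4", "11000",
               "100", "6600", "6000", "6500", "60"]),
]

counts = count_hits_per_target(sample_paf)
print(counts["read_1"])
```

For a real run, the same function can be fed an open file handle for the PAF output instead of the in-memory list.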

Is this the expected behavior of mm2?

lh3 commented 4 months ago

Yes, expected. When you have more reads, some k-mers may be dropped due to high occurrence.
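To illustrate why the index composition matters, here is a toy sketch of occurrence-based seed filtering. It is not minimap2's implementation: it uses plain k-mers instead of minimizers, and the function names, the max_occ cutoff, and the sequences are all hypothetical. The point it shows is that a seed which is unique in a one-read index can exceed the occurrence cutoff once many similar reads are added, and is then dropped for every target, including read_1.

```python
from collections import defaultdict

def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_index(targets, k, max_occ):
    """Map k-mer -> list of (target name, position).

    K-mers with more than max_occ hits across ALL targets are dropped,
    which is the simplified stand-in for high-occurrence filtering.
    """
    index = defaultdict(list)
    for name, seq in targets.items():
        for pos, kmer in enumerate(kmers(seq, k)):
            index[kmer].append((name, pos))
    return {kmer: hits for kmer, hits in index.items() if len(hits) <= max_occ}

def seed_hits(index, query, k, target_name):
    """Count seed matches between the query and one named target."""
    return sum(
        1
        for kmer in kmers(query, k)
        for name, _ in index.get(kmer, [])
        if name == target_name
    )

K, MAX_OCC = 5, 2            # hypothetical toy parameters
repeat = "ACGGTCATTG"        # made-up sequence shared by every read
query = repeat

# Case 1: the index holds only read_1, so each k-mer occurs once and is kept.
case1_index = build_index({"read_1": repeat}, K, MAX_OCC)
case1 = seed_hits(case1_index, query, K, "read_1")

# Case 2: the index holds read_1 plus three similar reads, so each k-mer
# now occurs four times, exceeds MAX_OCC, and is dropped from the index.
many_reads = {f"read_{i}": repeat for i in range(1, 5)}
case2_index = build_index(many_reads, K, MAX_OCC)
case2 = seed_hits(case2_index, query, K, "read_1")

print(case1, case2)  # 6 0
```

In minimap2 itself the cutoff is, as far as I understand, governed by the -f option (a fraction of the most frequent minimizers, or an absolute occurrence count), so the effective threshold depends on what else is in the index.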