samtools / samtools

Tools (written in C using htslib) for manipulating next-generation sequencing data
http://htslib.org/

Ambiguous Wording in Documentation #2149

Open DarioS opened 1 week ago

DarioS commented 1 week ago

Is it the number of alignments or the number of reads (there can be a 1:many relationship between a read and its alignments)?

> non-primary alignments - number of secondary reads (flag 0x100 (256) set).
> supplementary alignments - number of supplementary reads (flag 0x800 (2048) set).

If a particular read maps to 100 locations in the reference genome, does this metric increment by 1 or by 100?

> reads duplicated - number of duplicate reads (flag 0x400 (1024) is set).

I expect it to increment by 1 since it only mentions reads.

whitwham commented 3 days ago

So this is another samtools stats question. It would be helpful if you state which part of samtools you are raising the issue against.

> non-primary alignments - number of secondary reads (flag 0x100 (256) set).
> supplementary alignments - number of supplementary reads (flag 0x800 (2048) set).
>
> If a particular read maps to 100 locations in the reference genome, does this metric increment by 1 or by 100?

That would be 100. Each alignment is a separate record, so multiple copies of the read would be in the SAM file.
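To make the per-record counting concrete, here is a minimal Python sketch (not the samtools stats code) that counts records with a given flag bit set and, for comparison, the number of distinct read names among them. The function name `count_flagged` and the example records are made up for illustration; only the flag values 0x100 and 0x800 come from the documentation quoted above.

```python
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def count_flagged(sam_lines, bit):
    """Return (records, distinct_read_names) for records with `bit` set.

    Hypothetical helper for illustration; it parses plain SAM text only.
    """
    records = 0
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & bit:
            records += 1      # one increment per alignment record
            names.add(qname)  # one entry per read name
    return records, len(names)


# Made-up records: readA has one primary and three secondary alignments,
# readB has one primary and one supplementary alignment.
example = [
    "@HD\tVN:1.6",
    "readA\t0\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "readA\t256\tchr1\t500\t0\t4M\t*\t0\t0\t*\t*",
    "readA\t256\tchr2\t900\t0\t4M\t*\t0\t0\t*\t*",
    "readA\t256\tchr3\t300\t0\t4M\t*\t0\t0\t*\t*",
    "readB\t0\tchr1\t200\t60\t4M2S\t*\t0\t0\tACGTAC\tIIIIII",
    "readB\t2048\tchr5\t700\t60\t4H2M\t*\t0\t0\tAC\tII",
]

print(count_flagged(example, SECONDARY))      # (3, 1)
print(count_flagged(example, SUPPLEMENTARY))  # (1, 1)
```

The first number in each pair is the per-record count described in the answer above; the second is what a per-read interpretation would give.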

> reads duplicated - number of duplicate reads (flag 0x400 (1024) is set).
>
> I expect it to increment by 1 since it only mentions reads.

For each read that is a duplicate of another, the number goes up by one. Reads that have supplementary or secondary alignments can also have those alignments marked as duplicates (depending on how you do your duplicate detection).
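As a rough way to see how duplicate flags are distributed in a given file, the sketch below tallies records with flag 0x400 set, split by whether the record is primary, secondary, or supplementary. It is an illustration of the flag bits only, not the samtools stats implementation, and it makes no claim about which of these records stats includes in "reads duplicated". The function name `duplicate_breakdown` and the example records are hypothetical.

```python
from collections import Counter

DUPLICATE = 0x400      # SAM flag bit: PCR or optical duplicate
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def duplicate_breakdown(sam_lines):
    """Count duplicate-flagged records by alignment type (hypothetical helper)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if not flag & DUPLICATE:
            continue  # only records marked as duplicates are tallied
        if flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        else:
            counts["primary"] += 1
    return counts


# Made-up records: 1024 = duplicate, 1280 = duplicate + secondary,
# 3072 = duplicate + supplementary; flags 0 and 256 are not duplicates.
dup_example = [
    "readC\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "readD\t1024\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "readD\t1280\tchr2\t400\t0\t4M\t*\t0\t0\t*\t*",
    "readD\t3072\tchr3\t800\t60\t2H2M\t*\t0\t0\tGT\tII",
    "readC\t256\tchr4\t150\t0\t4M\t*\t0\t0\t*\t*",
]

print(dict(duplicate_breakdown(dup_example)))
```

Whether the secondary and supplementary rows are non-zero depends on the duplicate-marking tool and options used, as noted above.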