tseemann / samclip

Filter SAM file for soft and hard clipped alignments
GNU General Public License v3.0
46 stars 10 forks source link

Unexpected inversion behaviour. Removes a few % more reads, not returning the removed reads. #18

Open StarburstCLA opened 2 years ago

StarburstCLA commented 2 years ago

Version samclip 0.4.0

I'm having issues with --invert option.

currently

samclip --max 100 --ref file.fasta < file.sam > file_samclip100.sam outputs [samclip] Total SAM records 146807, removed 944, allowed 145705, passed 145863 file 2,648,407 kb reads 145723 (1084 less than 146807)

samclip --invert --max 100 --ref file.fasta < file.sam > file_samclip100invert.sam outputs [samclip] Total SAM records 146807, removed 944, allowed 145705, passed 145863 file 2,645,487 kb reads 145605 (1202 less than 146807)

read counts calculated with samtools coverage

So --invert is doing something but not what I expect. It appears to be removing MORE reads than without? And certainly not outputting the 944 reads I'm expecting. This might be expected behaviour I'm not understanding possibly due to the question below.

Also a question: I'm not clear what the difference is between allowed and passed i.e. in output [samclip] Total SAM records 146807, removed 944, allowed 145705, passed 145863

I see from the code that passed = total - removed so I would expect this to be the number of reads remaining?

But I'm not clear what "allowed" is in such circumstances as it seems to iterate in the code but not clear showing what?

Version samclip 0.4.0

Tried editing my $invert = 0; to my $invert = 1;

in the code

and same behaviour so the option seems to be setting corectly