chapplec opened this issue 5 years ago
Hi Chapplec,
Did you find that name-sorting your input SAM resolved your issue with truncated output?
Hi, and sorry it took me so long to answer; I was working on something else. The input file I was using was created from an already sorted BAM file, so I had assumed it was sorted. However, when I tried again after sorting by name, I got an empty file:
```
$ samtools sort -n sample.sam > sample.sorted.sam
$ grep -c '^[^@]' sample.sorted.sam
2767369
$ primerclip Acel-Amplicon_56g_masterfile.txt sample.sorted.sam sample.sorted.clipped.sam
all 263 master records parsed successfully.
primer trimming complete.
$ wc sample.sorted.clipped.sam
0 0 0 sample.sorted.clipped.sam
```
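One thing that may be worth ruling out (a guess, not a confirmed diagnosis): the samtools sort documentation says that when output goes to standard output and no -O format is given, BAM is selected. In that case sample.sorted.sam above could contain BAM data despite its extension. The sketch below uses a hypothetical looks_like_bam helper, not part of samtools or primerclip, that checks for the gzip/BGZF magic bytes (1f 8b) that BAM files start with.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of samtools or primerclip).
# Succeeds if the file begins with the gzip/BGZF magic bytes 1f 8b,
# as BAM files do; fails for plain-text SAM or an empty file.
looks_like_bam() {
  local magic
  # Read the first two bytes and print them as hex without spaces, e.g. "1f8b".
  magic=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "1f8b" ]
}

# Example usage (file name taken from the commands above):
# if looks_like_bam sample.sorted.sam; then
#   echo "sample.sorted.sam contains BAM data, not SAM"
# fi
```

If it reports BAM, regenerating the file with an explicit format, e.g. samtools sort -n -O sam -o sample.sorted.sam sample.sam, would rule this out.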
For me, it only worked after sorting by name (samtools sort -n). The requirement that the input be name-sorted should be documented somewhere!
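A file that came from a sorted BAM may still be in coordinate order, so it can help to check what the SAM actually contains before running primerclip. A minimal sketch with two hypothetical helpers: header_sort_order prints the SO value from the @HD header line (samtools sort -n records it as queryname), and check_grouped_by_name succeeds only if all records sharing a QNAME are adjacent. Treating adjacency as what primerclip needs is an assumption based on the reports in this thread, and it is a weaker property than full name sorting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SO (sort order) value from the @HD line,
# or nothing if there is no @HD line or no SO tag.
header_sort_order() {
  awk -F'\t' '
    /^@HD/ { for (i = 2; i <= NF; i++) if ($i ~ /^SO:/) { print substr($i, 4); exit } }
    !/^@/  { exit }   # stop at the first alignment record
  ' "$1"
}

# Hypothetical helper: succeed only if every QNAME forms one contiguous run,
# i.e. a name never reappears after a different name has been seen.
check_grouped_by_name() {
  awk -F'\t' '
    /^@/ { next }                          # skip header lines
    $1 != prev {                           # start of a new run of QNAMEs
      if ($1 in seen) { bad = 1; exit }    # name seen before: not grouped
      seen[$1] = 1
      prev = $1
    }
    END { exit bad }
  ' "$1"
}

# Example usage:
# header_sort_order sample.sorted.sam
# check_grouped_by_name sample.sorted.sam && echo "grouped by read name"
```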
I am trying to integrate primerclip into my pipeline. However, when I test it on a small SAM file, it removes all but two sequences.
I get the same behavior with both the official bwa mem and Sentieon's implementation of the same tool (https://www.sentieon.com/products/), and with two different samples. However, neither sample was actually sequenced with the Swift kit whose primer file I am testing. Perhaps this is the source of the problem, but that would seem strange: why remove sequences that don't match the primers? More importantly, why are sequences being removed at all? I expected the primers to be soft- or hard-clipped, not entire reads to be removed!
What am I doing wrong here?
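To see which reads were dropped, rather than only how many, one option is to compare read names between the input and the clipped output. A minimal sketch; removed_read_names is a hypothetical helper, and because it compares QNAMEs only, a pair that keeps one mate but loses the other is not reported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print QNAMEs present in the input SAM ($1) but absent
# from the clipped SAM ($2), one per line. Header lines (@...) are ignored.
removed_read_names() {
  local in_names out_names
  in_names=$(mktemp) && out_names=$(mktemp) || return 1
  # comm needs both inputs sorted with the same collation, so force the C locale.
  grep -v '^@' -- "$1" | cut -f1 | LC_ALL=C sort -u > "$in_names"
  grep -v '^@' -- "$2" | cut -f1 | LC_ALL=C sort -u > "$out_names"
  # -23: keep only lines unique to the first file.
  LC_ALL=C comm -23 "$in_names" "$out_names"
  rm -f -- "$in_names" "$out_names"
}

# Example usage (file names are placeholders):
# removed_read_names input.sam clipped.sam | head
```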