sanger-pathogens / Bio-Tradis

A set of tools to analyse the output from TraDIS analyses
https://sanger-pathogens.github.io/Bio-Tradis/

fixed bug in handling soft clipping at read start, updated tutorial #121

Closed lbarquist closed 3 years ago

lbarquist commented 3 years ago

This fix is in response to issues #120 and #119.

The changes to Cigar.pm fix the handling of soft clipping at the start of read alignments. The original code appears to have assumed that the alignment start coordinate in the BAM/SAM file corresponds to the first base of the read. This isn't true for soft-clipped reads: the coordinate refers to the first aligned base, after any clipped bases. I tested this on some data with adapter contamination, which leads to widespread soft clipping and hence an overestimation of unique insertion sites; with this fix, bwa gives much more similar results with and without adapter trimming. It would probably be useful for someone else to double-check the logic of Cigar.pm and make sure I haven't missed something.
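
For anyone reviewing the logic, here is a minimal Python sketch of the coordinate rule described above. It is not the code in Cigar.pm; the function names `aligned_span` and `leading_soft_clip` are made up for illustration. It assumes standard SAM semantics, where POS is the 1-based position of the first aligned base and soft-clipped (S) bases consume no reference positions.

```python
import re

# One CIGAR operation: a length followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operators that advance the position on the reference.
# S (soft clip), H (hard clip), I (insertion) and P (padding) do not.
REF_CONSUMING = set("MDN=X")


def aligned_span(pos, cigar):
    """Return (start, end), 1-based inclusive, of the reference bases
    covered by an alignment with SAM POS `pos` and CIGAR `cigar`.

    Illustrative helper only, not part of Bio-Tradis.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    # POS already points at the first aligned base, so a leading
    # soft clip must not be added to it.
    return pos, pos + ref_len - 1


def leading_soft_clip(cigar):
    """Number of soft-clipped bases at the start of the read (0 if none).

    A leading hard clip (H) is skipped, since H may precede S.
    """
    for n, op in CIGAR_OP.findall(cigar):
        if op == "H":
            continue
        return int(n) if op == "S" else 0
    return 0


if __name__ == "__main__":
    for cigar in ("50M", "10S40M", "5S20M2D20M3S"):
        print(cigar, aligned_span(100, cigar), leading_soft_clip(cigar))
```

With this rule, reads whose aligned portion begins at the same base report the same start coordinate regardless of how many bases were clipped in front, which is the behaviour needed to avoid counting variably clipped reads as separate insertion sites.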

I have also updated the Bio-TraDIS tutorial to reflect changes to ENA and the fact that bwa is now the default mapper (i.e. I've added the --smalt flag to the bacteria_tradis call).