tommyau / bamclipper

Remove primer sequence from BAM alignments by soft-clipping
MIT License

Overlapping amplicons #13

Open aledj2 opened 4 years ago

aledj2 commented 4 years ago

Hi, I'm using an amplicon kit with tiled amplicons. These can create 'super' amplicons, where the forward primer of amplicon 1 forms a 'super' amplicon with the reverse primer of amplicon 2.

I've created a paired-end (PE) BED file with one line for each of the original amplicons and another line for each 'super' amplicon.
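For anyone reproducing this setup, the following minimal Python sketch generates such a file. It assumes a six-column layout (chromosome, start, end of the forward primer, then chromosome, start, end of the reverse primer), amplicons sorted along a single chromosome, and that each super amplicon pairs the forward primer of amplicon i with the reverse primer of amplicon i+1. The Primer and Amplicon types and the tiled_bedpe function are hypothetical illustrations, not part of bamclipper; check the column layout against the one your bamclipper version expects.

```python
from typing import List, NamedTuple


class Primer(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive


class Amplicon(NamedTuple):
    fwd: Primer
    rev: Primer


def bedpe_line(fwd: Primer, rev: Primer) -> str:
    # One tab-separated line: forward primer coordinates, then reverse primer coordinates.
    cols = (fwd.chrom, fwd.start, fwd.end, rev.chrom, rev.start, rev.end)
    return "\t".join(str(c) for c in cols)


def tiled_bedpe(amplicons: List[Amplicon]) -> List[str]:
    # Assumes amplicons are already sorted by position on one chromosome.
    lines = [bedpe_line(a.fwd, a.rev) for a in amplicons]
    for left, right in zip(amplicons, amplicons[1:]):
        # Super amplicon: forward primer of amplicon i with reverse primer of amplicon i+1.
        lines.append(bedpe_line(left.fwd, right.rev))
    return lines


amps = [
    Amplicon(Primer("chr1", 100, 120), Primer("chr1", 380, 400)),
    Amplicon(Primer("chr1", 350, 370), Primer("chr1", 630, 650)),
]
for line in tiled_bedpe(amps):
    print(line)
```

With two overlapping amplicons this prints the two original primer pairs followed by one super-amplicon line, chr1 100 120 chr1 630 650 (tab-separated).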

This correctly clips the outermost primers. However, for the super amplicons, the right-hand alignments are clipped at both ends (I would only expect clipping on the right). This is shown in the attached screenshot: the red lines show where the soft-clipping starts, and the tracks at the bottom show the primer positions and the lines in the paired-end BED file.

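[Screenshot: red lines mark where soft-clipping starts; the bottom tracks show the primer positions and the paired-end BED entries.]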

The super amplicons provide coverage where the targets of the original amplicons do not overlap (apart from the primers), but this issue means the super amplicons are also clipped in that gap, so I am losing coverage.

This happens for every super amplicon. I've tried setting the upstream and downstream options (-u and -d) to 0 and 1 so that reads aren't assigned to a different amplicon, and I've sorted the PE BED file so the super amplicons come first.

Is this expected behaviour? Any tips? Thanks

donutbrew commented 4 years ago

If these amplicons come from a multiplex PCR, it is impossible to tell the difference between an amplicon generated from the outer primers alone (which should not be trimmed in the middle) and a hybrid amplicon, produced when two partially overlapping precursor amplicons prime each other to form the longer product.

My solution is to exclude longer-than-expected amplicons during the read QC step. Of course, if you rely on these longer amplicons, you'll have to redesign the assay.
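One way such a QC filter might look is sketched below in Python, operating on SAM text lines and using the TLEN field (column 9, the observed template length) as a proxy for amplicon length. This is an assumption about how the filtering could be done, not donutbrew's actual pipeline: the drop_long_templates name is hypothetical, the max_len threshold (for example, the longest original amplicon plus a small margin) is up to you, and records with an undefined TLEN of 0 (such as those with an unmapped mate) are kept because their length cannot be judged.

```python
from typing import Iterable, Iterator


def drop_long_templates(sam_lines: Iterable[str], max_len: int) -> Iterator[str]:
    """Yield SAM lines, skipping alignments whose |TLEN| exceeds max_len."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 9 is TLEN; mates carry the same magnitude with opposite signs,
        # so abs() makes both reads of a long pair fail the check together.
        tlen = abs(int(fields[8]))
        # TLEN is 0 when undefined (e.g. mate unmapped); keep those records.
        if tlen == 0 or tlen <= max_len:
            yield line


header = "@HD\tVN:1.6"
short_pair = "r1\t99\tchr1\t101\t60\t100M\t=\t281\t300\tACGT\tIIII"
long_pair = "r2\t99\tchr1\t101\t60\t100M\t=\t451\t550\tACGT\tIIII"
kept = list(drop_long_templates([header, short_pair, long_pair], max_len=320))
```

Here kept contains the header and the 300 bp pair, while the 550 bp pair is dropped. In practice the input could come from samtools view -h on the BAM, with the filtered lines converted back to BAM before clipping.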