tommyau / bamclipper

Remove primer sequence from BAM alignments by soft-clipping
MIT License

chr vs no chr #2

rjsicko closed this issue 7 years ago

rjsicko commented 7 years ago

Thanks for the script to convert a TruSeq manifest to a BEDPE file. I can confirm that the script and bamclipper worked with my custom TruSeq files.

I did have to remove chr from the BEDPE file because I aligned to GRCh37 instead of hg19. I initially didn't catch the chr vs no chr mismatch: bamclipper ran and output a "clipped" BAM file, but the primers weren't clipped. I suggest adding a check that the supplied primer file and the aligned BAM either both use chr or both don't, or harmonizing chr vs no chr internally in bamclipper.
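For readers hitting the same mismatch, the manual fix described above can be sketched roughly as follows. This is a standalone Python illustration, not part of bamclipper; the function name `strip_chr_prefix` and the example line are hypothetical. It assumes a tab-separated BEDPE file in which columns 1 and 4 hold the two chromosome names, and it only drops a leading chr; it does not handle other naming differences such as chrM vs MT.

```python
def strip_chr_prefix(bedpe_line: str) -> str:
    """Drop a leading 'chr' from the two chromosome columns of one BEDPE line.

    Hypothetical helper for illustration only; not bamclipper code.
    """
    fields = bedpe_line.rstrip("\n").split("\t")
    if len(fields) < 6:
        # Too few columns to be a BEDPE record: return it unchanged.
        return bedpe_line.rstrip("\n")
    for i in (0, 3):  # BEDPE columns 1 and 4 are chrom1 and chrom2
        if fields[i].startswith("chr"):
            fields[i] = fields[i][3:]
    return "\t".join(fields)


# Made-up example record; the name column keeps its spaces and '+'.
example = "chr1\t100\t120\tchr1\t300\t320\tTarget A+B"
converted = strip_chr_prefix(example)
print(converted)
```

Applying the function to every line of the BEDPE file before running bamclipper would give names that match a GRCh37-style BAM, under the assumptions stated above.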

While debugging why my primers weren't clipped, I noticed that the target names in my manifest file contain spaces and '+'. I initially thought this might be the problem, so I modified 'manifest2bedpe.pl' by adding

my $target_key_clean = $target_key =~ s/\s/_/gr;
$target_key_clean =~ s/\+/_/g;

in the conversion for loop.

Thanks again for the program!

tommyau commented 7 years ago

A mismatch in reference sequence names is a common scenario. For example, the manifest file refers to chromosome 1 as chr1 while your BAM file refers to it as 1. I just added a check in clipprimer.pl that compares the reference sequence names in the SAM @SQ header lines against those in the BEDPE file: ea387ba. A warning should now be given in your case; please give it a try.
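The idea behind such a check can be sketched as below. This is a Python illustration under stated assumptions, not the Perl code from commit ea387ba: the function names, the warning text, and the example data are all hypothetical. It collects the SN: values from the @SQ header lines, collects the chromosome names from BEDPE columns 1 and 4, and reports any BEDPE names that the BAM header does not contain.

```python
def sq_names(sam_header_lines):
    """Collect reference sequence names (SN: tags) from SAM @SQ header lines."""
    names = set()
    for line in sam_header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def bedpe_names(bedpe_lines):
    """Collect chromosome names from BEDPE columns 1 and 4."""
    names = set()
    for line in bedpe_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 6:  # skip lines that are not full BEDPE records
            names.update((fields[0], fields[3]))
    return names


def missing_from_bam(sam_header_lines, bedpe_lines):
    """Return BEDPE chromosome names that are absent from the SAM header."""
    return sorted(bedpe_names(bedpe_lines) - sq_names(sam_header_lines))


# Made-up example: GRCh37-style header (no 'chr') vs hg19-style BEDPE.
header = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:1\tLN:249250621",
    "@SQ\tSN:2\tLN:243199373",
]
bedpe = ["chr1\t100\t120\tchr1\t300\t320\tTarget A+B"]

missing = missing_from_bam(header, bedpe)
if missing:
    print("warning: BEDPE names not found in BAM header:", ", ".join(missing))
```

In practice the header lines could come from `samtools view -H`. Reporting the unmatched names, rather than only a yes/no result, makes a chr vs no chr mismatch easy to recognize at a glance.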

As for manifest2bedpe.pl, I believe the current version already parses and handles target names containing spaces and '+' properly. Could you provide some sample manifest lines that concern you?

rjsicko commented 7 years ago

You're right, manifest2bedpe.pl parsed my target names with spaces and '+' fine. I just wasn't sure whether the spaces and '+' in the name field of the BEDPE file were causing issues with clipprimer.pl (they weren't). Thanks again.