tseemann / samclip

Filter SAM file for soft and hard clipped alignments
GNU General Public License v3.0
46 stars 10 forks source link

Remove clipped bases? #15

Open apredeus opened 3 years ago

apredeus commented 3 years ago

I was wondering if it's worth adding the option to not remove the whole read with soft/hardclipping, but rather just trim the clipped reads? There's one obscure and no longer supported tool (ngsutils) that supports this option, but that's all I was able to find.

steinbrl commented 5 months ago

Hey, I would really appreciate if you could implement an option to remove only clipped bases. As well as apredeus, I crawled the web to find tools, which can do that. ngsutils dont work in my hands (HPC cluster). It only produces errors...

So, this function would be a real improvement!

lskatz commented 3 months ago

I thought this could be a fun mini project and so what do you think about this subroutine? I think it is working at least in theory locally for me.

Add in the qual column constant

# SAM file TSV columns
use constant {
  SAM_RNAME => 2,
  SAM_POS   => 3,
  SAM_CIGAR => 5,
  SAM_TLEN  => 8,
  SAM_SEQ   => 9,
  SAM_QUAL  => 10,
};

In the main body right before filtering:

    if($clip){
      my $samArr = clip(\@sam, $HL, $SL, $SR, $HR);
      @sam = @$samArr;
    }

Defined later in the code:

sub clip{
  my($samArr, $HL, $SL, $SR, $HR) = @_;

  # Avoid modifying the original array
  my @sam = @$samArr;

  # Trim the sequence and qual according to soft clips
  $sam[SAM_SEQ]  = substr($sam[SAM_SEQ],  $SL, length($sam[SAM_SEQ])-$SL-$SR);
  $sam[SAM_QUAL] = substr($sam[SAM_QUAL], $SL, length($sam[SAM_QUAL])-$SL-$SR);

  # What to do with hard clips? Just remove them from cigar string?

  return  \@sam;
}

And some getopts changes too.

lskatz commented 3 months ago

I'll put some more changes that I just made into a PR so that it's visible but it is not tested