Open apredeus opened 3 years ago
Hey, I would really appreciate if you could implement an option to remove only clipped bases. As well as apredeus, I crawled the web to find tools, which can do that. ngsutils dont work in my hands (HPC cluster). It only produces errors...
So, this function would be a real improvement!
I thought this could be a fun mini project and so what do you think about this subroutine? I think it is working at least in theory locally for me.
Add in the qual column constant
# SAM file TSV columns
use constant {
SAM_RNAME => 2,
SAM_POS => 3,
SAM_CIGAR => 5,
SAM_TLEN => 8,
SAM_SEQ => 9,
SAM_QUAL => 10,
};
In the main body right before filtering:
if($clip){
my $samArr = clip(\@sam, $HL, $SL, $SR, $HR);
@sam = @$samArr;
}
Defined later in the code:
sub clip{
my($samArr, $HL, $SL, $SR, $HR) = @_;
# Avoid modifying the original array
my @sam = @$samArr;
# Trim the sequence and qual according to soft clips
$sam[SAM_SEQ] = substr($sam[SAM_SEQ], $SL, length($sam[SAM_SEQ])-$SL-$SR);
$sam[SAM_QUAL] = substr($sam[SAM_QUAL], $SL, length($sam[SAM_QUAL])-$SL-$SR);
# What to do with hard clips? Just remove them from cigar string?
return \@sam;
}
And some getopts changes too.
I'll put some more changes that I just made into a PR so that it's visible but it is not tested
I was wondering if it's worth adding the option to not remove the whole read with soft/hardclipping, but rather just trim the clipped reads? There's one obscure and no longer supported tool (ngsutils) that supports this option, but that's all I was able to find.