Open RemiMaglione opened 4 years ago
Note: I also found that using samtools in multi-threading mode didn't work with files generated with ngmlr. It still send me the incomplete aux field/truncated error even if it worked perfectly in a single thread. I don't know if it's a personal config issue, didn't find a solution to fix this so far, just though it could help.
Reads with bad quality should be marked rather than removing them directly. Here's my simple perl script MarkBedQUAL.pl
:
#!/usr/bin/env perl
while(<>){
if(m/^\@/){print $_;next;}
@a = split/\t/;
if($a[4] <0){
$a[4] = 0;
unless($a[1] & 512){
$a[1] += 512;
}
}
$_=join("\t",@a);
print $_;
}
Then combinengmlr
and samtools
with pipe:
ngmlr -t 4 -x ont -r ref.fa -q ont.fastq.gz |
perl MarkBedQUAL.pl |
samtools view -bhS |
samtools sort > align.bam
Reads with bad quality can be filtered directly by flag:
samtools view -F 512 align.bam
Hello nglmr users/dev,
Looks like a few people ran into the
incomplete aux field/truncated file
samtools issue. Since I also ran into this problem, I put my investigator hat, dig on each "wrong/corrupt" lines and found that they all shared the same issue: the mapping quality was < 0 (e.g: "-2147483648"). Since it concern only a few sequences (my worse case scenario was 1 over 3.000), here's how I handle this :awk -F "\t" '{if ($5 >= 0 || substr ($1, 0, 1)=="@") print $0} myBad.sam' > myGood.sam
Basically, it keeps only lines where quality feilds ($5) are equal to 0 or more, and keep header lines (i.e: lines starting with @)
If you have multiple .sam files :
for i in *.sam ; do echo $i ; awk -F "\t" '{if ($5 > -1 || substr ($1, 0, 1)=="@") print $0}' $i > $(basename $i ".sam")"g.sam" ; done
Hope it could help, Cheers