philres / ngmlr

NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) to a reference genome with a focus on reads that span structural variations
MIT License
284 stars 41 forks source link

[A SOLUTION TO] => samtools issue : incomplete aux field/truncated file #89

Open RemiMaglione opened 3 years ago

RemiMaglione commented 3 years ago

Hello nglmr users/dev,

Looks like a few people ran into the incomplete aux field/truncated file samtools issue. Since I also ran into this problem, I put my investigator hat, dig on each "wrong/corrupt" lines and found that they all shared the same issue: the mapping quality was < 0 (e.g: "-2147483648"). Since it concern only a few sequences (my worse case scenario was 1 over 3.000), here's how I handle this :

awk -F "\t" '{if ($5 >= 0 || substr ($1, 0, 1)=="@") print $0} myBad.sam' > myGood.sam

Basically, it keeps only lines where quality feilds ($5) are equal to 0 or more, and keep header lines (i.e: lines starting with @)

If you have multiple .sam files : for i in *.sam ; do echo $i ; awk -F "\t" '{if ($5 > -1 || substr ($1, 0, 1)=="@") print $0}' $i > $(basename $i ".sam")"g.sam" ; done

Hope it could help, Cheers

RemiMaglione commented 3 years ago

Note: I also found that using samtools in multi-threading mode didn't work with files generated with ngmlr. It still send me the incomplete aux field/truncated error even if it worked perfectly in a single thread. I don't know if it's a personal config issue, didn't find a solution to fix this so far, just though it could help.

wangshun1121 commented 10 months ago

Reads with bad quality should be marked rather than removing them directly. Here's my simple perl script MarkBedQUAL.pl :

#!/usr/bin/env perl
while(<>){
        if(m/^\@/){print $_;next;}
        @a = split/\t/;
        if($a[4] <0){
                $a[4] = 0;
                unless($a[1] & 512){
                        $a[1] += 512;
                }
        }
        $_=join("\t",@a);
        print $_;
} 

Then combinengmlr and samtools with pipe:

ngmlr -t 4 -x ont -r ref.fa -q ont.fastq.gz | 
        perl  MarkBedQUAL.pl |
        samtools view -bhS |
        samtools sort > align.bam

Reads with bad quality can be filtered directly by flag:

samtools view -F 512 align.bam