philres / ngmlr

NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) to a reference genome with a focus on reads that span structural variations
MIT License
293 stars 40 forks source link

[A SOLUTION TO] => samtools issue : incomplete aux field/truncated file #89

Open RemiMaglione opened 4 years ago

RemiMaglione commented 4 years ago

Hello nglmr users/dev,

Looks like a few people ran into the incomplete aux field/truncated file samtools issue. Since I also ran into this problem, I put my investigator hat, dig on each "wrong/corrupt" lines and found that they all shared the same issue: the mapping quality was < 0 (e.g: "-2147483648"). Since it concern only a few sequences (my worse case scenario was 1 over 3.000), here's how I handle this :

awk -F "\t" '{if ($5 >= 0 || substr ($1, 0, 1)=="@") print $0} myBad.sam' > myGood.sam

Basically, it keeps only lines where quality feilds ($5) are equal to 0 or more, and keep header lines (i.e: lines starting with @)

If you have multiple .sam files : for i in *.sam ; do echo $i ; awk -F "\t" '{if ($5 > -1 || substr ($1, 0, 1)=="@") print $0}' $i > $(basename $i ".sam")"g.sam" ; done

Hope it could help, Cheers

RemiMaglione commented 4 years ago

Note: I also found that using samtools in multi-threading mode didn't work with files generated with ngmlr. It still send me the incomplete aux field/truncated error even if it worked perfectly in a single thread. I don't know if it's a personal config issue, didn't find a solution to fix this so far, just though it could help.

wangshun1121 commented 1 year ago

Reads with bad quality should be marked rather than removing them directly. Here's my simple perl script MarkBedQUAL.pl :

#!/usr/bin/env perl
while(<>){
        if(m/^\@/){print $_;next;}
        @a = split/\t/;
        if($a[4] <0){
                $a[4] = 0;
                unless($a[1] & 512){
                        $a[1] += 512;
                }
        }
        $_=join("\t",@a);
        print $_;
} 

Then combinengmlr and samtools with pipe:

ngmlr -t 4 -x ont -r ref.fa -q ont.fastq.gz | 
        perl  MarkBedQUAL.pl |
        samtools view -bhS |
        samtools sort > align.bam

Reads with bad quality can be filtered directly by flag:

samtools view -F 512 align.bam