ncbi / magicblast

34 stars 16 forks source link

Filtering MagicBLAST Ouput to get Unmapped Pairs #43

Closed bioinfoMMS closed 1 year ago

bioinfoMMS commented 2 years ago

Hello, I am trying to get unmapped read pairs (neither read of a pair maps to the reference genome) from MagicBLAST's BAM output. My MagicBLAST command is:

magicblast -splice F -num_threads 4 -outfmt sam -infmt fastq -paired -perc_identity 0.95 -db myreference.fa -query read1.fq -query_mate read2.fq -no_query_id_trim -lcase_masking -out magicblast_results.sam

I get the following flag stats from my output file:

237122755 + 0 in total (QC-passed reads + QC-failed reads) 60851593 + 0 secondary 0 + 0 supplementary 0 + 0 duplicates 216503969 + 0 mapped (91.30% : N/A) 176271162 + 0 paired in sequencing 88135581 + 0 read1 88135581 + 0 read2 104186122 + 0 properly paired (59.11% : N/A) 137802636 + 0 with itself and mate mapped 17849740 + 0 singletons (10.13% : N/A) 2130570 + 0 with mate mapped to a different chr 2130570 + 0 with mate mapped to a different chr (mapQ>=5)

I then run 'samtools view -f12' on the bam file to get a filtered bam containing reads that are unmapped (flag 0x4) with mates that are also unmapped (flag 0x8). The flag stats from the filtered bam are as follows:

11052838 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 secondary 0 + 0 supplementary 0 + 0 duplicates 0 + 0 mapped (0.00% : N/A) 11052838 + 0 paired in sequencing 5713671 + 0 read1 5339167 + 0 read2 0 + 0 properly paired (0.00% : N/A) 0 + 0 with itself and mate mapped 0 + 0 singletons (0.00% : N/A) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)

My question is why do I not get the same number for read1 and read2? I have done this same process for a Bowtie2 bam file and I get the same number of read1 and read2 in that case.

I found this Biostars post and was wondering if it could be related: https://www.biostars.org/p/383843/

Thanks in advance!

boratyng commented 2 years ago

Hello @bioinfoMMS,

Thank you for using Magic-BLAST. It looks like there is a bug. In cases where a read is unaligned and its mate aligns, Magic-BLAST sometimes incorrectly sets mate unaligned bit (8) for the unaligned read. We will fix it in the next release. I will get back to you if I we find any way that you can post-process samtools results before the release.

bioinfoMMS commented 2 years ago

Thanks for the response! Do you know when the next release will be?

boratyng commented 2 years ago

We are planning for a release in July. I apologize for the delayed response.

JohnUrban commented 1 year ago

Any news on this release?

boratyng commented 1 year ago

We are planning a release by the end of September. We are running a bit behind. Sorry.

boratyng commented 1 year ago

Hi @bioinfoMMS and @JohnUrban, we hit another snag. The release should go out next week. Sorry about the delay.

JohnUrban commented 1 year ago

I look forward to seeing the updated version!

bioinfoMMS commented 1 year ago

Same here! Thanks for keeping us updated!

boratyng commented 1 year ago

Magic-BLAST 1.7.0 release is out, with a fix to your problem. You can download it from https://ftp.ncbi.nlm.nih.gov/blast/executables/magicblast/LATEST. I am going to update bioconda early next week.

boratyng commented 1 year ago

Magic-BLAST is also updated is Bioconda. This problem should be fixed now.