ncbi / magicblast

Missing scaffold names in SAM header #57

Closed: cst-rmrz closed this issue 4 months ago

cst-rmrz commented 4 months ago

The scaffold names should look something like this:

@HD VN:1.0 SO:unsorted
@SQ SN:Scaffold_12_contigslength_78070536 LN:78070537
@SQ SN:Scaffold_21_contigslength_61731525 LN:61731525
@SQ SN:Scaffold_33_contigslength_114446166 LN:114446168
@SQ SN:Scaffold_42_contigslength_60228994 LN:60228995
@SQ SN:Scaffold_52_contigslength_88504823 LN:88504823
@SQ SN:Scaffold_62_contigslength_83591330 LN:83591330
@SQ SN:Scaffold_72_contigslength_80977755 LN:80977755
@SQ SN:Scaffold_82_contigslength_76899493 LN:76899493
@SQ SN:Scaffold_9__1_contigs__length_70652527 LN:70652528

But magicblast formats them like this:

@HD VN:1.0 GO:query
@SQ SN:0 LN:78070537
@SQ SN:1 LN:61731525
@SQ SN:2 LN:114446168
@SQ SN:3 LN:60228995
@SQ SN:4 LN:88504823
@SQ SN:5 LN:83591330
@SQ SN:6 LN:80977755
@SQ SN:7 LN:76899493
@SQ SN:8 LN:70652528

With the scaffold names replaced by numbers, the downstream analyses in my pipeline can't run. Is there a setting I'm missing that makes magicblast report the original scaffold names in the header of the SAM file?
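
For anyone wanting to detect this symptom automatically before downstream steps run, here is a rough sketch. The function name `count_numeric_sq` is made up for illustration and is not part of magicblast or any BLAST tool. It counts `@SQ` header lines whose `SN:` value is purely numeric, assuming a standard whitespace-separated SAM header.

```shell
# Hypothetical helper (not part of magicblast): count @SQ header lines
# whose SN field is a bare number such as SN:0, SN:1, ...
count_numeric_sq() {
  awk '
    $1 == "@SQ" {
      # Scan the tags on this header line for a purely numeric SN value.
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:[0-9]+$/) n++
    }
    END { print n + 0 }   # n + 0 prints 0 when nothing matched
  ' "$1"
}
```

A non-zero count only hints at the problem: a reference that is genuinely named with a number (a chromosome called `1`, say) would be counted too, so treat it as a prompt to look at the header rather than proof.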

boratyng commented 4 months ago

Hi @cst-rmrz, when you create a BLAST database for your target with makeblastdb, you need to add the -parse_seqids flag, for example:

makeblastdb -in myseqs.fa -out mydb -dbtype nucl -parse_seqids
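
To confirm the rebuilt database fixed the header, one option is to compare the FASTA sequence IDs against the `@SQ` names in the new SAM output. The sketch below is only an illustration: `missing_sq_names` is a hypothetical helper, and it assumes the sequence ID is the first whitespace-delimited token after `>` on each FASTA definition line and that the SAM file is non-empty.

```shell
# Hypothetical helper: print FASTA sequence IDs that have no matching
# @SQ SN entry in a SAM file. Usage: missing_sq_names myseqs.fa out.sam
missing_sq_names() {
  fasta=$1
  sam=$2
  awk '
    NR == FNR {
      # First file (the SAM file): collect SN values from @SQ lines.
      if ($1 == "@SQ")
        for (i = 2; i <= NF; i++)
          if ($i ~ /^SN:/) seen[substr($i, 4)] = 1
      next
    }
    /^>/ {
      # Second file (the FASTA file): ID is the first token, minus ">".
      id = substr($1, 2)
      if (!(id in seen)) print id
    }
  ' "$sam" "$fasta"
}
```

If the database was built with -parse_seqids and magicblast was rerun against it, this should print nothing; any IDs it does print are scaffolds whose names did not make it into the header.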

cst-rmrz commented 4 months ago

That was exactly the problem. I thought this was a bug, but I guess I just didn't know how to make a blast database ¯_(ツ)_/¯