gymreklab / GangSTR

A tool for profiling long STRs from short reads
GNU General Public License v2.0

[GangSTR-2.4.6] ERROR: No read group specified in BAM file #99

Closed: archana433 closed this issue 3 years ago

archana433 commented 3 years ago

GangSTR --bam sample_sorted.bam --ref Homo_sapiens.GRCh38.dna.primary_assembly.fa --regions hg38ver12.bed --out Sample
[GangSTR-2.4.6] ERROR: No read group specified in BAM file

BAM file:

samtools view sample_sorted.bam | head
ERR000044.1 97 9 26830129 0 45M = 26830192 108 GAACAGTCATTGCCCAATTCCCAACAGCAGTTGGGGTGTCCTGTT IIIIIIIIIIIIIIIIIIIIIIHIIEIF<I0I9C?I;IH.<I0AI NM:i:0 MD:Z:45 AS:i:45 XS:i:45
ERR000044.1 145 9 26830192 60 45M = 26830129 -108 GTGAAACCAGCTGGTTTTCTGGGTCGAGCGGGGACTTGGAGAACT IIIIIIIIIIIBIIIII=IIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:0 MD:Z:45 AS:i:45 XS:i:27

zcat sample_sorted.bam | head
BAMg@SQ SN:1 LN:248956422
@SQ SN:10 LN:133797422
@SQ SN:11 LN:135086622
@SQ SN:12 LN:133275309
@SQ SN:13 LN:114364328
@SQ SN:14 LN:107043718
@SQ SN:15 LN:101991189
@SQ SN:16 LN:90338345
@SQ SN:17 LN:83257441
@SQ SN:18 LN:80373285

nmmsv commented 3 years ago

GangSTR requires the read group (RG) tag to be present in the BAM file. There are several methods for adding the RG tag to alignments: https://www.biostars.org/p/47487/

This is an example of an alignment from the 1000 Genomes dataset that has the RG tag (the final RG:Z field):

A00296:30:HFN2VDSXX:2:2669:7021:7983    83  chr1    9996    40  58S92M  =   10022   -66 CCCTCACCCGACCCCTAACGCGCTCCCGAACCCTCACTCTACCTCTCACCAGTTCACTTCCGATAACCCTAACCCTACCCCTAACCCTATCCCTAACGCTTACCGTATACCTAACGCTAACCCTAACCCTAACCCTAACCCTAACCCTAA  ?+?+++???+++++?'++??$++??+5+++?+??+??++'?+++??+??+5+++?+???+?++?++55???++?+??+?5?????5???+??+'???+?????$+'?+++??+??5?+?+?+??+??+??????????????????????  XA:Z:chr4,-10115,65S85M,8;chr4,+190122850,41M5I41M63S,10;chr7_KI270899v1_alt,-2,63S40M6I41M,11; MC:Z:86M64S PG:Z:MarkDuplicates MQ:i:40 AS:i:52 XS:i:48 MD:Z:0N0N0N0N0N14A11A7C2A3C2A0C6C34 NM:i:13  RG:Z:HG01104_AGTTCAGG-CCAACAGA_HFN2VDSXX_L002
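
For later readers, here is a minimal sketch of one way to add the missing tag and confirm it is there. It assumes samtools is installed and provides the `addreplacerg` subcommand. The read group values (`ID:sample`, `SM:Sample`), the output file name, and the helper function `has_read_group` are placeholders made up for illustration, not something taken from this thread or from GangSTR itself.

```shell
#!/usr/bin/env bash
# Sketch only: add a read group to a BAM file, then check the header for it.
# ASSUMPTIONS: samtools with the addreplacerg subcommand is on PATH;
# the ID/SM values and the output file name below are placeholders.
set -eu

in_bam="sample_sorted.bam"
out_bam="sample_sorted.rg.bam"   # hypothetical output name

# Hypothetical helper: reads SAM header text on stdin and succeeds
# only if at least one @RG line is present.
has_read_group() {
    grep '^@RG' >/dev/null
}

if command -v samtools >/dev/null 2>&1 && [ -f "$in_bam" ]; then
    # Add an @RG header line and tag every read with RG:Z:sample.
    samtools addreplacerg -r 'ID:sample' -r 'SM:Sample' -o "$out_bam" "$in_bam"
    samtools index "$out_bam"

    # Print only the header (-H) and look for the @RG line.
    if samtools view -H "$out_bam" | has_read_group; then
        echo "read group present in $out_bam"
    else
        echo "no read group found in $out_bam" >&2
    fi
fi
```

If this works as intended, `samtools view "$out_bam" | head` should show an `RG:Z:` field at the end of each record, as in the 1000 Genomes alignment above, and GangSTR can then be pointed at the new file with `--bam`.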