Closed archana433 closed 3 years ago
GangSTR requires the read group (RG) tag to be present in the BAM file. There are several methods for adding the RG tag to alignments: https://www.biostars.org/p/47487/
This is an example of an alignment from the 1000 Genomes dataset that has the RG tag:
A00296:30:HFN2VDSXX:2:2669:7021:7983 83 chr1 9996 40 58S92M = 10022 -66 CCCTCACCCGACCCCTAACGCGCTCCCGAACCCTCACTCTACCTCTCACCAGTTCACTTCCGATAACCCTAACCCTACCCCTAACCCTATCCCTAACGCTTACCGTATACCTAACGCTAACCCTAACCCTAACCCTAACCCTAACCCTAA ?+?+++???+++++?'++??$++??+5+++?+??+??++'?+++??+??+5+++?+???+?++?++55???++?+??+?5?????5???+??+'???+?????$+'?+++??+??5?+?+?+??+??+?????????????????????? XA:Z:chr4,-10115,65S85M,8;chr4,+190122850,41M5I41M63S,10;chr7_KI270899v1_alt,-2,63S40M6I41M,11; MC:Z:86M64S PG:Z:MarkDuplicates MQ:i:40 AS:i:52 XS:i:48 MD:Z:0N0N0N0N0N14A11A7C2A3C2A0C6C34 NM:i:13 RG:Z:HG01104_AGTTCAGG-CCAACAGA_HFN2VDSXX_L002
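As one possible way to add the tag, here is a minimal sketch using `samtools addreplacerg`. It assumes samtools is installed; the output file name and the read group `ID` and `SM` values are hypothetical placeholders, not values taken from this thread, so substitute ones that describe your sample.

```shell
#!/bin/sh
# Hypothetical names and read group values; adjust to your data.
in_bam="sample_sorted.bam"
out_bam="sample_sorted.rg.bam"
rg_id="sample"   # assumed read group ID
rg_sm="sample"   # assumed sample name

add_read_group() {
    # addreplacerg adds an @RG header line and, in its default mode
    # (overwrite_all), tags every read with RG:Z:<ID>.
    samtools addreplacerg -r "ID:$rg_id" -r "SM:$rg_sm" -o "$2" "$1" &&
    samtools index "$2"
}

if command -v samtools >/dev/null 2>&1 && [ -f "$in_bam" ]; then
    add_read_group "$in_bam" "$out_bam"
else
    echo "samtools or $in_bam not found; nothing done" >&2
fi
```

The resulting file can then be passed to GangSTR with `--bam` in place of the original BAM.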
GangSTR --bam sample_sorted.bam --ref Homo_sapiens.GRCh38.dna.primary_assembly.fa --regions hg38ver12.bed --out Sample
[GangSTR-2.4.6] ERROR: No read group specified in BAM file
BAM file:
samtools view sample_sorted.bam | head
ERR000044.1 97 9 26830129 0 45M = 26830192 108 GAACAGTCATTGCCCAATTCCCAACAGCAGTTGGGGTGTCCTGTT IIIIIIIIIIIIIIIIIIIIIIHIIEIF<I0I9C?I;IH.<I0AI NM:i:0 MD:Z:45 AS:i:45 XS:i:45
ERR000044.1 145 9 26830192 60 45M = 26830129 -108 GTGAAACCAGCTGGTTTTCTGGGTCGAGCGGGGACTTGGAGAACT IIIIIIIIIIIBIIIII=IIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:0 MD:Z:45 AS:i:45 XS:i:27
zcat sample_sorted.bam | head
BAMg@SQ SN:1 LN:248956422
@SQ SN:10 LN:133797422
@SQ SN:11 LN:135086622
@SQ SN:12 LN:133275309
@SQ SN:13 LN:114364328
@SQ SN:14 LN:107043718
@SQ SN:15 LN:101991189
@SQ SN:16 LN:90338345
@SQ SN:17 LN:83257441
@SQ SN:18 LN:80373285
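The header lines shown here contain only `@SQ` records and no `@RG` record, which is consistent with the GangSTR error. A small sketch for checking this directly follows; `has_read_group` is a hypothetical helper name, and the commented usage assumes samtools is installed.

```shell
#!/bin/sh
# Returns 0 if the SAM header text read from stdin has at least one @RG line.
has_read_group() {
    grep -q '^@RG'
}

# Demo on two header lines like the ones in this post (no @RG present).
if printf '@SQ\tSN:1\tLN:248956422\n@SQ\tSN:10\tLN:133797422\n' | has_read_group; then
    echo "read group present"
else
    echo "no read group in header"
fi

# With a real file (assumes samtools is installed):
#   samtools view -H sample_sorted.bam | has_read_group
```

Reading the header with `samtools view -H` is more reliable than `zcat`, which prints the raw decompressed BAM bytes.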