Could you check whether the surrounding lines have different read IDs? Sometimes aligners report an uneven number of alignment lines for a read, and that messes up the pairing of the lines.
The surrounding lines are shown below:
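One way to run that check is to count how many alignment lines each read ID has. The following is only a minimal sketch: the function names are hypothetical, it is not part of HTSeq, and it assumes the records are available as plain SAM text (for example from `samtools view`) with exactly two lines expected per read ID.

```python
from collections import Counter


def count_records_per_read(sam_lines):
    """Count alignment lines per read ID (QNAME, the first SAM column)."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        # Split on any whitespace so that tab-separated SAM and
        # space-separated pasted text are both accepted.
        qname = line.split(None, 1)[0]
        counts[qname] += 1
    return counts


def reads_with_unexpected_count(sam_lines, expected=2):
    """Return {read ID: line count} for reads without `expected` lines."""
    counts = count_records_per_read(sam_lines)
    return {name: n for name, n in counts.items() if n != expected}
```

Any read ID returned by `reads_with_unexpected_count` has an uneven number of alignment lines in the inspected region and is a candidate for the pairing problem described above. Note that a mate located outside the inspected region will also show up here, so the check is most meaningful on a whole file.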
D00635:270:CBBRUANXX:2:2310:19291:56353 99 chr1 1424416 60 110M = 1424484 178 CCCCAACACGCATGGTGGCAGCAGCACACGTGTCCTGGGCTCCTGGTACTTCACAAACCAGGAAAGCTAGACTCTGAGTCACAGAATAAATACACTCAGCCGAGAGGGAC :30:CF=GG//FFGGGEG>BG>C>FGGGGGD@=FGD00DGDGE1=FEE:G11FGFGGGGGGG@F@BFGGGGGECGF@GG0FBFGGEDCFBCF@DFGG@C.9CB9CE AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:110 YS:i:0 YT:Z:CP XS:A:- NH:i:1
D00635:270:CBBRUANXX:5:2215:8585:41857 147 chr1 1424462 60 110M = 1424390 -182 TACTTCACAAACCAGGAAAGCTAGACTCTGAGTCACAGAATAAATACACTCAGCCGAGAGGGACCGCTGTGCTCCTGGAGGTTCTGTCCTCGCGGCTGGACACACCTGCT GGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGFGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCB AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:110 YS:i:0 YT:Z:CP XS:A:- NH:i:1
D00635:270:CBBRUANXX:3:2206:7043:73916 99 chr1 1424476 60 110M = 1424585 219 GGAAAGCTAGACTCTGAGTCACAGAATAAATACACTCAGCCGAGAGGGACCGCTGTGCTCCTGGAGGTTCTGTCCTCGCGGCTGGACACACCTGCTCCTCTCTGGGGGGA 3@BBCBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGDGDGGFGGGGGDGGGG@GGGGGGGGDEDGDD AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:110 YS:i:0 YT:Z:CP XS:A:- NH:i:1
D00635:270:CBBRUANXX:5:2309:4106:8108 99 chr1 1424482 60 110M = 1424573 201 CTAGACTCTGAGTCACAGAATAAATACACTCAGCCGAGAGGGACCGCTGTGCTCCTGGAGGTTCTGTCCTCGCGGCTGGACACACCTGCTCCTCTCTGGGGGGACCTCGA @B>@FGEFFCF;EGGGGGGGGGGGGGGGCGGCG>>9/9//EFDGGGFGEGEGC@:1BBDGG==:F@F/CADGG<<@C<FGGDGG@FGGGGCGGF0;6.C.CB.6CC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:110 YS:i:-6 YT:Z:CP XS:A:- NH:i:1
D00635:270:CBBRUANXX:2:2310:19291:56353 147 chr1 1424484 60 110M = 1424416 -178 AGACTCTGAGTCACAGAATAAATACACTCAGCCGAGAGGGACCGCTGTGCTCCTGGAGGTTCTGTCCTCGCGGCTGGACACACCTGCTCCTCTCTGGGGGGACCTCGAAC D/6=EBEGGGGGC/GGGGG=GGDE/CGGGGFF/GGGGGGGGGC>DGGEFF>BCGGGGGGGGGGGGAGGGGGEC:GGGFBCEFBGGGGGF=BFGGAGGGDF@CCBBAAAA< AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:110 YS:i:0 YT:Z:CP XS:A:- NH:i:1
D00635:270:CBBRUANXX:2:2110:7470:28171 99 chr1 1424489 60 110M = 1424569 190 CTGAGTCACAGAATAAATACACTCAGCCGAGAGGGACCGCTGTGCTCCTGGAGGTTCTGTCCTCGCGGCTGGACACACCTGCTCCTCTCTGGGGGGACCTCGAACCTGGC :B@BBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDCCGGGGGGGGGGGB AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:110 YS:i:0 YT:Z:CP XS:A:- NH:i:1
D00635:270:CBBRUANXX:3:2306:12547:91173 147 chr1 1424489 60 110M = 1424370 -229 CTGAGTCACAGAATAAATACACTCAGCCGAGAGGGACCGCTGTGCTCCTGGAGGTTCTGTCCTCGCGGCTGGACACACCTGCTCCTCTCTGGGGGGACCTCGAACCTGGC GGGGGGGGGGGGGGGGEGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGBCBBB AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:110 YS:i:0 YT:Z:CP XS:A:- NH:i:1
D00635:270:CBBRUANXX:2:2211:18272:41224 99 chr1 1424498 60 110M = 1424584 196 AGAATAAATACACTCAGCCGAGAGGGACCGCTGTGCTCCTGGAGGTTCTGTCCTCGCGGCTGGACACACCTGCTCCTCTCTGGGGGGACCTCGAACCTGGCTGACCACCA A3BABGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGEGGGDGGAGGGGGGGGGGGGGEGGGGEGGD AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:110 YS:i:0 YT:Z:CP XS:A:- NH:i:1
D00635:270:CBBRUANXX:2:2112:20307:2034 99 chr1 1424514 60 1S109M = 1424605 202 NGCCGAGAGGGACCGCTGTGCTCCTGGAGGTTCTGTCCTCGCGGCTGGACACACCTGCTCCTCTCTGGGGGGACCTCGAACCTGGCTGACCACCATAATCACGCAGGGCC !3<AGG@EGGGGGGGGGGGGGGGGGDGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGDGGGGGGGGEBGGGEGGGEG/DGGGGGDGGGGG AS:i:-4 ZS:i:-14 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:96G12 YS:i:-10 YT:Z:CP XS:A:- NH:i:1
D00635:270:CBBRUANXX:5:1107:5679:43464 99 chr1 1424514 60 110M = 1424615 211 GCCGAGAGGGACCGCTGTGCTCCTGGAGGTTCTGTCCTCGCGGCTGGACACACCTGCTCCTCTCTGGGGGGCCCTCGAACCTGGCTGACCACCATAGTCACGCAGGGCCC =@:@BEDGG>/0/9CGGGGEG1<FG><B/<C1@1<:11<EA<CAD/CFGGGGD0FBDGC0FF@@FG>////CE/CDGC>DECGGGGGG/.C6@@/C//C<BDA;..CG AS:i:-3 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:71A38 YS:i:0 YT:Z:CP XS:A:- NH:i:1
D00635:270:CBBRUANXX:3:1102:1964:31010 99 chr1 1424517 60 110M = 1424517 -110 GAGAGGGACCGCTGTGCTCCTGGAGGTTCTGTCCTCGCGGCTGGACACACCTGCTCCTCTCTGGGGGGACCTCGAACCTGGCTGACCACCATAGTCACGCAGGGCCCATC 3:>@BCGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGBGGGGGG=GGBGEGGGGGGGGGGGBD AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:110 YS:i:0 YT:Z:CP XS:A:- NH:i:1
D00635:270:CBBRUANXX:3:1102:1964:31010 147 chr1 1424517 60 110M = 1424517 -110 GAGAGGGACCGCTGTGCTCCTGGAGGTTCTGTCCTCGCGGCTGGACACACCTGCTCCTCTCTGGGGGGACCTCGAACCTGGCTGACCACCATAGTCACGCAGGGCCCATC GGGGEGAGGGGDBGGGGGGGGGGGGC0GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCBCBC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:110 YS:i:0 YT:Z:CP XS:A:- NH:i:1
D00635:270:CBBRUANXX:3:1316:14100:87578 99 chr1 1424530 60 110M = 1424616 196 GTGCTCCTGGAGGTTCTGTCCTCGCGGCTGGACACACCTGCTCCTCTCTGGGGGGACCTCGAACCTAGCTGACCACCATAGTCACGCAGGGCCCATCGGACGGAATGGGG :@@>AC@FGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGDGGGGEGGGCDGDCGBDEGG AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:66G43 YS:i:-1 YT:Z:CP XS:A:- NH:i:1
D00635:270:CBBRUANXX:2:2110:7470:28171 147 chr1 1424569 60 110M = 1424489 -190 GCTCCTCTCTGGGGGGACCTCGAACCTGGCTGACCACCATAGTCACGCAGGGCCCATCGGACGGAATGGGGGACACAGAGGACACCCGAAGTCGGAAGCTCCAGGAGAAC G>G>C.<D
Could you figure out a solution to this problem? Please post it here if possible. Thanks!
I currently have time to look into this if you can share a BAM file and a GTF file that reproduce the issue, e.g. via Google Drive. Thank you.
The official repo for htseq has been moved to: https://github.com/htseq/htseq. Please reopen the issue there and attach a BAM and GTF file (e.g. share a Google Drive link); then I can take a look at the problem.
Closing this one.
When processing a position-sorted BAM file, I get the following warning:
Warning: Mate records missing for 2734 records; first such record: <SAM_Alignment object: Paired-end read 'D00635:270:CBBRUANXX:3:1102:1964:31010' aligned to chr1:[1424516,1424626)/+>.
But the read and its mate are actually right next to each other (lines 6691 and 6692 in the samtools output for chr1). My guess is that the algorithm expects the mates to have different start coordinates, but here they are identical (1424517) because the sequenced fragment is short.
6691:D00635:270:CBBRUANXX:3:1102:1964:31010 99 chr1 1424517 60 110M = 1424517 -110 GAGAGGGACCGCTGTGCTCCTGGAGGTTCTGTCCTCGCGGCTGGACACACCTGCTCCTCTCTGGGGGGACCTCGAACCTGGCTGACCACCATAGTCACGCAGGGCCCATC 3:>@BCGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGBGGGGGG=GGBGEGGGGGGGGGGGBD AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:110 YS:i:0 YT:Z:CP XS:A:- NH:i:1
6692:D00635:270:CBBRUANXX:3:1102:1964:31010 147 chr1 1424517 60 110M = 1424517 -110 GAGAGGGACCGCTGTGCTCCTGGAGGTTCTGTCCTCGCGGCTGGACACACCTGCTCCTCTCTGGGGGGACCTCGAACCTGGCTGACCACCATAGTCACGCAGGGCCCATC GGGGEGAGGGGDBGGGGGGGGGGGGC0GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCBCBC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:110 YS:i:0 YT:Z:CP XS:A:- NH:i:1
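To illustrate the guess above, here is a rough sketch of how mates could be paired in a position-sorted stream without relying on differing start coordinates. This is not HTSeq's actual implementation, and all names (`Rec`, `parse_record`, `pair_position_sorted`) are hypothetical. The idea is to include the first-in-pair/second-in-pair flag bit (0x40) in the lookup key, so that two mates with the same read ID and the same start position still get distinct keys. It assumes one primary alignment per mate.

```python
from typing import NamedTuple


class Rec(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    rnext: str
    pnext: int


def parse_record(line):
    """Parse the first eight SAM columns; whitespace-separated for simplicity."""
    f = line.split()
    rnext = f[2] if f[6] == "=" else f[6]  # '=' means same reference as RNAME
    return Rec(f[0], int(f[1]), f[2], int(f[3]), rnext, int(f[7]))


def mate_number(rec):
    """Return 1 if the first-in-pair bit (0x40) is set, otherwise 2."""
    return 1 if rec.flag & 0x40 else 2


def pair_position_sorted(records):
    """Yield (first, second) tuples; a missing mate is reported as None."""
    pending = {}
    for rec in records:
        # Key under which this record's mate would have been stored earlier.
        mate_key = (rec.qname, 3 - mate_number(rec),
                    rec.rnext, rec.pnext, rec.rname, rec.pos)
        mate = pending.pop(mate_key, None)
        if mate is None:
            # Mate not seen yet: remember this record under its own key.
            own_key = (rec.qname, mate_number(rec),
                       rec.rname, rec.pos, rec.rnext, rec.pnext)
            pending[own_key] = rec
        elif mate_number(mate) == 1:
            yield (mate, rec)
        else:
            yield (rec, mate)
    # Whatever is left never found its mate.
    for rec in pending.values():
        yield (rec, None) if mate_number(rec) == 1 else (None, rec)
```

With the two records shown above (flags 99 and 147, both starting at 1424517), the first record is stored under a key with mate number 1, and the second record looks up exactly that key, so the pair is reported together rather than as two records with missing mates.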