Open mirahan opened 6 years ago
Hi,
I cannot access the file, but I would assume the rest of the alignments for the read in question are marked as secondary or supplementary. bamcollate2 filters these out by default.
bamcollate2 currently does not handle secondary alignments well. I would suggest you try the latest bamsort with sort order queryname_HI instead. This should collate alignment pairs, including secondary alignments.
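The assumption about secondary or supplementary alignments can be checked against the FLAG column of the pasted records (115, 499 and 69) without access to the file. The sketch below is only an illustration: the bit values come from the SAM specification, while the helper names decode_flag and is_secondary_or_supplementary are made up for this example and are not part of biobambam2 or samtools.

```python
# SAM FLAG bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "last_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value (hypothetical helper)."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_secondary_or_supplementary(flag):
    """True if the record is marked secondary (0x100) or supplementary (0x800)."""
    return bool(flag & (0x100 | 0x800))


# FLAG values taken from the three records shown in the question.
for flag in (115, 499, 69):
    print(flag, is_secondary_or_supplementary(flag), decode_flag(flag))
```

Of the three values, only 499 has the secondary bit (0x100) set, and that is the record missing from the collated output, which is consistent with the filtering described above.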
Hi,
I am trying to use bamcollate2 on an RNA-seq BAM file aligned with map-splice, but after processing with bamcollate2 I am missing reads that were mapped to multiple locations. Can you let me know how I can get all the reads out of bamcollate2? The file I am using is the RNA-seq BAM file found on the GDC legacy website with file id 9b1a94fa-d6e8-49c5-a552-2da0e0ffe893. The output for one read pair before and after collating is below. Before collating there are three records (two lines for one read, which is a fusion alignment); after bamcollate2 there are only two records in the BAM file.
[mirahan@hanlab-dell1 pre-mrna]$ samtools view bam/TCGA-CG-5720-01A.ver2.bam | grep HS2_251:8:1101:1049:197409
HS2_251:8:1101:1049:197409/2 115 chr2 133038633 255 21M54S = 230045563 -97006910 GTTCAACTGCTGTTCACATGGTCGCCCGTCCCTTCGGAACGGCGCTCGCCCATCTCTCAGGACCGACTGACCCAT @B@FFFEFDDBHHGGHHFGIIIIGA?CBF9EHIGH>GHHH?G8BGHIIIIJBBEEF=ACC@BB@@B@BB@BB# XF:Z:ATAC, ZF:Z:FUS_133038633_230045616(--) RG:Z:EXTERNAL_9ed6bfd4-c1c1-44ae-88d8-ff01224de6de_20141215_1108581 IH:i:2 YH:Z:1.1 HI:i:1 YI:i:1 NM:i:3 XS:A:+
HS2_251:8:1101:1049:197409/2 499 chr2 230045563 255 21S54M = 133038633 97006910 GTTCAACTGCTGTTCACATGGTCGCCCGTCCCTTCGGAACGGCGCTCGCCCATCTCTCAGGACCGACTGACCCAT @B@FFFEFDDBHHGGHHFGIIIIGA?CBF9EHIGH>GHHH?G8BGHIIIIJBBEEF=ACC@BB@@B@BB@BB# XF:Z:ATAC, ZF:Z:FUS_133038633_230045616(--) RG:Z:EXTERNAL_9ed6bfd4-c1c1-44ae-88d8-ff01224de6de_20141215_1108581 IH:i:2 YH:Z:1.2 HI:i:2 YI:i:1 NM:i:3 XS:A:+
HS2_251:8:1101:1049:197409/1 69 0 0 0 0 CNGGGGATCTGAACCCGACTCCCTTTCGATCGGCCGAGGGCAACGGAGGCCATCGCCCGTCCCTTCGGAACGGCG @#1=ADBDFHFHFGHIIIBHIIG9CGFF;?DABBDEGIEGHEHEEB?C/;?<C?7(38<7?@BCCCBBBBBBBBB RG:Z:EXTERNAL_9ed6bfd4-c1c1-44ae-88d8-ff01224de6de_20141215_1108581 IH:i:0 HI:i:0
[mirahan@hanlab-dell1 pre-mrna]$ samtools view TCGA-CG-5620-01A.collated.bam | grep HS2_251:8:1101:1049:197409
HS2_251:8:1101:1049:197409/1 69 0 0 0 0 CNGGGGATCTGAACCCGACTCCCTTTCGATCGGCCGAGGGCAACGGAGGCCATCGCCCGTCCCTTCGGAACGGCG @#1=ADBDFHFHFGHIIIBHIIG9CGFF;?DABBDEGIEGHEHEEB?C/;?<C?7(38<7?@BCCCBBBBBBBBB RG:Z:EXTERNAL_9ed6bfd4-c1c1-44ae-88d8-ff01224de6de_20141215_1108581 IH:i:0 HI:i:0
HS2_251:8:1101:1049:197409/2 115 chr2 133038633 255 21M54S = 230045563 -97006910 GTTCAACTGCTGTTCACATGGTCGCCCGTCCCTTCGGAACGGCGCTCGCCCATCTCTCAGGACCGACTGACCCAT @B@FFFEFDDBHHGGHHFGIIIIGA?CBF9EHIGH>GHHH?G8BGHIIIIJBBEEF=ACC@BB@@B@BB@BB# XF:Z:ATAC, ZF:Z:FUS_133038633_230045616(--) RG:Z:EXTERNAL_9ed6bfd4-c1c1-44ae-88d8-ff01224de6de_20141215_1108581 IH:i:2 YH:Z:1.1 HI:i:1 YI:i:1 NM:i:3 XS:A:+
[mirahan@hanlab-dell1 pre-mrna]$