phasegenomics / matlock

Simple tools for working with Hi-C data
GNU Affero General Public License v3.0

bam2 juicer issue #6

Open macmanes opened 5 years ago

macmanes commented 5 years ago
$HOME/matlock/bin/matlock bam2 juicer $HOME/mapping_pipeline/tmp/PEER_373L.bam $HOME/matlock_juicer
INFO: converting bam to juicer on /mnt/lustre/macmaneslab/macmanes/mapping_pipeline/tmp/PEER_373L.bam
INFO: detected bam filetype
INFO: reading file "/mnt/lustre/macmaneslab/macmanes/mapping_pipeline/tmp/PEER_373L.bam"
FATAL: something went wrong in process_pair

PEER_373L.bam is properly parsable by samtools, which makes me think the issue is not in the BAM file itself.

shawnpg commented 5 years ago

Could you share a snippet of the BAM file (e.g., samtools view foo.bam | head -n 1000)?

My first thought is that the BAM may not be sorted by name (samtools sort -n). Default sort in samtools is by chromosome coordinate, but it is typical for Hi-C data to be sorted by read name (QNAME) so corresponding Hi-C reads are right next to each other in the BAM. I would have to check the data to be sure that's what's hitting this error, but in case this rings any bells you might try sorting by read name.
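
A quick way to see which sort order a BAM declares is to read the SO field on the @HD line of its header. The helper below is a minimal sketch, not part of matlock or samtools: the function name sam_sort_order is made up for illustration, and it assumes the header text is piped in (for example from samtools view -H). It only reports what the header claims; it does not verify that the records really are in that order.

```shell
# Print the sort order declared in a SAM header read from stdin.
# Hypothetical helper for illustration. Typical use, assuming samtools is on PATH:
#   samtools view -H foo.bam | sam_sort_order
sam_sort_order() {
    awk -F'\t' '
        # The @HD line carries header-level tags such as VN: and SO:
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
        }
        # No @HD line, or no SO tag on it
        END { if (!found) print "unknown" }
    '
}
```

A BAM written by samtools sort -n is expected to report queryname, while the default coordinate sort reports coordinate.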

xinghua1001 commented 3 years ago

I have run into the same problem. Have you found a solution?

shawnpg commented 3 years ago

Can you send a snippet of the BAM with a command similar to the above (e.g., samtools view foo.bam | head -n 100)? This error is most commonly caused by a BAM that needs to be re-sorted by read name (samtools sort -n).

Karimi-81 commented 3 years ago

Hi there, did you find a solution for this issue? I used the following command to re-sort the BAM file, as you suggested above, and then ran matlock:

samtools sort -n sample.unique.REduced.paired_only.bam > sample.unique.REduced.paired.bam
/gpfs/fs0/scratch/y/ymiar/karimi81/allhic/matlock/bin/matlock bam2 juicer sample.unique.REduced.paired.bam out.links.txt

but I still get the same error:

/bin/matlock bam2 juicer sample.unique.REduced.paired.bam out.links.txt
INFO: converting bam to juicer on sample.unique.REduced.paired.bam
INFO: detected bam filetype
INFO: reading file "sample.unique.REduced.paired.bam"
FATAL: something went wrong in process_pair

Thank you

Karimi-81 commented 3 years ago

Part of the output of samtools view sample.unique.REduced.paired.bam | head -n 100:

GWNJ-0901:774:GW2103153736th:1:1101:1631:66056 83 ptg000040l_1_25449999 21156739 60 150M = 21156742 -147 CNTCTGTAAAAGAATTTGAGAAATATAGAGACTCAGTATCATTTTGGAGTTGGCTTAAATAGAAATGCTTAGATACTCATATGGGTGATATTTTCCAGAAGATATTTGATGTCGTAAAAGTAGTTTTATGTAGACTGAAATGATTTTCCA F#F7J<7-FFJAF7--<7A7FFF<77--7-777<AA-F-JAFF7-FAA7---7F7-JFFAA7J<FAAAAA7-F-AA7--F<--FFF-FJA-<--JJJJF<-J<<-<<AA<7<--AJJJJJFF77<-<<-<-JFJFA-JJJJFF-A-<AAA NM:i:7 MD:Z:1T5G18G50G0G33G36T0 MC:Z:89S61M AS:i:122 XS:i:19 GWNJ-0901:774:GW2103153736th:1:1101:1631:66056 163 ptg000040l_1_25449999 21156742 60 89S61M = 21156739 147 GGNNTAGGTNTTCANCAGNNANTTTTTGGANAAATTNGNGNANANGNTNANGTNANCNTANTNANGNTNGNTNANGNANANGNNANCNNCTNTNAANGAANTTGAGAAATATGGAGNCNNANTNTCNTNTTGNAGTTNGCTTAAATAGAA -A##AFFJJ#JJ<<#--<##7#AFJFJJJF#AAFFJ#J#J#7#<#<#F#-#-F#J#<#FA#J#F#-#J#J#J#-#<#-#7#7##-#-##-<#F#-7#FFF#JJJFJF<7FFFFF<A#F##-#7#FF#J#AJF#-A7F#F--AA--FJJ-7 NM:i:13 MD:Z:2G1G2A3T15A1T0C1G1A2A1T3G4G12 MC:Z:150M AS:i:35 XS:i:0 GWNJ-0901:774:GW2103153736th:1:1101:1641:56651 83 ptg000126l_1875000_76996999 2848255 60 3S104M43S = 2848188 -171 CNCCATTCTGTAAACTAGAGGCAATGAGTTCCCCCAAAAATAAGCCTATAAAAAAAATAATATTCTTGCTTTAAATTAAAAACACAATAGCTAGAACTCTGAATATTAATCGATCAACAGTCAAGTGCTCTTCTGACTGAACCAGCCAGA -#-7--77--7FA-----7-7-<7--7---A--AJFA<--<A<<777A-FFJF<-F<-77-F<F<<<-A<<F77-<-JFJJJJJ-<F<<-F-<<FF--<<-F<-7-<-F-F-F-7FJAF-<-<--J7JJFF77-77FA7JFJ<JJFAA<A NM:i:7 MD:Z:18G3T13G4G9G5G13T32 MC:Z:1S148M1S AS:i:69 XS:i:21 SA:Z:ptg000239l,11310,-,111S39M,0,2; GWNJ-0901:774:GW2103153736th:1:1101:1651:45910 83 ptg000107l 2299689 60 104M46S = 2299649 -144 TNCAGAAGGATTAGAACAATATCTGTTTGTTCCTAGAACTCTATAGAATTAATTTGACTGACTTAGATACATGGTCGATAGTAAGCACAAGAATGACTCAGATCGATCAAAACATGAAGAATTGGCTTAATACCACAGCTATATTCAATA -#JFAA7--7-77-FAFF7-A-JFFF-7A7----7-F<-7-<--7--777-<77-AJA<-F<-7-7A-<---7--F<JAFJ<<-FAF-J<JFF<-<---F<-JF-J<-JJJJFF<<JF-JF-JF<-<AJFAJJAJJFAFAFAA-AAFAAA NM:i:10 MD:Z:1G30T5T3T3G17T5C12G1T10G7 MC:Z:1S130M19S AS:i:57 XS:i:20 SA:Z:ptg000010l,5947274,-,104S46M,24,0; 
GWNJ-0901:774:GW2103153736th:1:1101:1651:56106 65 ptg000098l_9172999_21697219 6074267 60 150M ptg000079l 3860637 0 TTAGATATCTATCTAATGCTATTATAAATAATAATAGTTCAGTGCTTGCTTGAGCACTTACTATGTGCCAACCACTGGTACTGGGTGTGTGTGTGTGTGTGTGTGTGTACACGCATGTGTGTTTATTTTATTCTCACTACAGTCTCATGA AA-AAFAJAA7FFJJJJJFJJJJ<F-<FJF<F<FJAJJJFFFJJFFJJFFJFFF<FJFJ<FJ<FFJJ<FFJJJJJAJJJ<<FJFJJJJJJJJJFJJJJJJJJJJJJJJ<7<7FF-<FJJJJJJJFJJJ-7A<FJJ<-A7--7FFF--7F- NM:i:2 MD:Z:0G127C21 MC:Z:132M18S AS:i:144 XS:i:29 GWNJ-0901:774:GW2103153736th:1:1101:1651:56106 129 ptg000079l 3860637 60 132M18S ptg000098l_9172999_21697219 6074267 0 TCCNCTTTCNAACCAGCAATATAATATTCCTTCTTTCTNTCTCTTTCTCCNCCNCNACCANCNCTGAATCTANAATCATATCTCCTTCNTTNANTCNCACNCTCCTGAGATTATACNAGNANTNTGNTNTTANTCTTNGTCGTTCGTTCT AAF#AAFJA#<<FJJF<<FJFJ<<F<FJ7-<J<JJJ<J#J<F<FJJFF7<#7A#<#JJFJ#J#JAF<<FFJJ#J<<-7A-FAJ<AFF7#-<#-#F7#-7<#-<A-<F-77<F7FA-#7A#7#A#FJ#F#JJ-#-7-7#-A-7-77)7A-A NM:i:21 MD:Z:3A5A28T11C2A1C4T1T9C15T0C1G1G2C3A15A2G1C1C2T1A3 MC:Z:150M AS:i:87 XS:i:19 GWNJ-0901:774:GW2103153736th:1:1101:1651:65634 2145 ptg000003l 12647151 59 115H35M = 12647172 127 ACGCTGTTATACTTCTCTGTGAAAAAAATAAACTC 7A7-AFF-7A7FFF-F<FFJ<-7FA---77<<--- NM:i:1 MD:Z:7C27 MC:Z:44H106M AS:i:30 XS:i:0 SA:Z:ptg000021l,10572095,-,37S113M,60,1; GWNJ-0901:774:GW2103153736th:1:1101:1651:65634 113 ptg000021l 10572095 60 37S113M ptg000003l 12647172 0GAGTTTATTTTTTTCACAGAGAAGTATAACAGCGTCATCGATCCCAGGACCCAGAGGCTATAACCCACTGAGACACCCAGGCGCCCCCAGATCTTCAGTTTTAACAGAATGAAATTTTCTTACGGGAGAAGAGATTAGAAAATTGTCTGA ---<<77---AF7-<JFF<F-FFF7A7-FFA-7A7-F-A7F-JJF7A7JAJJF777-JFA7JJFJJF--7F-JJJF7FA77A-JJJJJFFF<<<<FF7A<AAJJFJFJFA<JJJJJFFFFF<F--FJFJFAJ<JFAF7FJJJFJF-FAAA NM:i:1 MD:Z:112T0 MC:Z:44S106M AS:i:112 XS:i:31 SA:Z:ptg000003l,12647151,+,115S35M,59,1; GWNJ-0901:774:GW2103153736th:1:1101:1651:65634 177 ptg000003l 12647172 60 44S106M ptg000021l 10572095 0CCCTCTGCATCCNCGAANCAANCNCCNTNTCATNCTTCTCTCCCAAAAANATANACNCNGTNTTTCCTGTAATCATAAAGCTCTAACTGTCAAGAGTTTCCTCTGAATTGAATTGTTAAAATTACTAGTATGCAAATGGTNAAAACNATA 
)A7-7-)--)A7#---<#77A#7#77#-#-A7-#A777-7---7JJJJA#F7F#<<#7#--#7A--A<77JF-A<<F<--A<F<A<-<<AJF<<<7F<-FF7<-<F<---F<---<-JJA<--FF<F7<F<<JJFF---F#JJJFF#AAA NM:i:10 MD:Z:5A3A2T1A2A3T12G7G53C5A3 MC:Z:37S113M AS:i:77 XS:i:22 GWNJ-0901:774:GW2103153736th:1:1101:1651:66021 65 ptg000009l_29164999_40607194 11226687 60 68M1I68M13S ptg000085l_100999_73149634 55115450 0 TATCAAAGCTAAGCCCTCAAGGCTGGGGACACCGGCTAGCACCGTGGGGTCGCGCGGAGGTGACTGGGTGGTGGGGGGGGAGCGCGGCGCCCGTGTGGGTCAGCCGTGGGCTGCTTCCCAAGGCTGTGAGGACGGAGGGGGATGATCAGG AAAAAJAFFFAFJJFFAJFAFF--7<FFF-7A--7---7---7777FJJ--77A-<F7AJJJ-7JFFJJJF7AJJ<<-AJ-7AJ--77<-7--7-7FFFA----7A-AAA-7A-AF--7A--7-AFFJ-FFA--A<F--AF--77-F-7A NM:i:10 MD:Z:0G32C2C12C20G13A6T1C6C35 MC:Z:115M35S AS:i:88 XS:i:20 GWNJ-0901:774:GW2103153736th:1:1101:1651:66021 129 ptg000085l_100999_73149634 55115450 60 115M35S ptg000009l_29164999_40607194 11226687 0 TCCNCTGAGNACCCCCAGCTGAGTCCATAGAGTCACATGGTCCCTGTGATGGGCTTGGCTGTTCGCTGTCATGGCAGCTGACTTGCCTNCANANTCNGGANGCAAGACCCTGATCGNCCCGNTNCGNCNTCANAGCCNGGGGAAGCAGCC AA<#-AAAF#7<<-7-777FJAJJJJ-FAA-FJ-<-7<FJJ77-7FJJ-FFFJ-AFJFA-7<F<AAAFF----7-7AA<A-77-----#7-#7#7-#<F-#-77-<--77AF7A7-#---7#<#77#-#---#-7--#)-AJA7AF--7- NM:i:10 MD:Z:0A2T5T49C28C2A1A2G3G0T13 MC:Z:68M1I68M13S AS:i:90 XS:i:0 GWNJ-0901:774:GW2103153736th:1:1101:1651:66478 81 ptg000126l_77144999_100758675 1410429 60 86S64M ptg000011l_6203999_33571230 899592 0 CAGATTAAGACTCACTTATGATTTGCATCCCTCACTCATTTCATCTTGTTTCATTTTTCCCTCCTTACCCCTATAATCAAACTCACAATTAAGGTCATTTAAATTCCACTAAGGTTTTATAATACACTCTATTAACTATCATGTATCTTA F<7--7-7---JA-77-77---7F7---7F7-A7--------<7AF<7---A7<A7--AA--77-A7AA7--7--<-<-<-<-A<-A-A<7<-<7---<-<7AFAF-<-7F<---A-JJA<F<F-FF-FJJ-<JJF-FAFFAAAF7FA<A NM:i:2 MD:Z:22T40T0 MC:Z:15S37M98S AS:i:58 XS:i:19 SA:Z:ptg000126l_77144999_100758675,141497,-,74M76S,1,7; GWNJ-0901:774:GW2103153736th:1:1101:1651:69854 161 ptg000030l 73978365 60 67M83S = 89379850 15401574 
TAANATATANCAGAAAAGCATATTAGCTCATAAGATTATACTACCATAAAGGGAATTAATGATTGATTGATAACAAGTAGGTAGAGAGNCAGGNAGNGAGNGAGGAGGAAGTAGGTNGCTTNGNTANGNGTGNGCCTNGTGGAGGGTTCT -<A#-7A-A#AJF7--<F7FF-F77<7<--<--F--F<F--<<7--<--7FFJ-<FJ--F-<FJF<F-A---<7--777-7F7<F<-<#--AF#-7#F7A#7AFJ7FFAFFF<AF-#---7#-#-7#7#<-7#A7-7#)7<---AF)7)7 NM:i:4 MD:Z:3A1C1C1A57 MC:Z:61S89M AS:i:57 XS:i:19 SA:Z:ptg000030l,89379844,+,72S39M39S,14,4; GWNJ-0901:774:GW2103153736th:1:1101:1661:44169 99 ptg000126l_77144999_100758675 5087800 60 150M = 5087884 163 TTCCCTGGGATGAAGCCCCAGGTTGGGCTTCCTGCTCAGTGGGGAGCCTGCTTCTCCCTCTCGCTCTGCCCCTCCCCCGCTTGTGCGCTTTTGTTGTCTTTCTCTGTTAAATAAATAAATGAAATAGCTTTTATTAAATCTTTTTAATCT AA-A7AFJJFJFA7A-77-AFFJJJJJ-FJ7A7<-AF-AJJJJJ7<-77A7FF-<A<-AA-7--77F-77-77-77--777FAFF7F77-7FAJJ--7<7A-A-7FA-7--<--7<7-77---7--7-AFJF-7-7-77--A-FJ--7-- NM:i:7 MD:Z:0A61C26C5C11C12A13C15 MC:Z:30S79M41S AS:i:119 XS:i:45 GWNJ-0901:774:GW2103153736th:1:1101:1661:44169 147 ptg000126l_77144999_100758675 5087884 60 30S79M41S = 5087800 -163 TCCCTCGCCCACNGCCCNACCNCNCCNTNAGCGNTCTTGTTCTCTTACTNTGANAANANAANAAATAAAATAGCTTTTAACAAATCTNTNTAATCTNTANTATTTTGTCGANCCCAAGCCAGGCCTGCGCTAAGGACAGTNAGGGTNAGA )A7-)-)A7A)7#7FA-#-FA#7#-7#-#-77-#-7--7--7-A7--A-#-A7#FA#-#JJ#JJA7FJFF-AA<--<-<-JJF<-7-#-#FJAAF<#-<#-<------77F#FFA77--F77777-7--<--<-<A<--<#A--A7#A<A NM:i:14 MD:Z:3C12T2C2T0C2A0T0A2T17T7T1T6T2C9 MC:Z:150M AS:i:39 XS:i:0 GWNJ-0901:774:GW2103153736th:1:1101:1661:47685 145 ptg000009l_29164999_40607194 840100 60 34S116M ptg000001l 44728442 0 ACTCCATAACAANAAAANACCNCNCANANCTAANCAACAACCCAGTGAGNCAANTCNANGTNTTATTATTATTGTTTTCATCTTCTGTTAGAATTANATNAAAAACCCAGGGGTTACCTAACTTGGTCAAAGTCTCACAGNTTATANATA 7--<A7-JJFFF#F7--#<7-#-#--#-#7---#AF-JFAJAF<---A7#JFA#77#-#-F#FFA7AFFAF<JJFA<FF<-FAAA-F-7F<J<--J#F-#FF-AFJJJFFA<-<7JJJAJF<7F<<<JAF<F<F<JJJFF#AAJAA#AAA NM:i:12 MD:Z:15G3G2A0T0T2A34A2G2G37T5A2G0 MC:Z:150M AS:i:87 XS:i:20 GWNJ-0901:774:GW2103153736th:1:1101:1661:55596 99 ptg000003l 8514130 60 96M54S = 8514130 88 
TGTTTGCTGGACGCCTGGTCCAGGGTAGCTGTCAATTTCTTTTTTTTTTTTTTTTTAAAGATTTTTATTTTTTTATTGGGCAGGGAGAGAGAGATCGTTCTCATTAGTTTACAATGATGTATTGTCTCTTATACACATCTCCGGGCCCAT AAAAFFJFJJJ--AA7FJJ7FAAJAJ---<FJ7<FJ-F-FJJJJJJJJJJJJJJJJ7--<-F-<-A-<<A-<FJ<-<FF---7---7-FAJ<F777--A7-7A-7-7FFJ7-7--77777--7FF-A-7F-A-7-7-7-7-7--AA-))) NM:i:4 MD:Z:12C57A8A3A12 MC:Z:29S88M33S AS:i:76 XS:i:42 GWNJ-0901:774:GW2103153736th:1:1101:1661:55596 147 ptg000003l 8514130 60 29S88M33S = 8514130 -88 CGGCAGCGTCACNTGTCNAAANANGANANTGTTNGCTGGACCCCTAGCCNAGGNTANCTGTNAATTTCTTTTTATTTTTTTTTTTAAAGATTTTTATTTATTTATTGGACAGAGAGAACAAAATCGATCACAATAGTTTANACTGANGTA -))77)77-A7-#----#A-7#7#-<#A#A-77#-7-7-7-A77----A#7--#77#7<77#JA-F<F-777---7---A<FFFAJJJFJA7---<--<-JJFFFJFFJJF<F-F<<--<<-<-<7F-JJAJF<FF<<-J#JFA-A#A-A NM:i:8 MD:Z:4T11G1T1C3G2G4C11T43 MC:Z:96M54S AS:i:63 XS:i:50 GWNJ-0901:774:GW2103153736th:1:1101:1661:61960 99 ptg000079l 17565865 60 49S101M = 17565968 180 TAGGGGAGAAAGAATAATTTTCACCCTACGCTTCTGAGTTTTTAAGATCGATCCCCTCAGAGGCAATCTATTTTAATTAATCTCGTAAAGGCCAGCACAGATGAAGGTTGTTTTTGTGCTGCACTGCTCTGGCGGTTGCTTTGTATTTGC AA-F-7AF-J<F-FA-<FJJF7-AJ7A<-7-<<7FJ-FFFFJF<<<JJ--<<---77--7-<A<--<-77-<FF-7AFF-7<J7-A-<<-7-7-7-7A----7--<AFJJJJ-AJJ<A-77A7-7F7A-7-77A-A-77F---7------ NM:i:8 MD:Z:21A13A6A9A10A10G8A3A13 MC:Z:63S77M10S AS:i:61 XS:i:22 SA:Z:ptg000079l,18098678,+,49M101S,60,0; GWNJ-0901:774:GW2103153736th:1:1101:1661:61960 2147 ptg000079l 18098678 60 49M101H = 17565968 -532635 TAGGGGAGAAAGAATAATTTTCACCCTACGCTTCTGAGTTTTTAAGATC AA-F-7AF-J<F-FA-<FJJF7-AJ7A<-7-<<7FJ-FFFFJF<<<JJ- NM:i:0 MD:Z:49 MC:Z:63H77M10H AS:i:49 XS:i:20 SA:Z:ptg000079l,17565865,+,49S101M,60,8; GWNJ-0901:774:GW2103153736th:1:1101:1661:61960 147 ptg000079l 17565968 60 63S77M10S = 17565865 -180 GGACCGCACAGANCAAGNCTANTNCTNTGCACCNGCTCTCTGGATGTACNCCCNTATTTACGCAGTTTTTTTTTAAAGTTCGATTAAAAATCCAATAAAACAAACAAGATATAATGTATAATTATCACTAATCTCAGAGANAATTANTCA 
))7-))------#-F7-#---#-#-7#7-7---#---A-FA77<-<-7-#---#-7----7----7----77---A77---<<--<A-JF7JF-FFFJJJJJJJJF<<-7<-F7--<JJF<7<F<FAFFF<-<-F<-<-<#-<AFA#--A NM:i:3 MD:Z:11T5T27T31 MC:Z:49S101M AS:i:62 XS:i:20 GWNJ-0901:774:GW2103153736th:1:1101:1661:63718 97 ptg000018l_1262000_1650999 37322 60 117M33S ptg000126l_77144999_100758675 8801990 0 TCTGTTGGAAGCTCCGGGCTGTGTCATCTGTGGGCAATGAGGAGTTGCCCTATTCACCTTCGTGAAGGGGTTTTTCATCTTTCTCTGGAGTCAGGCATGAGCCGTTGCCGGAGTGGGGGAACGCGGGTCTTTGGTGCGGGGCTCCCTGGA <AAFFJJJ-7F-F--7AFJJFJJJ-<<<FFA-7FF7--<-<F<7F<FF<FJ7-<-<7FF---<F-7<FFJ7JJJJA-<A7FFA-A<-7A<-7-AF-7AF-<-A--777--A77-FJJ-77----77-----7AAJJ-77A---77-A7F) NM:i:8 MD:Z:31T20A6C1C21C2C16T4A8 MC:Z:56S36M58S AS:i:77 XS:i:19 GWNJ-0901:774:GW2103153736th:1:1101:1661:65124 83 ptg000013l_1_11549999 142554 48 90S60M = 142554 -60 ATTGAATCACGTCTCGTGGGCTCGCAGATGTCTATAAGAGACAGTAAAACAAAGTACTGAGTACACGGTGGTGGGTGAGCCAGGCTAATCGATCTCAGGGTCCTGGGATGGATCCCCAGATCAGGCTCTCTGCTCGGCAGGGAGGCTGCA F----F-7-7--7-7<<---F<7--7-7A77-AFFA77F-F77<7-FF<-JF7-77--77--FA77-7-JFA-<77F77A-<<<<--A7JFA7J7FA7--FJJJFAFJJ<FF<JFJFFAF-JJAAFAFFF-7FJJA<F77<FA-AAAAAA NM:i:1 MD:Z:59T0 MC:Z:46S58M46S AS:i:59 XS:i:44 GWNJ-0901:774:GW2103153736th:1:1101:1671:17676 99 ptg000047l_1_46824999 26970988 60 66S84M = 26971056 217 TGATCAGCTTCAGTCACCAGCTGTTTAACTAAGAGCGAAGTCATGAGCTCACTAACGGTTGGGATCAGGAGTGCGTATTATTTCTTCTTTAAATATTTGGTGGAATTCCCCTTGGAAGTCATTCAGTCCTGTACTCTTGTTTGTTGGGNG AAAF-AFJJJFFFJFJF<FA7FFJJJ7F<<-<F7A<F-FFJF7FJ<JFJ7AF-FFJFJFJJJJFJ-AFJFJJJ<JAAFJAFJJ<FJ-A7F<<77<AFJJJJ7F-77A77<A7FJJ--<F777F7-7-7-7FA<7F7-7A-7FFF<FFF#J NM:i:2 MD:Z:60C21A1 MC:Z:149M1S AS:i:77 XS:i:38 SA:Z:ptg000058l_621999_26521970,909114,-,84S66M,60,2; GWNJ-0901:774:GW2103153736th:1:1101:1671:17676 2163 ptg000058l_621999_26521970 909114 60 84H66M ptg000047l_1_46824999 26971056 0 GATCCCAACCGTTAGTGAGCTCATGACTTCGCTCTTAGTTAAACAGCTGGTGACTGAAGCTGATCA -JFJJJJFJFJFF-FA7JFJ<JF7FJFF-F<A7F<-<<F7JJJFF7AF<FJFJFFFJJJFA-FAAA NM:i:2 MD:Z:13G51C0 MC:Z:149M1H AS:i:60 XS:i:0 
SA:Z:ptg000047l_1_46824999,26970988,+,66S84M,60,2; GWNJ-0901:774:GW2103153736th:1:1101:1671:17676 147 ptg000047l_1_46824999 26971056 60 149M1S = 26970988 -217 TCTTGTTTGTTGNGAGANTTTNGNTTNCNGNNTNAATTTCCTTGCTGATNACCNATNTNTTNAGNTNTTNTNTTTNTNCCTGTTTNANTNTTGGNANTTNANANATTNCCANGAATGCATCCATTTCTNCCAGATTGCCTNATTTGNTGA <JJ-AAA<-F--#FJFA#J77#-#-7#7#-##<#A7--7JA<-FF<AF-#JA7#<7#-#77#<<#J#<7#<#-7-#-#FFJF77<#F#F#<<<<#<#<<#J#J#F<7#JFF#-FFFAJF-JJFFA-JF#JJAFFFFJJF<#FJJFF#<A- NM:i:33 MD:Z:12G4T3T1A2G1T1C0T1T15T3A2C1T2C2G1T2C1A3A1T7C1G1T4T1G2T1T1A3T3G16T11A5T2 MC:Z:66S84M AS:i:83 XS:i:36 GWNJ-0901:774:GW2103153736th:1:1101:1671:22001 99 ptg000065l_14694000_30403999 3519138 60 68S64M18S = 3519143 65 TCGTGTTGTTTGTCTAATGGTTTTGAGAATTTGGGCCTAATTCGGTGAACTTTAAAGGGTGTTTGATGGATCCTAGGATTATTTCTGTTTCTGTAAGTGACTAGCCCATGTTAAGGTGCAAATGGTGAGATCACTGTCTCTTATACACAT AAAFFFJF<AA7F-FFAFJFF-F7<7---7<7J<JFF--77F-7FJA7---<AF-<-F--7F---<--<<7-F7<77-A7<AJJFJJ<AF7A7A<7FFJ-7-7FA---7AFA-7<AJJ<---7AA<F7AAA-<FAFFJJ<JF-A-7---7 NM:i:0 MD:Z:64 MC:Z:90S60M AS:i:64 XS:i:19 SA:Z:ptg000065l_14694000_30403999,9170465,+,56M94S,60,2; GWNJ-0901:774:GW2103153736th:1:1101:1671:22001 2147 ptg000065l_14694000_30403999 9170465 60 56M94H = 3519143 -5651264 TCGTGTTGTTTGTCTAATGGTTTTGAGAATTTGGGCCTAATTCGGTGAACTTTAAA AAAFFFJF<AA7F-FFAFJFF-F7<7---7<7J<JFF--77F-7FJA7---<AF-< NM:i:2 MD:Z:26A10A18 MC:Z:90H60M AS:i:46 XS:i:19 SA:Z:ptg000065l_14694000_30403999,3519138,+,68S64M18S,60,0; GWNJ-0901:774:GW2103153736th:1:1101:1671:22001 147 ptg000065l_14694000_30403999 3519143 52 90S60M = 3519138 -65 AGGTGTAAAAGANACAGNCGTNTNGTNTNTNTANTGGACTTGAAAATTTNGGCNAANTNCGNTGAACTTNANATGNTNTAGGATCNANCNTAGGNTNATNTNTGTTTCTCTNAGTGACTAGCCCATGTTAAGGTGCAAATNGTGAGNTCA 7------7<7-7#7)7J#F7-#-#JF#A#-#-7#-7----7AJJF77-A#7-A#FA#A#FA#7-JFF-A#A#7-A#-#--<-<<J#J#J#FFF7#<#<-#J#A<JFFFF--#F<<7FA77-JAFFF7F-AJJF<-JJF<A#-AAAA#-A- NM:i:9 MD:Z:4A1T2T1C7G1A28G5A2C0 MC:Z:68S64M18S AS:i:40 XS:i:0 GWNJ-0901:774:GW2103153736th:1:1101:1671:52133 81 ptg000034l 34824674 60 92S58M 
ptg000081l_1_2674999 2291821 0 TGGGCGCGGAGATGTGAAGAAGAGACAGTAAACTACTAGGAAGGCCAAATATTAAAAGTCTTAAAACAGAAAATAGCTAATACTTGATACGAATCATGATGAAGATAAAGGTTAGTTTTTCTTTCCTTCCTGCTGGAGGTGGAGGTTCTA ))A-A-77---7-----7-JFFA<F-A7-<7--<-7----F77-7AFA7-<-7-JF<-7<-7-FFA7--JA-F7<A7<<--7F7A-JFFF--A-F<<---AJ<J<7AF<-<-F<FF-<--F<--A-<<AF7-7-F-7AFFJJFFF7-AAA NM:i:3 MD:Z:10T6T39C0 MC:Z:5S59M86S AS:i:47 XS:i:21 SA:Z:ptg000081l_1_2674999,2291823,-,35S56M59S,21,5; GWNJ-0901:774:GW2103153736th:1:1101:1671:52133 161 ptg000081l_1_2674999 2291821 43 5S59M86S ptg000034l 34824674 0 TAANGTCCTNGGAAGTCCGAATGTTTAAAGTCTTTAAACTGTAGATCGCTAATACTTGATACGGGTGATGGTGATGATTAATGTTAGTTTTTCTTTNTTTNCTGCTGGAGGTGGAGNTTCTACTGTNTNTTANTTGANTCTGATGCTGTC <A-#-A7-A#F-7-<FA<---<-FFJ-7<<F-FFA7-<-<<-7F-F-<-<7-<--7-F-<77<F---7<-7AJ--A-7--77FFF<FFFAAJ<FFJ#-<F#-7F-FJJ-A7-AA-7#-AAF7-A7-#-#7A-#----#---7--7777-) NM:i:5 MD:Z:4A8A3A18A4A17 MC:Z:92S58M AS:i:37 XS:i:19 SA:Z:ptg000034l,34824681,+,71S50M29S,38,5; GWNJ-0901:774:GW2103153736th:1:1101:1671:52520 97 ptg000007l_184999_71823114 14380427 60 150M = 10725400 -3654950 TGGTGGGAAAGTGACTGAACTTAGAAAACAACGCATCTGTATAGAACTTCTGAAGCTGTATTTGCGGAAACACAATTGGTACAGCCTTGGATTCCTTCAGCGTTTTCTATAGCTATTAAAGCACTATCTCATCAGTGGTCTCATTCACCT AAAFFAA-F-FJJ7AFJFJFFJ<FFJFF-FJ-7F-F-<FJ<F7F<F-FF<FF<7F-<FJ7FJJJ-FJ77<AJ<FJ<FJJJJAJJFA7FJJ7AF--77-777-7AAJ7F7A7FAF-<F<77AFAAA7AAFJF<FFJJJJA<<F-FF<---7 NM:i:2 MD:Z:0A100C48 MC:Z:71S79M AS:i:144 XS:i:19 GWNJ-0901:774:GW2103153736th:1:1101:1671:52520 145 ptg000007l_184999_71823114 10725400 60 71S79M = 14380427 3654950 CTACAACTATTANAGCANCATNTNAACAGTGGTNCCATTCACCTCCCAGNAGCNTTGTGATTTCTAGAAGATCACTTGGAATATTATTTGGCATTCACTGATTTCACTAACGATAAATAAAATGTGAAGATATGAATATCNTAGCTNCTA --7-7-7-7---#F777#-7-#-#7-A7-77--#7<7--FFJF7JFF7-#AAA#77-777F7A77--AF-77JA<F<--F<<<--<<-7-AF<7<FJF<<<7<-FJF-JF<-<-JJF<JFF<--7<JFFJ<AF<FJ<<-F#AA-7A#A<A NM:i:3 MD:Z:69C5A2C0 MC:Z:150M AS:i:74 XS:i:0 SA:Z:ptg000007l_184999_71823114,14380533,-,76M74S,43,12; GWNJ-0901:774:GW2103153736th:1:1101:1671:52520 2193 
ptg000007l_184999_71823114 14380533 43 76M74H = 14380427 -182 CTACAACTATTANAGCANCATNTNAACAGTGGTNCCATTCACCTCCCAGNAGCNTTGTGATTTCTAGAAGATCACT --7-7-7-7---#F777#-7-#-#7-A7-77--#7<7--FFJF7JFF7-#AAA#77-777F7A77--AF-77JA<F NM:i:12 MD:Z:3T1G6A4C0T2C1C1T7C0T14T3C22 MC:Z:150M AS:i:41 XS:i:0 SA:Z:ptg000007l_184999_71823114,10725400,-,71S79M,60,3; GWNJ-0901:774:GW2103153736th:1:1101:1671:53047 99 ptg000076l_1_8324999 2009072 60 12S32M4I102M = 2009224 302 TGATCCCTTGCTCTCTCTCTCTGCCTACCTCTCTGCTTACTTGTGATCGATCTCTGTCTGACAAATAAGTAAATAAAATCTTTAAAATACTCTTAATGTTATCAATATCATCGTTATCGTCATCACCAAGGGTGAGCATCTGAATGTACA AA-A-A-AAA-F-F-JFF<F<FJ7AA-7FF<F7F<7-<--<AJJJ7F<F-F7FFFFJ-<F7F-F-<<<AJAA<F<7FF<<FJF7<7A<777<FF7-<FJF7F<77F-F-7AAAFJFJ<AFA-A-7A<FF<<7FJ7A7<FFFA----77-- NM:i:4 MD:Z:134 MC:Z:150M AS:i:124 XS:i:49 GWNJ-0901:774:GW2103153736th:1:1101:1671:53047 147 ptg000076l_1_8324999 2009224 60 150M = 2009072 -302 TCAGCTGATTAANCTTANAAANTNGAATATGTANCCCCAATAAATAAGANTAGNCTCTTATGCAACCACAGTATAACTATCAAATTTAGAAAATATAACTTAAATATAATATTTTGTTATCAAATACATAGTCAATATGCNGATTTNGCA -7--77--7--7#---A#F7-#-#-7--777-F#AA-<-7<-JA-<A77#-<-#7---7<77FJJ<JA7<7-77FJF-A7-JA<---<-JJJF-F<JF<---7F<F<F<-FJF<-<--<-<JJJ<JF<-<--FA<<<-<<#AA---#-AA NM:i:19 MD:Z:2T4T2T1T4A3C1C2G6T4T2G6G0A3T47T0G37A5C2C0 MC:Z:12S32M4I102M AS:i:90 XS:i:23

baishengjun commented 3 years ago

Hi, have you solved this problem? I am getting the same error.

shawnpg commented 3 years ago

Hi,

@xinghua1001 @Karimi-81 @baishengjun unfortunately we were not able to reproduce the error using the snippet above, so we aren't sure what the problem is yet. Can one of you send the BAM file you are using that is generating this error? If it is large, I can send you either a Google drive folder or instructions for uploading to an FTP server instead.

Thanks,

Shawn

baishengjun commented 3 years ago

sorted.sam.zip

Hi, this is part of my SAM file. It has been sorted and includes the SAM header.

jxlhn323 commented 3 years ago

Hi, have you resolved the problem? I have run into the same issue. Thanks

sandipmkale commented 2 years ago

Hello,

Have you resolved the problem? I have run into the same issue.

With best regards

Sandip

xinghua1001 commented 2 years ago

Sorry, I still have no idea about that.

sandipmkale commented 2 years ago

No problem. Can you please explain the format of the links file? Do you know of any other way to import a BAM into Juicebox? Thanks and regards

Sandip

yuzhenpeng commented 2 years ago

I ran into the same problem. Could someone help us?

gitcruz commented 2 years ago

Hi all, I am seeing the same problem, and this time I used a BAM sorted by read ID.

INFO: converting bam to juicer on mapped.PT.name_sorted.bam
INFO: detected bam filetype
INFO: reading file "mapped.PT.name_sorted.bam"
FATAL: something went wrong in process_pair

I have all the Phase Genomics scripts in place, and I had to install matlock (listed below as version 20181227) in a conda environment because I could not build it on our cluster without errors. In case it helps, here are all the packages and versions in the conda environment:

Name              Version     Build                  Channel
_libgcc_mutex     0.1         conda_forge            conda-forge
_openmp_mutex     4.5         1_gnu                  conda-forge
bzip2             1.0.8       h516909a_3             conda-forge
gsl               2.6         h294904e_0             conda-forge
libblas           3.8.0       17_openblas            conda-forge
libcblas          3.8.0       17_openblas            conda-forge
libgcc-ng         9.3.0       h24d8f2e_16            conda-forge
libgfortran-ng    7.5.0       hdf63c60_16            conda-forge
libgomp           9.3.0       h24d8f2e_16            conda-forge
libopenblas       0.3.10      pthreads_hb3c22a3_4    conda-forge
matlock           20181227    hc7800f0_1             bioconda
xz                5.2.5       h516909a_1             conda-forge
zlib              1.2.11      h516909a_1009          conda-forge

It would be useful to get past this problem and be able to produce Juicebox-compatible files from any assembly.

Thanks, Fernando

ChuShin commented 2 years ago

Hello,

I ran into the same error message. In my case (the top few lines of the BAM are attached below), the problem seems to be that matlock does not like the trailing /1 and /2 on the read names in the BAM file. Removing the "/1" and "/2" from the read names solved the problem for me. I hope this is useful.

Regards, ChuShin

-- GWNJ-0957:266:GW1808031300:6:1101:1205:32056/1 99 Scaffold_26344;HRSCAF=26510 10281631 42 121M = 10408124 0 NTAATAGACATATTGAAAGCATGGTTGGTTGGTCGGTATTGACCAGTTGCATTTGAGAGTGAAAAGTTTGTAACCAAACGTCATATTCCCCAAGGCAAGTCACGTTTAACACGAAACATGT #AAAFJJJJFJJJJJAJAFFJFFJFJJJJJJJJFJJJFFJJJF-JJJJJ<AJJJJJJJJAFJ<7FFJ<JJ7JJ<FAJ<JJJA<AAF<JFFJJJFJJJ-FJ<<JFJJJJJJJJJJJFJ<AFF AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:0A120 YT:Z:UU RG:Z:BML GWNJ-0957:266:GW1808031300:6:1101:1205:32056/2 147 Scaffold_26344;HRSCAF=26510 10408124 7 150M = 10281631 0 GATTGTTCCTTCTCTACTACAACAATAGTTTTTTCTATGCCTCCATTATCGCTCTCACTCTGCATTTATTTTCAATAAAAAAATACATTAATAACAGAGTGTGCAAAGTAAAACTAACAAAATTCATATATTACAAAAACCANAAGTTCA JJJAA-77-JFAAF77-<7FJJJJFF<FJJFFA<AF-AJFAFJJJJFFFJJFFFFFJAAAJJJJFJJJJJFJJJJJJJJJJJJJJJJJJJJAJJJJJJJJJJJJJ<FFJJJJJJJJJJJJJJJFJJJJFJAAJJJFJJJJFJ#FJFFAAA AS:i:-4 XS:i:-9 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:8T133A7 YT:Z:UU RG:Z:BMG GWNJ-0957:266:GW1808031300:6:1101:1205:32408/1 99 Scaffold_173;HRSCAF=178 12924813 42 150M = 12924933 0 NGCCTGAAGGATAAAAGTCATCATGGACCTAATACTTCTAAGCAGTTCTTTTTTGCTGTTCTCCATGAGTCCATAACTCAAGTATATGTATGTCAATGCTGCTTAAGCCAAGCCCAACTTGTATCTAGCAACCATCATTTTCATTAGACC #AAAFJ<AFJFJJ7FJFJFFJF<FFJJFJJFJFJFFJAJ<FF-FJFJ-FFJFJJJFJJJJ-<<<<AA<JFAAJ--7-<FAJJFJJ-FJJFFJJJFFJFJFJ<AJJFJJ-7<AAF7FF7F<AJ-F-A7A<7FF)7<F<AFJJJAFFJ<-77 AS:i:-3 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:0A131T17 YT:Z:UU RG:Z:BMG GWNJ-0957:266:GW1808031300:6:1101:1205:32408/2 147 Scaffold_173;HRSCAF=178 12924933 42 150M = 12924813 0 GTATCTAGCAACTATCATTTTCATTAGACCTTTTTTTGGAAAAACTAGACTCATTTCATTAATACACAAAATGTGTTTTATAATAAAGTGAAACACAAACAAAGGTTGGATACAAAATAATAGTATAGAATTTTTTGTTAATNATTGATC 77<<F7<<<F<JFA7---AAJJJFF--FAFJFJAJFA77A-FFFJFJFAFAFJJJJJJFAJJFJJJJJJFJJJFFFAF<F<JJJJJFJF-JJJJJJJJJJJJJJF7<-FJJJJJJJFJJJF<<-<-JAJJJJJF-FFFFJFJ#AAFAAAA AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:142C7 YT:Z:UU RG:Z:BMG

--

plnspineda commented 2 years ago

Hello, I had the same problem too, and sorting the BAM file with samtools worked for me. My BAM file is output from the Arima pipeline.

samtools sort -n file.bam -o file_sorted.bam

It took ~5 hours for a 29 GB file with 48 cores and 128 GB of memory.

Tetyana-Tsykun commented 2 years ago

Hello, I have the same problem. How did you remove the "/1" and "/2" from the read names in the BAM file?

ChuShin commented 2 years ago

Hi @Tetyana-Tsykun ,

I used the sed command below:

samtools view -h INPUT.bam |sed '/^[^@]/s/^\(.*\)\/[12]\t/\1\t/'|samtools view -Sb -o OUTPUT.bam

Regards, ChuShin

Tetyana-Tsykun commented 2 years ago

Hi @ChuShin,

Thank you for your answer! I eventually did something similar:

samtools view -@ 18 -h INPUT.bam -o INPUT.sam
sed 's/\/[1-2]//g' INPUT.sam > corrected.sam
samtools view -bt -o OUTPUT.bam corrected.sam
samtools sort -@ 18 -n OUTPUT.bam -o OUTPUT.sorted.bam

and then the matlock script finally ran without an error message.

Cheers, Tetyana
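
Both sed commands above match against the whole SAM line, so a base-quality string that happens to contain /1 or /2 could be altered as well as, or instead of, the read name (the characters /, 1 and 2 are all valid quality characters). A more conservative sketch is to treat each line as tab-separated fields and strip the suffix from the first field (QNAME) only. The function name strip_mate_suffix is hypothetical, and the example assumes the suffix is exactly /1 or /2 at the very end of the read name.

```shell
# Remove a trailing /1 or /2 from the read name (first SAM field) only.
# Hypothetical helper for illustration; reads SAM text on stdin, writes SAM text on stdout.
strip_mate_suffix() {
    awk 'BEGIN { FS = OFS = "\t" }
         # Header lines start with "@" and pass through unchanged
         /^@/ { print; next }
         # Alignment lines: edit field 1 and leave every other field as it was
         { sub(/\/[12]$/, "", $1); print }'
}
```

It would sit between two samtools calls, for example samtools view -h INPUT.bam | strip_mate_suffix | samtools view -b -o OUTPUT.bam -, followed by samtools sort -n as described earlier in the thread.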

milkcookie commented 1 year ago

Hi all, I ran into the same problem. I had sorted the BAM by read name, and my reads did not have the "/1" or "/2" suffix. If your BAM is like mine, check the names of the read pairs. matlock checks that paired reads share a name: after sorting the BAM by read name, each read name should appear on two consecutive rows. However, unmapped reads, or pairs where only one read is present, break this pattern. You can use samtools view -F to filter out the bad alignments and make sure the reads in the BAM file are paired. The relevant code is in the matlock repository, in src/matlock.c at line 33, in the function int process_pair_juicer(). When it encounters wrongly paired reads it returns 0, and the error is raised at line 248 ( if(process_pair_juicer(header, fh) < 1) ).
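
The pairing condition described in this comment can be checked before running matlock. The sketch below assumes, as the comment suggests, that matlock expects every read name to occupy exactly two consecutive records; this has not been verified against the matlock source. The function name report_unpaired is made up for illustration. It reads SAM records from stdin and prints each read name whose run of consecutive records is not exactly two long, together with the run length.

```shell
# List read names that do not appear on exactly two consecutive records.
# Hypothetical helper for illustration. Typical use, assuming samtools is on PATH:
#   samtools view name_sorted.bam | report_unpaired | head
report_unpaired() {
    awk -F'\t' '
        # Skip header lines in case the input was produced with -h
        /^@/ { next }
        # A new read name starts: report the previous run if it was not a pair
        $1 != prev {
            if (seen && n != 2) print prev "\t" n
            prev = $1
            n = 0
        }
        { n++; seen = 1 }
        # Report the final run as well
        END { if (seen && n != 2) print prev "\t" n }
    '
}
```

In the snippet posted earlier in this thread, names that appear once, or three times because of supplementary alignments (flags such as 2147), would be reported. To drop unmapped, secondary and supplementary records before re-checking, one option is samtools view -b -F 0x904 (0x4 + 0x100 + 0x800). Which flags need to be excluded for matlock is an assumption here, and reads whose mate was removed by the filter will still show up as singletons.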
