Open Wshengquan opened 3 months ago
Could you please attach one read from your BAM file? I suspect there is a bug in how the methylation signal is read. Thanks!
From: Wshengquan
Sent: Monday, July 15, 2024 8:44:03 AM
To: treangenlab/methphaser
Subject: [treangenlab/methphaser] KeyError: ('C', 1, 'm') (Issue #22)
Hello, I got an error when I used MethPhaser:

/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/meth_phaser_parallel:235: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass (name,) instead of name to silence this warning.
  phased_df_chr.get_group(chromosome).iterrows()
/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import require
Traceback (most recent call last):
  File "/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing", line 1471, in <module>
    main(sys.argv[1:])
  File "/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing", line 1437, in main
    ) = get_assignment_max(
  File "/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing", line 898, in get_assignment_max
    base_modification_list = get_base_modification_dictionary( # build the dictionary with snp phased reads
  File "/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing", line 243, in get_base_modification_dictionary
    for i in mm[methylation_identifier]: # Remora only output one type of score: c 1 m/c 0 m, but this part can be improved for other methlyation callers
KeyError: ('C', 1, 'm')

My run command is:
meth_phaser_parallel -b sample.whatshap.haplotagged.bam -r ref.fa -g sample.phased.gtf -vc sample.haplotype.phased.VCF -o path/to/output

Why did this happen? Any help to overcome this is appreciated!
Best
Thanks for your reply. Here's one of my reads: 8e9d05a0-9447-4b61-9600-1d9e92384702 0 1 1 60 1513S328M2I1M1I255M2D501M4D2M1I41M1I232M2D841M3S * 0 0 GGGAACTCCTCCTTTTTTTTTCCTAAGAAATTATCTAAATAATAAATTTTGTTGAAGTGTTTGTC TAGGATTTCAAGGTAACGGCCTAGAAGTCACCATCTAACCACTGTGGCAAATCCGTATGAGTGACATTAGAAGAAGTAATCTTTTGTCTATCATTTTTAATTTTATTAAGTTGAATACATAATTTTACAATTTTATAGTCGCTAATGACTAATTGGTTAATTATCCAGCCAATGAAACTAGTAGAGGGAGTTTAGGAAATGCAGAATAATTAGGCATTTCCTAGGACTTGTGTAAATTAGAATATGACGGTAGCATTTTTATGGTTGTAAAATCAGAGGGAAGGCATATATATAGCTGTTGTAGTTTTTGTTTTTGTTTTTGTTTTTTTTTTAATGGCTACACCCACGGCATGTGGAAGTGTCCCTGGTTCCTGGTTTCTGGGCCAGGGATTGAATCCAAGCCACAGCTGCAGCAATACCAGATCCTTTAACCCACCGCACTGGGCTGGTGATTGAACCCTGACCTACACAGCAACCCAAGCTGCTGCAGTTGAAGTCTTAATCCGCTGTGTCGCAGCAGGAATTCCCATATAGCAGTTTTTAAAACAGTTAAAACTATCTATTTTGTCAAACAGTCATTTTGATGAGACATATTTTATGAATTTTTTCGTATGTAACTAAATATCTGATATTAATACTTTAGAGTTGTTTGAAAGTAATTTTTTGCTTTATATCATGTAACCTAGTAACTGAAGCCATTGCATATATATAAATGGCAGTACATTAATATTGTTTCTGAGAGCCCATTGGAAACAGGAGGTCACTTCACTATCTGTTGGTATGCTTGTAATCTGATTTATTAATTCCTTCTGGAGGTTGAAAAGGGTTATCAGTTGATTTGCCTAAAAAAATCATTAATAAAATTTACAGTTAAAGAAAATTTTTGTAACTTCTGTCTTCACTTGTTGGAATGTGTGTGAAAGAACACACAGATCTTGGTCCAGCTGCCCTAGTAGCCACAGTTTACTTGTCGGCTCCCTGTCCAAGAGTACACATGCCACATGGCCATGGAAGGGAATCCTGTATAGAAACAATGACAGGTATCTGGGTGGCTGTCATTGTCTAATTCTCCTGACAGAATCAGCAGTCTGAGTGACTCATTAACCGTTGTCTCTGACTGTTTCAGCGATTTGCTGCTGTAATCATGAGAATAAGAGAAACCCGCAGACACCGCACTAAATATCAGCTCTGGGAAAATGGTGTGCACGGGAGCCAAGAGGTGGGTCTAAAAGGGTTCATTGTCCTAGTCTGTGCGTTAGGAAAGAAAAGGCTGTATGTCGTGGTTTCTAGGTATTAGGTTACTACTGTGATACATTGTCATTGTTGCTTTCTGTTAAGGTGGTTGATTTTCATCACTGCAGACAGCAGTAACTTCATTTTCTTAAAATCAGGCATGAGTAAGGATGTGTGTTATCATCTGATTTCCATATAGTTGAGCGTGATTATGTGCTTAATTTTTGTCATTTCTCACCCCTGCTCTTGAGAGCTTTTGTTGATAATGTTGTTATTGCTTTCATTCTGCTTTTATTTTGTAAGCCCTGCACTCATTCATCGCTGTACCCGAATATGAGGTAAGGAGTGGTAAAGAAAGACTGGACATAAAAGAGGAATTAGCATTTGCACTCTTCAGATATAAATGCCATCAGTATTTTCCTATTAAAATGAAGCTTGTTTTCATCTCAGTGGAAATCTGTGGCTAAAGTACAACAATAGTAATGATAATGGTGAGGCTGTTGTACTTCACATCTATAAAATCTTGCATCAATAATTTGATTAACCAGATTCC
TTTGGGTAGGCCTACGTTTTCTGTCAGAGACACAGGAATACTTTATAAATAAAATTGTTAATGTCTGTTGATCTTTTTTCATTGGAAGAGGGTGACCAGTTTACCTTTTGAAAAAAAACTTTCCTAATTTGGGCTTTTTTTTTTTTCTCCTTTTTAGGGCTGTACCCATGGCATATGAAAGTTCCTGTGCTAAGGGTTGATCAGAGCTGCAGCTGCCAGGTTACGCTACAGCAACACCAGATCAGGTTGTCTGTGGCCTTTGCCATAGCTTGGGGCAGCACCGGATCCTTAACCCACTGAGTGAGGCCAGGGATTGAACCTGCATCCTCCTGGATACTAGTTGGGTTCTTAACCTGCTGAGCCACAATGGGAACTCCTGGGCTTTTTATAAGTTATACGTTAAATAATTATTTTAGCTGTCTTTGAGTATGAATATCTCACTTTTTCTTTCCTTAGTGAAGAACAGTCCAGACTAGCAGCAAGAAAATATGCCAGAGTTGTACAGAAGTTGGGTTTTCCAGCTAAATTCTTGGACTTCAAGATTCAGAACATGGTGGGGAGCTGTGATGTGAAGTTCCCTATAAGGTTAGAAGGCCTTGTGCTTACCCACCAACAGTTCAGTAGGTAAGTCTGAAATGGATTGTGATTGCTTTTGGCAACAATTAATTTATAACCTATTTAAACACTGTTCATGATTTTTAAAAAACATGCAAAGTAATTGGTATATGAAATCAAATTATTTTGGTTTTTTCATCTTCAGGACCATAGCAGTGGCATATGGAAGTTTCCGGGCCAGGGGTCAAATCAGAGCTGCAACTGCCAACCTCCACCACGGCCACAGCAGTGCCAGGTCCCAGCTATGTCTGTGATTTACATCGCAGCTCAGGGCGAAACCAGATCCTTAACCCACTGAGCAGGGCCAGGGATCGAACCTGAATCCTCACTGATACAGTTTTGTTACCACTGAGCCACCATAGGAGCTCCCAAATGATTCACATATAGATGTTTTACTATTGAAATTTCTCCCATTCCTACCATTCTCACTGGTCTTCTACTTCTTAATGATACCCCAGACTCCCCTTTACTAGGATGAAATTGGTTCCCCTTCTGTATGTTTCATCGTTTCATTGCTCAATGAATACATTTCAAATGCATGAATTAGAAGTTCCTGTTATGGTGGATATGACATTTCTTTCTCTTTGTTTTCCCTGCTGCTCTCCTGGTGAAGACTTTAAAGGCAGGCTTGCCCATCTCTGCAGACCCTGCTTGCAGCATGCGCCTGGCGTGCTGTGCCCTTAATTGCACCAGGGCCTTTGCACATCCCGCTTTTATGCCTGGTGCTCTTGTTTCACTCTTCACCTAGGAAACTCCCACCGGCATTTCATGTATCCATCCAGGCATCTTTTCTTCAGAGAAGCCTGTCTTGAATTTTTCACTGGGCTAAACACACACACACCTAACTATTTTCAGTCTCTCTAACTAACAAAATGAGATGACTAGGATTAGAAAAAAACATACCAGTAGTGCATTTGGTCTGACAACACAGCGAAAGGTTCATTGTCTAACCAGTATCTTTTCTATCACTTTTGGTTAAGTCGCCTGACCTAGGTTAACACTTTGAGGATATTTCAGTTTAAGGAGATGAGATGTGAATGATTAGAAGGTAAACTGTGTGACTGTCATATCTTAGACATAAGTAATTCATTAGGCTCTTGTAAATCAGTGTACTTCACTTGTCCAGAGTGAAACTTGATGAGGGGCAGAGACTACAGAACATATTTATAAAGCTCTTGTTCTCATGTCTGGAAATTCAGATTCATTAGAAGTAAGAGTTCGTTGGCCTTCAGGTGCCAATTACAAGTCAGCTTGAG 
ISQKMJSFEB===:;AABFJECHEFHMHJNGQSQMJMMSHISJKMSCKSSSS9557JFJJEJFSFJLLGSSIHLIJHJSKIMSSSLKKSJPIILOSKLKIHNIHIKSQJSHSNOLIGNSLLJSKPJIIIO>LHA@BDNJLKIJKSMJSLLSSMMISMGJIILSOOKHSSLSRNSJHINMSSJSMIPLJSSJJSGJSKSIPIMSISISMKKIPOHKKHSIPJOHNNKKHKIMJLHMOJPJSKMIKQLHIHHSQIJHSNLKKHJNOSJJSMJSSJKKSLEOJIJHISSHPIJLMLSIIMSPQSJSIJSLIJSHJRKKJLIGE@88=;0///GNISJKSLFGISSISJJSNSKKHJKOJIF::9989ABCEKHEISLPSSSIHGKFJRGI@>;AEISSSSHGEESSSKOKIMSISKHJHHSSSKPSLGLISSJEHSQPLGLNKOSJIGECED@A;;:99:44459CEDBFJJHKJSGHFSLSJHSSSOSMJSMHSQSLGGMJJISNIIGLKPJSSSILKICEKIHJJSJHSKNSKKII=<;8((((((///0@IFICB;788956,&&&&&+*,//3333336JSJKNKGGSDDDSINSMKHSKSSSLKJSJQKHG?<=J;;/////;>.BHEHJLQJJNLGSMNIGKNSSJSISJSSISSQISHPMPSSINJSLSRSLJHISHSSISSKNSSISOSIHSISSISSQGJSSPRSHHNLJKKKSKPSSSIKCSFDEJGHOJIFMSJLJHSLISKSJSSISJSHKILHOLSPKHFFSLHSNSMKSKJJOSLSPS(((((GKKJOSLQHSSHJSISKSOMOKEH;;::;FSINPMLILSMOSSKRJRMSSMKIPSSLSISSSSGHGKJHMOJIMSHNHIHSONSSSMSSOKHSJFLNLSSOMSKHKKSSEEJKSSSSSSSSGJSKJSOSSMKSJMIFHSGSKSLSJNSKJSSHHMLIHRMMHSKHRGIJHSLLSSQHSLIISJB99:?GLEJESIPSMSSH>:;:;JRFOSLIOKFSKKKSSHOSJSIJS>==<:<9;;ACGMF@AEPKFHSKNSNERSSOJGKKSMSJHRKLHHNOHKKSSSPSJKKMIRJHKMLHSNSSSSSISSQS===<JSSSNRSGMJKHMGJEBDCG>===MSQMNJNSPNSHGSGHGLLFGMLSFIILJNKSNRHMKKKKIIKKJKKJSLSJIIGLLIHD410,++)'&&'''%%%&&&&(((&&'(&'($$$%'(()3457@>.,,,,1&&&((-+++@BIRKSISIPSSMKHJISSJLE?@BBAGSKSJSPSKIPHSKSSSSNSSHHMSJGDCCJMSSSLNSSMLHKL 
SMSJJMKSSKOMNSSHSSMMLSMJKSSISGSJOJOFIN>>==>SJSKSSKISSISSKNISSLHFSSSSRHGISIIGHGN@@CCBGOKGFHIKHKJISHRHJSSKSHSSKISIHRHKSKSGEJNMSJHSJNNSJNSGJHSJJSSHHSMOSSSHSSJSJHSPSLSGSKSNKNNGGHLGHSNPSJNPKJOSSGJNSSSKKHSLSSSLJSSILKSIGNSSSLISKSRHSILKDHIIPSSHSOSRSSMLMMOKLIHGNKQMJSOFEFLBFEPOFIKHLSHSKNGSSSECLHJJHLSSMSIKSSSSSSSSHHINMSJGEKIKSIJB><<<==:4433/00HSIIGEJHKJGSSSSJSSSISGLJSGIRLNILSMHHJGSLHIJSNINGSSMJJHHHJDDFHLNQIKLSKSMKKSSMOKMLJIMJIJSSHSMSOLSKSNSKSSSGOHIPSSHGISLSJKJSOLOJOGKNSMPKSSFSEQKJKISPJLOSSMHLJJSGIJC>@,,--.HGEGHGB80--,++(&'()((()BFADGR<;;;<CECDMMSIJSSKSSKKSSHILSSQLSNA@AA@HHSNPIKLJSSSSHSKSLSFNKEGFMF:<20GJSFEHGKSKGNLSLSSLSLIJHQSISHMB76S:9989JSSSKFIJESKMSLHISHKSGS8GEIGSFSSGIEAA55=?@BJIKSQLRLGOHKHJIIHONHSSSNSLLJGSLKLGSSGME<;100004100000EIELIJGCNIHMMHHMSSJHI?:98,,)((%%%$$%''&),-2367622658EIFCABB?8889ESNSKSLGFIJFDJFEGGJSLSSIQSHSFGHSIQSSJFDDCCESPSKMPSJILJKHLNSHJHMHKJIFHJ FCDCDOPSIHOJJSMLJSSHKIIKIPGMIKSSM?>>?H;;<<<EFFGSQBCLIKFHLGSISISPJSRLSKSSQIHSSNJDMNLKSGHKFGISKSPSKILSKNSHSDHHSHA@@JSISSSGJOJRSQJHHKHSSISGLSHKSGISSJMSJGKNOSIISSNKSJSLKLIH@NRIENINESIJLJKHSNISSHSISEBFEFNSHMHSLOKHLJSOSIIIS JSPIJIMSLSKKSILSPSQHKMSHMSKGKLKLIKKSSHHHIKGILIEQMLOSHSNGHFFHHPSDRKEMHLKSSSSSHMRSJGSSSJHQMSSMMSSJJHOKSLKJPSIQKFIJMJLSMKNSLLSIOSKSJSSSJJCF>;;@AASSMPSSJSSNNNSSSKHSHKSJSJRHPHLSKKHKHFCODJHHEGECB)&&&&((112GOKIKHFGSSLHHKSSNISLI@MJSSFHFLSSLNLMJJSSJNS??>??SSKSQHIKLPSIGJSJBA642*222>??AFKFHSIILGQSHILSMLIJKJILJJOSHSSKIHHRGJSPSSFSGIJSL;:981023356AADCCCEHSHHOSJHQISLPKMFHH:4554448=@GNSNJMILHSKJKKMSMSNKIJISKJSKS<<1111364311BEEESNSSJSMLLSNHILSLJSJNJKJMJJHKOSLHIIFFEHEFLLSNG;40()3767@ADCSSHSGSSPSHMSQJOIAFC?>>>>A33333;@>2?:9:69<?<GH56DBBDEGSLGPJSMSFSHRSJGSIGMSPSSIOJIIGSKSNKSKSHPSKLSSSSJSHIJSS?D-----SSNKLHMSFSILSHOSLSKIIJIIKSSSJISIHKLLSJISKPQISKKSLGAISKGFEB7D@? 
??KIGHFAAJLSJJGSMFSPFMISKSJSMLIJKFKJSSSILMMHIGGSPSKKHSSRSJSLNKPOSLNSJONQIISRLISKNJHQIJGQSLIISNSHPSHSMSJKSGSOMRSOISSJSLSJGLKHSLSHIKSMSJSLSGKHJRKKGJKFKHIILJIB@90/42//1932222344444:;<CJFKSILLMFJSHIEE==CCSSJLHSGHIJSJLSJSSISSSSNJNGSKGIFSJSSIKPLKJSSPSJSHRSJISSJSSJSHSKIHMSSSQJHJSSGGSSHSKSRIIJKSJKME))*477>BBCJGSSPHKLSLKJPIKSSSSOHMSHKSGSISOSSSSPSGILFHIGEJ=<;;;KSPHSGNLSKGFGFIHSSJJSJMHRSSLMPKKKOSJHNHOMLSQNLLGKQSSSJFSHSISFNSQSIHJHSSQILNSSIHJISSIHFJHSISGNFGSLGKNOIKHQJSHSQSHRIISJNSGHSIGNJSSKLSSOJGSSLJOSLSISPHJHHJMIIKJHSMSLLNSJKSPHJSSSRISLKJSSHLMSJKSJILHJKSIJSFGDDCDBBCSIKSFFGDACHSSMJSSSKLSRKFKIJHGFDPQESSJHISSSLKJJIPKJJGSSJSGKHGF>=DJISSSSIJSKSKJQISMSH=;:98888 755.)&% qs:i:21 du:f:9.5696 ns:i:47848 ts:i:592 mx:i:3 ch:i:843 st:Z:2024-04-08T11:56:24.693+00:00 rn:i:5741 fn:Z:PAW37422_pass_b74e10a2_68ebe454_253.pod5 sm:f:100.808 sd:f :28.356 sv:Z:quantile dx:i:0 RG:Z:68ebe454a63ea33cf655142d882767dfb3012d4a_dna_r10.4.1_e8.2_400bps_sup@v4.2.0 MN:i:3722 MM:Z:A+a.,0,17,2,6,3,0,0,3,12,0,12,2,13,1,9,5,0,3,12,8,0,0,0,0,0,0,0,0,2,7,0,0,0,2,0,0,3,0,0,0,2,9,63,0,0,0,3,0,0,47,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,0,0,0,1,0,0,6,2,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,4,0,3,1,1,0,17,0,0,0,50,0,1,23,39,0,0,43,1,3,0,0,0,0,4,0,0,0,0,0,0,0,0,4,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,1,12,1,14,0,0,0,3,4,0,0,0,0,0,0,0,3,2,0,3,6,0,39,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,2,1,0,0,1,0,0,2,2,8,0,17,4,0,0,0,12,14,0,1,0,0,0,4,0,0,0,1,0,0,0,1,2,5,0,0,1,33,12,0,0,12,0,1,41,1,0,6,1,1,4,8,0,0,0,0,0,0,0,0,0,0,0;C+h.,9,0,2,3,0,2,0,6,3,2,2,0,1,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,3,1,0,0,0,0,0,3,0,0,0,0,0,1,1,0,0,1,11,14,0,0,1,0,0,0,13,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,4,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,3,0,0,0,0,0,0,0,0,1,0,2,2,0,0,0,5,6,4,0,0,0,0,0,0,8,1,0,1,0,0,16,4,6,0,1,0,1,0,0,1,2,0,0,0,9,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,8,0,0,5,2,3,0,0,4,1,0,0,0,0,0,1,12,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,
0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,22,0,8,3,0,0,3,1,0,4,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,5,1,0,0,1,2,2,8,16,0,0,0,0,0,0,7,1,1,4,9,2,0,0,0,8,1,0,0,0,0,0;C+m.,9,0,2,3,0,2,0,6,3,2,2,0,1,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,3,1,0,0,0,0,0,3,0,0,0,0,0,1,1,0,0,1,11,14,0,0,1,0,0,0,13,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,4,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,3,0,0,0,0,0,0,0,0,1,0,2,2,0,0,0,5,6,4,0,0,0,0,0,0,8,1,0,1,0,0,16,4,6,0,1,0,1,0,0,1,2,0,0,0,9,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,8,0,0,5,2,3,0,0,4,1,0,0,0,0,0,1,12,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,22,0,8,3,0,0,3,1,0,4,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,5,1,0,0,1,2,2,8,16,0,0,0,0,0,0,7,1,1,4,9,2,0,0,0,8,1,0,0,0,0,0; ML:B:C,14,13,53,12,35,21,16,14,112,191,38,66,13,12,12,39,111,24,13,22,80,72,66,148,40,35,28,14,28,35,40,29,16,13,26,251,29,241,106,70,12,171,107,16,21,12,12,16,18,12,12,17,15,12,17,22,13,14,22,25,46,18,71,17,24,17,23,17,48,20,25,15,20,15,78,20,139,100,80,74,146,146,112,125,167,117,108,87,25,51,12,38,46,12,15,13,12,111,18,21,20,19,14,12,31,26,30,41,59,39,17,12,13,56,77,45,31,14,53,29,222,132,113,74,21,97,14,14,18,17,22,25,34,16,68,14,210,38,14,25,16,159,12,13,102,13,15,17,18,34,126,38,22,33,25,19,39,93,47,76,15,21,15,39,14,15,14,32,21,35,43,62,49,51,64,36,36,15,21,62,45,255,255,29,42,38,40,17,227,32,37,29,15,226,91,21,156,42,85,112,128,14,132,14,88,206,12,14,13,16,15,15,40,38,53,51,53,63,30,26,87,12,159,14,14,13,90,21,12,14,24,63,22,13,14,12,20,112,133,55,43,108,31,138,13,16,17,59,29,182,18,15,8,0,3,8,3,1,1,3,4,2,5,5,16,8,2,9,28,40,33,25,33,18,30,125,7,51,6,2,2,2,2,1,4,5,4,4,3,2,4,8,15,100,9,8,12,12,147,19,13,30,6,7,5,2,9,16,15,12,9,6,5,4,2,3,3,11,5,6,6,3,14,255,30,10,7,22,5,8,11,6,6,4,6,8,10,6,7,6,5,5,5,3,4,0,
0,7,11,9,7,0,5,2,29,27,71,47,3,7,2,2,1,37,78,28,7,8,17,18,1,11,9,4,11,7,9,5,2,1,2,3,3,2,2,2,6,3,44,9,23,39,1,3,9,7,18,6,7,8,7,4,21,8,13,7,6,6,4,47,6,7,5,9,15,19,34,22,11,11,0,5,4,6,13,130,57,230,22,11,11,21,9,12,21,27,86,5,5,7,8,10,10,14,16,5,7,5,4,7,7,9,7,8,3,3,4,11,5,7,1,24,5,4,8,7,7,9,49,7,6,6,4,3,4,2,8,2,5,5,8,11,12,3,10,6,17,11,2,12,3,2,2,5,5,2,6,4,20,24,9,16,13,10,11,13,12,14,4,12,4,2,2,25,28,3,4,6,3,4,2,6,11,10,13,11,10,10,3,3,7,7,3,4,2,0,1,3,12,1,3,2,15,6,4,10,13,8,14,11,9,4,4,9,15,14,8,10,7,3,4,4,5,20,15,16,4,18,1,8,8,11,12,5,4,6,3,3,20,8,7,8,13,4,3,11,7,3,5,13,66,13,2,3,3,7,7,9,12,4,3,1,3,2,106,92,5,5,17,10,2,39,21,8,10,9,5,63,7,16,255,13,24,12,37,254,252,22,12,13,13,240,13,254,16,36,40,40,39,52,50,24,24,20,79,19,13,14,12,12,254,16,12,13,14,15,13,14,22,9,7,18,17,14,240,36,235,17,225,12,16,18,12,22,38,16,22,17,13,16,16,13,17,16,22,22,18,28,12,14,0,6,12,23,46,18,22,20,18,19,12,21,16,18,15,20,31,13,18,15,13,18,13,255,16,17,18,200,255,19,18,43,65,137,90,25,37,253,12,254,55,38,37,18,12,15,26,254,244,12,12,21,17,20,14,42,254,12,14,14,14,12,12,14,12,27,246,13,11,254,12,15,19,100,18,16,247,22,13,26,12,13,14,13,12,13,14,16,17,13,14,27,36,42,44,29,33,255,14,25,232,85,94,110,8,70,26,29,59,31,4,234,51,44,18,18,21,28,27,28,26,27,17,20,15,14,17,19,17,14,20,13,15,15,24,14,16,254,12,15,12,16,18,16,19,50,13,22,21,15,13,14,12,12,45,18,16,20,27,13,252,20,19,41,29,14,12,14,13,16,21,21,12,24,19,224,55,35,26,30,34,26,31,24,28,13,212,251,22,31,229,76,246,28,23,19,20,12,24,36,23,23,244,23,25,17,17,21,20,18,16,13,17,14,16,47,254,14,16,11,12,12,21,24,16,24,26,23,16,15,58,26,23,20,28,26,16,14,13,17,31,29,29,251,79,254,21,16,20,28,20,19,22,23,28,11,12,248,15,25,12,12,12,21,14,20,14,189,10,14,13,24,28,14,16,21,15,13,254,22,253,15,2,13,16,30,19,12,29,234,16,24,29,22,22,30 NM:i:25 ms:i:4282 AS:i:4276 nn:i:0 de:f:0.00860507 tp:A:P cm:i:350 s1:i:1968 s2:i:0 MD:Z:169G154G1G154T72C23C0A4^TA501^TTGG3T77A55A38A98^TG119T721 rl:i:400 SA:Z:1,5438,-,2207S1514M1I,60,15; Is there an error because 
the methylation tag in my BAM file is written as MM rather than mm?
I see. The read you are showing here only has 6mA calls, but MethPhaser can only phase with 5mC calls.
MM:Z:A+a? is for 6mA calls. C+m? is for 5mC.
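To check which modification types a read actually carries, the MM value can be split on `;` and the group headers inspected. The following is a minimal sketch, not MethPhaser code: `parse_mm_tag` and `has_5mc` are hypothetical helper names, the example string is a heavily truncated illustration of the tag layout, and combined headers such as `C+mh` are not handled.

```python
def parse_mm_tag(mm_value: str) -> dict:
    """Map each modification group (e.g. 'C+m') to its list of skip counts."""
    # Accept either the bare value or the full 'MM:Z:...' field.
    if mm_value.startswith("MM:Z:"):
        mm_value = mm_value[len("MM:Z:"):]
    groups = {}
    for entry in mm_value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # a trailing ';' leaves an empty last entry
        header, *counts = entry.split(",")
        # Drop the optional '.' or '?' flag that follows the modification code.
        key = header.rstrip(".?")
        groups[key] = [int(c) for c in counts if c]
    return groups


def has_5mc(mm_value: str) -> bool:
    """True if the tag contains a 5mC ('C+m') group."""
    return "C+m" in parse_mm_tag(mm_value)


# Truncated illustration only; real tags are far longer.
example = "MM:Z:A+a.,0,17,2;C+h.,9,0,2;C+m.,9,0,2;"
print(sorted(parse_mm_tag(example)))
print(has_5mc(example), has_5mc("A+a.,0,17,2;"))
```

A read whose tag lists only `A+a` would make `has_5mc` return False, which is the case described above.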
MM:Z:A+a.,0,17,2,6,3,0,0,3,12,0,12,2,13,1,9,5,0,3,12,8,0,0,0,0,0,0,0,0,2,7,0,0,0,2,0,0,3,0,0,0,2,9,63,0,0,0,3,0,0,47,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,0,0,0,1,0,0,6,2,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,4,0,3,1,1,0,17,0,0,0,50,0,1,23,39,0,0,43,1,3,0,0,0,0,4,0,0,0,0,0,0,0,0,4,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,1,12,1,14,0,0,0,3,4,0,0,0,0,0,0,0,3,2,0,3,6,0,39,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,2,1,0,0,1,0,0,2,2,8,0,17,4,0,0,0,12,14,0,1,0,0,0,4,0,0,0,1,0,0,0,1,2,5,0,0,1,33,12,0,0,12,0,1,41,1,0,6,1,1,4,8,0,0,0,0,0,0,0,0,0,0,0;C+h.,9,0,2,3,0,2,0,6,3,2,2,0,1,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,3,1,0,0,0,0,0,3,0,0,0,0,0,1,1,0,0,1,11,14,0,0,1,0,0,0,13,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,4,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,3,0,0,0,0,0,0,0,0,1,0,2,2,0,0,0,5,6,4,0,0,0,0,0,0,8,1,0,1,0,0,16,4,6,0,1,0,1,0,0,1,2,0,0,0,9,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,8,0,0,5,2,3,0,0,4,1,0,0,0,0,0,1,12,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,22,0,8,3,0,0,3,1,0,4,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,5,1,0,0,1,2,2,8,16,0,0,0,0,0,0,7,1,1,4,9,2,0,0,0,8,1,0,0,0,0,0;C+m.,9,0,2,3,0,2,0,6,3,2,2,0,1,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,3,1,0,0,0,0,0,3,0,0,0,0,0,1,1,0,0,1,11,14,0,0,1,0,0,0,13,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,4,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,3,0,0,0,0,0,0,0,0,1,0,2,2,0,0,0,5,6,4,0,0,0,0,0,0,8,1,0,1,0,0,16,4,6,0,1,0,1,0,0,1,2,0,0,0,9,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,8,0,0,5,2,3,0,0,4,1,0,0,0,0,0,1,12,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,22,0,8,3,0,0,3,1,0,4,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,5,1,0,0,1,2,2,8,16,0,0,0,0,0,0,7,1
,1,4,9,2,0,0,0,8,1,0,0,0,0,0; The MM tag in this read contains not only A+a but also C+m.
I see. Yes, this is indeed a bug in MethPhaser. I suspect that some reads have only 6mA calls and no 5mC calls. I will try to apply a filter to skip those reads.
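The kind of guard described here can be sketched as follows. This is not the actual patch: it assumes the per-read modification calls are available as a dictionary keyed by (canonical base, strand, modification code), which is the shape of the key in the traceback, and `five_mc_calls` and `reads_with_5mc` are hypothetical names.

```python
def five_mc_calls(modified_bases: dict) -> list:
    """Collect 5mC calls from a per-read modification dictionary.

    Keys are assumed to look like ('C', 1, 'm'); values are lists of
    (position, score) pairs. Returns [] instead of raising KeyError
    when the read has no 5mC entry.
    """
    calls = []
    for (base, _strand, code), positions in modified_bases.items():
        if base == "C" and code == "m":
            calls.extend(positions)
    return calls


def reads_with_5mc(reads):
    """Yield (name, calls) only for reads that carry at least one 5mC call."""
    for name, modified_bases in reads:
        calls = five_mc_calls(modified_bases)
        if calls:  # skip reads with only 6mA or 5hmC calls
            yield name, calls


# Hypothetical reads: the first has only 6mA calls, the second has 5mC calls.
reads = [
    ("read_6mA_only", {("A", 0, "a"): [(10, 200)]}),
    ("read_with_5mC", {("A", 0, "a"): [(3, 50)], ("C", 1, "m"): [(7, 240)]}),
]
print(list(reads_with_5mc(reads)))
```

Indexing the dictionary directly with `('C', 1, 'm')` fails on the first read, while the filtered loop simply skips it.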
I've made the changes, could you please check the MethPhaser version of (https://github.com/treangenlab/methphaser/tree/Fu-Yilei-patch-1)? Or if you could provide an example BAM file I can do the test.
subset.zip This is a subset of my data. Do I need to reinstall MethPhaser?
Yes, you need to clone the version in the branch I provided.
Thank you very much for your help. I will wait for the result of your test with the subset file.
I used the new version you changed yesterday, but an error is still reported, although it seems to be a different one:
/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/meth_phaser_parallel:235: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas
phased_df_chr.get_group(chromosome).iterrows()
/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
from pkg_resources import require
Traceback (most recent call last):
File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 1477, in
Sorry, I forgot to change one spot in the code. It should be fine now :)
I tried again, but I got an error:
/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/meth_phaser_parallel:235: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass (name,) instead of name to silence this warning.
phased_df_chr.get_group(chromosome).iterrows()
/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
from pkg_resources import require
Traceback (most recent call last):
File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 1471, in
Sorry about the back and forth. Was this the program from this patch? https://github.com/treangenlab/methphaser/tree/Fu-Yilei-patch-1 Based on the line number in the traceback, it seems like you are using the main branch. Alternatively, you can send me the input you are using for this program, because it is hard to debug with only subsampled reads.
I did reinstall it from here: https://github.com/treangenlab/methphaser/tree/Fu-Yilei-patch-1 My input file is too large to upload through GitHub, even after compression. How should I send you my input file?
You could limit your VCF, GTF and BAM to the first 3 phase blocks on chr1. I think those would be sufficient. Thanks.
“You could limit your VCF, GTF and BAM to the first 3 phase blocks on chr1. I think those would be sufficient.” I'm sorry, how do I do this? This is my first time working with haplotype data, and I really do not know how to do it.
Hey, no worries! For the GTF file, keep only the first 3 lines. For the BAM file, use samtools view and specify the region chr1:0-x (x = the end of the 3rd phase block). For the VCF file, use bcftools view with chr1:0-x too.
Here are the bcftools and samtools documentation pages: https://samtools.github.io/bcftools/bcftools.html#view https://www.htslib.org/doc/samtools-view.html
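The subsetting steps above can be sketched in Python as below. This is only an illustration under assumptions: the GTF is taken to hold one phase block per line in order, with the block end in column 5; the BAM is assumed to be indexed and the VCF bgzipped and indexed; the coordinates, file names and function names are made up. The commands are only built here, not executed.

```python
def third_block_end(gtf_text: str, n_blocks: int = 3) -> int:
    """Return the largest end coordinate among the first n_blocks GTF lines."""
    ends = []
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        ends.append(int(fields[4]))  # GTF column 5 is the feature end
        if len(ends) == n_blocks:
            break
    if len(ends) < n_blocks:
        raise ValueError(f"expected at least {n_blocks} phase blocks")
    return max(ends)


def subset_commands(chrom: str, end: int, bam: str, vcf: str) -> list:
    """Build samtools/bcftools argument lists for the region chrom:1-end."""
    region = f"{chrom}:1-{end}"
    return [
        ["samtools", "view", "-b", "-o", "subset.bam", bam, region],
        ["bcftools", "view", "-r", region, "-o", "subset.vcf", vcf],
    ]


# Made-up phase blocks for illustration.
example_gtf = "\n".join([
    "chr1\tPhasing\texon\t100\t5000\t.\t+\t.\tblock 1",
    "chr1\tPhasing\texon\t6000\t12000\t.\t+\t.\tblock 2",
    "chr1\tPhasing\texon\t13000\t20000\t.\t+\t.\tblock 3",
    "chr1\tPhasing\texon\t21000\t30000\t.\t+\t.\tblock 4",
])
end = third_block_end(example_gtf)
for cmd in subset_commands("chr1", end, "sample.bam", "sample.vcf.gz"):
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run` once the paths are adjusted. The chromosome name must match the one used in the BAM header, which may be `1` rather than `chr1`.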
Thank you very much for your help. train.gz I subset the files following your suggestions, but in the GTF file I kept the first chromosome. The reference genome I used was this version of the pig genome downloaded from Ensembl:
Hey, sorry for the delay, but I still need a week or so to actually look into this issue. I suspect the bug is triggered by reads that have only 6mA or 5hmC calls and no 5mC calls. I don't have much time to debug this right now, but I will look into it next week. Thanks!
A question in this vein: can MethPhaser process C+h tag info, or would this also cause a hang-up or poor phasing?
MethPhaser ignores C+h because it is not very accurate in some basecaller versions. As long as you have C+m MethPhaser can do phasing.
Hi, As long as I have your attention, I've noticed that MethPhaser works better or worse depending on the use case. I'm assuming this most likely has to do with the underlying heterozygosity in methylation between the alleles. Do you have any information on which regions work better or worse? Best regards, Dvid
Sorry, this depends. First, the landscape of human genome methylation is still not fully revealed. Discovering it would require a large population-scale analysis, so I cannot tell you which regions usually have more heterozygosity and which do not. On the other hand, the input sample type also matters a lot. For example, in our paper we included a blood sample with low-quality ONT reads, and it shows that the improvement is not huge because the SNP phasing is already poor. Happy to chat more if you like.
I think you can understand this as: when you have a reasonably SNP-phased genome and the gaps are not too large, MethPhaser can come in and help.
Hi, Sure I have a bunch of questions, would love to learn more about the program and maybe some more insights you have.
It has been a while. I have been testing other pipelines recently, so I am very sorry for not getting back to you sooner. May I ask whether the run completed successfully with the data I provided?