treangenlab / methphaser

MethPhaser: methylation-based haplotype phasing of human genomes
https://www.nature.com/articles/s41467-024-49588-0
MIT License

KeyError: ('C', 1, 'm') #22

Open Wshengquan opened 3 months ago

Wshengquan commented 3 months ago

Hello, I got an error when I used MethPhaser:

/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/meth_phaser_parallel:235: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass (name,) instead of name to silence this warning.
  phased_df_chr.get_group(chromosome).iterrows()
/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import require
Traceback (most recent call last):
  File "/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing", line 1471, in <module>
    main(sys.argv[1:])
  File "/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing", line 1437, in main
    ) = get_assignment_max(
  File "/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing", line 898, in get_assignment_max
    base_modification_list = get_base_modification_dictionary(  # build the dictionary with snp phased reads
  File "/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing", line 243, in get_base_modification_dictionary
    for i in mm[methylation_identifier]:  # Remora only output one type of score: c 1 m/c 0 m, but this part can be improved for other methlyation callers
KeyError: ('C', 1, 'm')

My run command is:

meth_phaser_parallel -b sample.whatshap.haplotagged.bam -r ref.fa -g sample.phased.gtf -vc sample.haplotype.phased.VCF -o path/to/output

Why did this happen? Any help to overcome this is appreciated!

Best

Fu-Yilei commented 3 months ago

Could you please attach one read from your BAM file? I suspect there is a bug in how the methylation signal is read. Thanks!


Wshengquan commented 3 months ago

Thanks for your reply. Here's one of my reads: 8e9d05a0-9447-4b61-9600-1d9e92384702 0 1 1 60 1513S328M2I1M1I255M2D501M4D2M1I41M1I232M2D841M3S * 0 0 GGGAACTCCTCCTTTTTTTTTCCTAAGAAATTATCTAAATAATAAATTTTGTTGAAGTGTTTGTC TAGGATTTCAAGGTAACGGCCTAGAAGTCACCATCTAACCACTGTGGCAAATCCGTATGAGTGACATTAGAAGAAGTAATCTTTTGTCTATCATTTTTAATTTTATTAAGTTGAATACATAATTTTACAATTTTATAGTCGCTAATGACTAATTGGTTAATTATCCAGCCAATGAAACTAGTAGAGGGAGTTTAGGAAATGCAGAATAATTAGGCATTTCCTAGGACTTGTGTAAATTAGAATATGACGGTAGCATTTTTATGGTTGTAAAATCAGAGGGAAGGCATATATATAGCTGTTGTAGTTTTTGTTTTTGTTTTTGTTTTTTTTTTAATGGCTACACCCACGGCATGTGGAAGTGTCCCTGGTTCCTGGTTTCTGGGCCAGGGATTGAATCCAAGCCACAGCTGCAGCAATACCAGATCCTTTAACCCACCGCACTGGGCTGGTGATTGAACCCTGACCTACACAGCAACCCAAGCTGCTGCAGTTGAAGTCTTAATCCGCTGTGTCGCAGCAGGAATTCCCATATAGCAGTTTTTAAAACAGTTAAAACTATCTATTTTGTCAAACAGTCATTTTGATGAGACATATTTTATGAATTTTTTCGTATGTAACTAAATATCTGATATTAATACTTTAGAGTTGTTTGAAAGTAATTTTTTGCTTTATATCATGTAACCTAGTAACTGAAGCCATTGCATATATATAAATGGCAGTACATTAATATTGTTTCTGAGAGCCCATTGGAAACAGGAGGTCACTTCACTATCTGTTGGTATGCTTGTAATCTGATTTATTAATTCCTTCTGGAGGTTGAAAAGGGTTATCAGTTGATTTGCCTAAAAAAATCATTAATAAAATTTACAGTTAAAGAAAATTTTTGTAACTTCTGTCTTCACTTGTTGGAATGTGTGTGAAAGAACACACAGATCTTGGTCCAGCTGCCCTAGTAGCCACAGTTTACTTGTCGGCTCCCTGTCCAAGAGTACACATGCCACATGGCCATGGAAGGGAATCCTGTATAGAAACAATGACAGGTATCTGGGTGGCTGTCATTGTCTAATTCTCCTGACAGAATCAGCAGTCTGAGTGACTCATTAACCGTTGTCTCTGACTGTTTCAGCGATTTGCTGCTGTAATCATGAGAATAAGAGAAACCCGCAGACACCGCACTAAATATCAGCTCTGGGAAAATGGTGTGCACGGGAGCCAAGAGGTGGGTCTAAAAGGGTTCATTGTCCTAGTCTGTGCGTTAGGAAAGAAAAGGCTGTATGTCGTGGTTTCTAGGTATTAGGTTACTACTGTGATACATTGTCATTGTTGCTTTCTGTTAAGGTGGTTGATTTTCATCACTGCAGACAGCAGTAACTTCATTTTCTTAAAATCAGGCATGAGTAAGGATGTGTGTTATCATCTGATTTCCATATAGTTGAGCGTGATTATGTGCTTAATTTTTGTCATTTCTCACCCCTGCTCTTGAGAGCTTTTGTTGATAATGTTGTTATTGCTTTCATTCTGCTTTTATTTTGTAAGCCCTGCACTCATTCATCGCTGTACCCGAATATGAGGTAAGGAGTGGTAAAGAAAGACTGGACATAAAAGAGGAATTAGCATTTGCACTCTTCAGATATAAATGCCATCAGTATTTTCCTATTAAAATGAAGCTTGTTTTCATCTCAGTGGAAATCTGTGGCTAAAGTACAACAATAGTAATGATAATGGTGAGGCTGTTGTACTTCACATCTATAAAATCTTGCATCAATAATTTGATTAACCAGATTCC
TTTGGGTAGGCCTACGTTTTCTGTCAGAGACACAGGAATACTTTATAAATAAAATTGTTAATGTCTGTTGATCTTTTTTCATTGGAAGAGGGTGACCAGTTTACCTTTTGAAAAAAAACTTTCCTAATTTGGGCTTTTTTTTTTTTCTCCTTTTTAGGGCTGTACCCATGGCATATGAAAGTTCCTGTGCTAAGGGTTGATCAGAGCTGCAGCTGCCAGGTTACGCTACAGCAACACCAGATCAGGTTGTCTGTGGCCTTTGCCATAGCTTGGGGCAGCACCGGATCCTTAACCCACTGAGTGAGGCCAGGGATTGAACCTGCATCCTCCTGGATACTAGTTGGGTTCTTAACCTGCTGAGCCACAATGGGAACTCCTGGGCTTTTTATAAGTTATACGTTAAATAATTATTTTAGCTGTCTTTGAGTATGAATATCTCACTTTTTCTTTCCTTAGTGAAGAACAGTCCAGACTAGCAGCAAGAAAATATGCCAGAGTTGTACAGAAGTTGGGTTTTCCAGCTAAATTCTTGGACTTCAAGATTCAGAACATGGTGGGGAGCTGTGATGTGAAGTTCCCTATAAGGTTAGAAGGCCTTGTGCTTACCCACCAACAGTTCAGTAGGTAAGTCTGAAATGGATTGTGATTGCTTTTGGCAACAATTAATTTATAACCTATTTAAACACTGTTCATGATTTTTAAAAAACATGCAAAGTAATTGGTATATGAAATCAAATTATTTTGGTTTTTTCATCTTCAGGACCATAGCAGTGGCATATGGAAGTTTCCGGGCCAGGGGTCAAATCAGAGCTGCAACTGCCAACCTCCACCACGGCCACAGCAGTGCCAGGTCCCAGCTATGTCTGTGATTTACATCGCAGCTCAGGGCGAAACCAGATCCTTAACCCACTGAGCAGGGCCAGGGATCGAACCTGAATCCTCACTGATACAGTTTTGTTACCACTGAGCCACCATAGGAGCTCCCAAATGATTCACATATAGATGTTTTACTATTGAAATTTCTCCCATTCCTACCATTCTCACTGGTCTTCTACTTCTTAATGATACCCCAGACTCCCCTTTACTAGGATGAAATTGGTTCCCCTTCTGTATGTTTCATCGTTTCATTGCTCAATGAATACATTTCAAATGCATGAATTAGAAGTTCCTGTTATGGTGGATATGACATTTCTTTCTCTTTGTTTTCCCTGCTGCTCTCCTGGTGAAGACTTTAAAGGCAGGCTTGCCCATCTCTGCAGACCCTGCTTGCAGCATGCGCCTGGCGTGCTGTGCCCTTAATTGCACCAGGGCCTTTGCACATCCCGCTTTTATGCCTGGTGCTCTTGTTTCACTCTTCACCTAGGAAACTCCCACCGGCATTTCATGTATCCATCCAGGCATCTTTTCTTCAGAGAAGCCTGTCTTGAATTTTTCACTGGGCTAAACACACACACACCTAACTATTTTCAGTCTCTCTAACTAACAAAATGAGATGACTAGGATTAGAAAAAAACATACCAGTAGTGCATTTGGTCTGACAACACAGCGAAAGGTTCATTGTCTAACCAGTATCTTTTCTATCACTTTTGGTTAAGTCGCCTGACCTAGGTTAACACTTTGAGGATATTTCAGTTTAAGGAGATGAGATGTGAATGATTAGAAGGTAAACTGTGTGACTGTCATATCTTAGACATAAGTAATTCATTAGGCTCTTGTAAATCAGTGTACTTCACTTGTCCAGAGTGAAACTTGATGAGGGGCAGAGACTACAGAACATATTTATAAAGCTCTTGTTCTCATGTCTGGAAATTCAGATTCATTAGAAGTAAGAGTTCGTTGGCCTTCAGGTGCCAATTACAAGTCAGCTTGAG 
ISQKMJSFEB===:;AABFJECHEFHMHJNGQSQMJMMSHISJKMSCKSSSS9557JFJJEJFSFJLLGSSIHLIJHJSKIMSSSLKKSJPIILOSKLKIHNIHIKSQJSHSNOLIGNSLLJSKPJIIIO>LHA@BDNJLKIJKSMJSLLSSMMISMGJIILSOOKHSSLSRNSJHINMSSJSMIPLJSSJJSGJSKSIPIMSISISMKKIPOHKKHSIPJOHNNKKHKIMJLHMOJPJSKMIKQLHIHHSQIJHSNLKKHJNOSJJSMJSSJKKSLEOJIJHISSHPIJLMLSIIMSPQSJSIJSLIJSHJRKKJLIGE@88=;0///GNISJKSLFGISSISJJSNSKKHJKOJIF::9989ABCEKHEISLPSSSIHGKFJRGI@>;AEISSSSHGEESSSKOKIMSISKHJHHSSSKPSLGLISSJEHSQPLGLNKOSJIGECED@A;;:99:44459CEDBFJJHKJSGHFSLSJHSSSOSMJSMHSQSLGGMJJISNIIGLKPJSSSILKICEKIHJJSJHSKNSKKII=<;8((((((///0@IFICB;788956,&&&&&+*,//3333336JSJKNKGGSDDDSINSMKHSKSSSLKJSJQKHG?<=J;;/////;>.BHEHJLQJJNLGSMNIGKNSSJSISJSSISSQISHPMPSSINJSLSRSLJHISHSSISSKNSSISOSIHSISSISSQGJSSPRSHHNLJKKKSKPSSSIKCSFDEJGHOJIFMSJLJHSLISKSJSSISJSHKILHOLSPKHFFSLHSNSMKSKJJOSLSPS(((((GKKJOSLQHSSHJSISKSOMOKEH;;::;FSINPMLILSMOSSKRJRMSSMKIPSSLSISSSSGHGKJHMOJIMSHNHIHSONSSSMSSOKHSJFLNLSSOMSKHKKSSEEJKSSSSSSSSGJSKJSOSSMKSJMIFHSGSKSLSJNSKJSSHHMLIHRMMHSKHRGIJHSLLSSQHSLIISJB99:?GLEJESIPSMSSH>:;:;JRFOSLIOKFSKKKSSHOSJSIJS>==<:<9;;ACGMF@AEPKFHSKNSNERSSOJGKKSMSJHRKLHHNOHKKSSSPSJKKMIRJHKMLHSNSSSSSISSQS===<JSSSNRSGMJKHMGJEBDCG>===MSQMNJNSPNSHGSGHGLLFGMLSFIILJNKSNRHMKKKKIIKKJKKJSLSJIIGLLIHD410,++)'&&'''%%%&&&&(((&&'(&'($$$%'(()3457@>.,,,,1&&&((-+++@BIRKSISIPSSMKHJISSJLE?@BBAGSKSJSPSKIPHSKSSSSNSSHHMSJGDCCJMSSSLNSSMLHKL 
SMSJJMKSSKOMNSSHSSMMLSMJKSSISGSJOJOFIN>>==>SJSKSSKISSISSKNISSLHFSSSSRHGISIIGHGN@@CCBGOKGFHIKHKJISHRHJSSKSHSSKISIHRHKSKSGEJNMSJHSJNNSJNSGJHSJJSSHHSMOSSSHSSJSJHSPSLSGSKSNKNNGGHLGHSNPSJNPKJOSSGJNSSSKKHSLSSSLJSSILKSIGNSSSLISKSRHSILKDHIIPSSHSOSRSSMLMMOKLIHGNKQMJSOFEFLBFEPOFIKHLSHSKNGSSSECLHJJHLSSMSIKSSSSSSSSHHINMSJGEKIKSIJB><<<==:4433/00HSIIGEJHKJGSSSSJSSSISGLJSGIRLNILSMHHJGSLHIJSNINGSSMJJHHHJDDFHLNQIKLSKSMKKSSMOKMLJIMJIJSSHSMSOLSKSNSKSSSGOHIPSSHGISLSJKJSOLOJOGKNSMPKSSFSEQKJKISPJLOSSMHLJJSGIJC>@,,--.HGEGHGB80--,++(&'()((()BFADGR<;;;<CECDMMSIJSSKSSKKSSHILSSQLSNA@AA@HHSNPIKLJSSSSHSKSLSFNKEGFMF:<20GJSFEHGKSKGNLSLSSLSLIJHQSISHMB76S:9989JSSSKFIJESKMSLHISHKSGS8GEIGSFSSGIEAA55=?@BJIKSQLRLGOHKHJIIHONHSSSNSLLJGSLKLGSSGME<;100004100000EIELIJGCNIHMMHHMSSJHI?:98,,)((%%%$$%''&),-2367622658EIFCABB?8889ESNSKSLGFIJFDJFEGGJSLSSIQSHSFGHSIQSSJFDDCCESPSKMPSJILJKHLNSHJHMHKJIFHJ FCDCDOPSIHOJJSMLJSSHKIIKIPGMIKSSM?>>?H;;<<<EFFGSQBCLIKFHLGSISISPJSRLSKSSQIHSSNJDMNLKSGHKFGISKSPSKILSKNSHSDHHSHA@@JSISSSGJOJRSQJHHKHSSISGLSHKSGISSJMSJGKNOSIISSNKSJSLKLIH@NRIENINESIJLJKHSNISSHSISEBFEFNSHMHSLOKHLJSOSIIIS JSPIJIMSLSKKSILSPSQHKMSHMSKGKLKLIKKSSHHHIKGILIEQMLOSHSNGHFFHHPSDRKEMHLKSSSSSHMRSJGSSSJHQMSSMMSSJJHOKSLKJPSIQKFIJMJLSMKNSLLSIOSKSJSSSJJCF>;;@AASSMPSSJSSNNNSSSKHSHKSJSJRHPHLSKKHKHFCODJHHEGECB)&&&&((112GOKIKHFGSSLHHKSSNISLI@MJSSFHFLSSLNLMJJSSJNS??>??SSKSQHIKLPSIGJSJBA642*222>??AFKFHSIILGQSHILSMLIJKJILJJOSHSSKIHHRGJSPSSFSGIJSL;:981023356AADCCCEHSHHOSJHQISLPKMFHH:4554448=@GNSNJMILHSKJKKMSMSNKIJISKJSKS<<1111364311BEEESNSSJSMLLSNHILSLJSJNJKJMJJHKOSLHIIFFEHEFLLSNG;40()3767@ADCSSHSGSSPSHMSQJOIAFC?>>>>A33333;@>2?:9:69<?<GH56DBBDEGSLGPJSMSFSHRSJGSIGMSPSSIOJIIGSKSNKSKSHPSKLSSSSJSHIJSS?D-----SSNKLHMSFSILSHOSLSKIIJIIKSSSJISIHKLLSJISKPQISKKSLGAISKGFEB7D@? 
??KIGHFAAJLSJJGSMFSPFMISKSJSMLIJKFKJSSSILMMHIGGSPSKKHSSRSJSLNKPOSLNSJONQIISRLISKNJHQIJGQSLIISNSHPSHSMSJKSGSOMRSOISSJSLSJGLKHSLSHIKSMSJSLSGKHJRKKGJKFKHIILJIB@90/42//1932222344444:;<CJFKSILLMFJSHIEE==CCSSJLHSGHIJSJLSJSSISSSSNJNGSKGIFSJSSIKPLKJSSPSJSHRSJISSJSSJSHSKIHMSSSQJHJSSGGSSHSKSRIIJKSJKME))*477>BBCJGSSPHKLSLKJPIKSSSSOHMSHKSGSISOSSSSPSGILFHIGEJ=<;;;KSPHSGNLSKGFGFIHSSJJSJMHRSSLMPKKKOSJHNHOMLSQNLLGKQSSSJFSHSISFNSQSIHJHSSQILNSSIHJISSIHFJHSISGNFGSLGKNOIKHQJSHSQSHRIISJNSGHSIGNJSSKLSSOJGSSLJOSLSISPHJHHJMIIKJHSMSLLNSJKSPHJSSSRISLKJSSHLMSJKSJILHJKSIJSFGDDCDBBCSIKSFFGDACHSSMJSSSKLSRKFKIJHGFDPQESSJHISSSLKJJIPKJJGSSJSGKHGF>=DJISSSSIJSKSKJQISMSH=;:98888 755.)&% qs:i:21 du:f:9.5696 ns:i:47848 ts:i:592 mx:i:3 ch:i:843 st:Z:2024-04-08T11:56:24.693+00:00 rn:i:5741 fn:Z:PAW37422_pass_b74e10a2_68ebe454_253.pod5 sm:f:100.808 sd:f :28.356 sv:Z:quantile dx:i:0 RG:Z:68ebe454a63ea33cf655142d882767dfb3012d4a_dna_r10.4.1_e8.2_400bps_sup@v4.2.0 MN:i:3722 MM:Z:A+a.,0,17,2,6,3,0,0,3,12,0,12,2,13,1,9,5,0,3,12,8,0,0,0,0,0,0,0,0,2,7,0,0,0,2,0,0,3,0,0,0,2,9,63,0,0,0,3,0,0,47,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,0,0,0,1,0,0,6,2,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,4,0,3,1,1,0,17,0,0,0,50,0,1,23,39,0,0,43,1,3,0,0,0,0,4,0,0,0,0,0,0,0,0,4,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,1,12,1,14,0,0,0,3,4,0,0,0,0,0,0,0,3,2,0,3,6,0,39,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,2,1,0,0,1,0,0,2,2,8,0,17,4,0,0,0,12,14,0,1,0,0,0,4,0,0,0,1,0,0,0,1,2,5,0,0,1,33,12,0,0,12,0,1,41,1,0,6,1,1,4,8,0,0,0,0,0,0,0,0,0,0,0;C+h.,9,0,2,3,0,2,0,6,3,2,2,0,1,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,3,1,0,0,0,0,0,3,0,0,0,0,0,1,1,0,0,1,11,14,0,0,1,0,0,0,13,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,4,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,3,0,0,0,0,0,0,0,0,1,0,2,2,0,0,0,5,6,4,0,0,0,0,0,0,8,1,0,1,0,0,16,4,6,0,1,0,1,0,0,1,2,0,0,0,9,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,8,0,0,5,2,3,0,0,4,1,0,0,0,0,0,1,12,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,
0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,22,0,8,3,0,0,3,1,0,4,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,5,1,0,0,1,2,2,8,16,0,0,0,0,0,0,7,1,1,4,9,2,0,0,0,8,1,0,0,0,0,0;C+m.,9,0,2,3,0,2,0,6,3,2,2,0,1,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,3,1,0,0,0,0,0,3,0,0,0,0,0,1,1,0,0,1,11,14,0,0,1,0,0,0,13,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,4,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,3,0,0,0,0,0,0,0,0,1,0,2,2,0,0,0,5,6,4,0,0,0,0,0,0,8,1,0,1,0,0,16,4,6,0,1,0,1,0,0,1,2,0,0,0,9,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,8,0,0,5,2,3,0,0,4,1,0,0,0,0,0,1,12,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,22,0,8,3,0,0,3,1,0,4,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,5,1,0,0,1,2,2,8,16,0,0,0,0,0,0,7,1,1,4,9,2,0,0,0,8,1,0,0,0,0,0; ML:B:C,14,13,53,12,35,21,16,14,112,191,38,66,13,12,12,39,111,24,13,22,80,72,66,148,40,35,28,14,28,35,40,29,16,13,26,251,29,241,106,70,12,171,107,16,21,12,12,16,18,12,12,17,15,12,17,22,13,14,22,25,46,18,71,17,24,17,23,17,48,20,25,15,20,15,78,20,139,100,80,74,146,146,112,125,167,117,108,87,25,51,12,38,46,12,15,13,12,111,18,21,20,19,14,12,31,26,30,41,59,39,17,12,13,56,77,45,31,14,53,29,222,132,113,74,21,97,14,14,18,17,22,25,34,16,68,14,210,38,14,25,16,159,12,13,102,13,15,17,18,34,126,38,22,33,25,19,39,93,47,76,15,21,15,39,14,15,14,32,21,35,43,62,49,51,64,36,36,15,21,62,45,255,255,29,42,38,40,17,227,32,37,29,15,226,91,21,156,42,85,112,128,14,132,14,88,206,12,14,13,16,15,15,40,38,53,51,53,63,30,26,87,12,159,14,14,13,90,21,12,14,24,63,22,13,14,12,20,112,133,55,43,108,31,138,13,16,17,59,29,182,18,15,8,0,3,8,3,1,1,3,4,2,5,5,16,8,2,9,28,40,33,25,33,18,30,125,7,51,6,2,2,2,2,1,4,5,4,4,3,2,4,8,15,100,9,8,12,12,147,19,13,30,6,7,5,2,9,16,15,12,9,6,5,4,2,3,3,11,5,6,6,3,14,255,30,10,7,22,5,8,11,6,6,4,6,8,10,6,7,6,5,5,5,3,4,0,
0,7,11,9,7,0,5,2,29,27,71,47,3,7,2,2,1,37,78,28,7,8,17,18,1,11,9,4,11,7,9,5,2,1,2,3,3,2,2,2,6,3,44,9,23,39,1,3,9,7,18,6,7,8,7,4,21,8,13,7,6,6,4,47,6,7,5,9,15,19,34,22,11,11,0,5,4,6,13,130,57,230,22,11,11,21,9,12,21,27,86,5,5,7,8,10,10,14,16,5,7,5,4,7,7,9,7,8,3,3,4,11,5,7,1,24,5,4,8,7,7,9,49,7,6,6,4,3,4,2,8,2,5,5,8,11,12,3,10,6,17,11,2,12,3,2,2,5,5,2,6,4,20,24,9,16,13,10,11,13,12,14,4,12,4,2,2,25,28,3,4,6,3,4,2,6,11,10,13,11,10,10,3,3,7,7,3,4,2,0,1,3,12,1,3,2,15,6,4,10,13,8,14,11,9,4,4,9,15,14,8,10,7,3,4,4,5,20,15,16,4,18,1,8,8,11,12,5,4,6,3,3,20,8,7,8,13,4,3,11,7,3,5,13,66,13,2,3,3,7,7,9,12,4,3,1,3,2,106,92,5,5,17,10,2,39,21,8,10,9,5,63,7,16,255,13,24,12,37,254,252,22,12,13,13,240,13,254,16,36,40,40,39,52,50,24,24,20,79,19,13,14,12,12,254,16,12,13,14,15,13,14,22,9,7,18,17,14,240,36,235,17,225,12,16,18,12,22,38,16,22,17,13,16,16,13,17,16,22,22,18,28,12,14,0,6,12,23,46,18,22,20,18,19,12,21,16,18,15,20,31,13,18,15,13,18,13,255,16,17,18,200,255,19,18,43,65,137,90,25,37,253,12,254,55,38,37,18,12,15,26,254,244,12,12,21,17,20,14,42,254,12,14,14,14,12,12,14,12,27,246,13,11,254,12,15,19,100,18,16,247,22,13,26,12,13,14,13,12,13,14,16,17,13,14,27,36,42,44,29,33,255,14,25,232,85,94,110,8,70,26,29,59,31,4,234,51,44,18,18,21,28,27,28,26,27,17,20,15,14,17,19,17,14,20,13,15,15,24,14,16,254,12,15,12,16,18,16,19,50,13,22,21,15,13,14,12,12,45,18,16,20,27,13,252,20,19,41,29,14,12,14,13,16,21,21,12,24,19,224,55,35,26,30,34,26,31,24,28,13,212,251,22,31,229,76,246,28,23,19,20,12,24,36,23,23,244,23,25,17,17,21,20,18,16,13,17,14,16,47,254,14,16,11,12,12,21,24,16,24,26,23,16,15,58,26,23,20,28,26,16,14,13,17,31,29,29,251,79,254,21,16,20,28,20,19,22,23,28,11,12,248,15,25,12,12,12,21,14,20,14,189,10,14,13,24,28,14,16,21,15,13,254,22,253,15,2,13,16,30,19,12,29,234,16,24,29,22,22,30 NM:i:25 ms:i:4282 AS:i:4276 nn:i:0 de:f:0.00860507 tp:A:P cm:i:350 s1:i:1968 s2:i:0 MD:Z:169G154G1G154T72C23C0A4^TA501^TTGG3T77A55A38A98^TG119T721 rl:i:400 SA:Z:1,5438,-,2207S1514M1I,60,15; Is there an error because 
the methylation tag recorded in my BAM file is MM instead of mm?

Fu-Yilei commented 3 months ago

I see. The read you are showing here only has 6mA calls, but MethPhaser can only phase with 5mC calls.

Fu-Yilei commented 3 months ago

MM:Z:A+a? is for 6mA calls. C+m? is for 5mC.
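To check quickly which modification types a read carries, the MM tag can be split on `;` and the header of each entry inspected. The following is only a minimal sketch, not MethPhaser code: the function name `modification_codes` is made up for illustration, and it assumes simple entries such as `A+a.` or `C+m?` (combined codes such as `C+mh` are returned as one string).

```python
def modification_codes(mm_tag: str) -> list[tuple[str, str, str]]:
    """Return (base, strand, code) for each entry of an MM tag value.

    Hypothetical helper for illustration; not part of MethPhaser or pysam.
    """
    # Accept either the full "MM:Z:..." field or just its value.
    value = mm_tag[len("MM:Z:"):] if mm_tag.startswith("MM:Z:") else mm_tag
    entries = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # trailing ';' leaves an empty piece
        header = entry.split(",", 1)[0]   # e.g. "C+m." or "A+a?"
        base, strand = header[0], header[1]
        code = header[2:].rstrip("?.")    # drop the optional '?' or '.' flag
        entries.append((base, strand, code))
    return entries


# Shortened, made-up tag with the same three entry types as the read above.
example = "MM:Z:A+a.,0,17,2;C+h.,9,0,2;C+m.,9,0,2;"
codes = modification_codes(example)
print(codes)

# 5mC is present when some cytosine entry carries the 'm' code.
has_5mc = any(base == "C" and "m" in code for base, _, code in codes)
print(has_5mc)
```

A read whose tag lists only `A+a` would give `has_5mc == False`, which is the kind of read suspected of triggering the KeyError.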

Wshengquan commented 3 months ago

MM:Z:A+a.,0,17,2,6,3,...;C+h.,9,0,2,3,...;C+m.,9,0,2,3,...; (the same tag as in the read above). The MM tag contains not only A+a but also C+m.

Fu-Yilei commented 3 months ago

I see. Yes, this is indeed a bug in MethPhaser. I suspect that some reads have only 6mA calls and no 5mC calls. I will try to apply a filter to skip those reads.
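The kind of filter described here could look roughly like the sketch below. It is not the actual patch: it assumes the per-read modification data is a dictionary keyed by `(base, strand, code)` tuples mapping to lists of `(position, quality)` pairs, which is the shape the `KeyError: ('C', 1, 'm')` in the traceback suggests. The names `FIVE_MC_KEYS` and `five_mc_calls` and the example reads are invented for illustration.

```python
# Candidate keys for 5mC calls; the source comment mentions both "c 1 m" and "c 0 m".
FIVE_MC_KEYS = (("C", 0, "m"), ("C", 1, "m"))


def five_mc_calls(mm):
    """Return the 5mC (position, quality) list, or [] when the read has none.

    Hypothetical helper: looks the key up defensively instead of indexing
    mm[key] directly, so reads without 5mC calls do not raise KeyError.
    """
    for key in FIVE_MC_KEYS:
        if key in mm:
            return mm[key]
    return []


# Made-up reads: one with 5mC (and 5hmC) calls, one with 6mA calls only.
reads = {
    "read_with_5mc": {("C", 1, "m"): [(10, 250), (42, 12)], ("C", 1, "h"): [(10, 3)]},
    "read_6ma_only": {("A", 0, "a"): [(7, 200)]},
}

# Keep only reads that actually carry 5mC calls; the others are skipped.
kept = {name: five_mc_calls(mm) for name, mm in reads.items() if five_mc_calls(mm)}
print(kept)
```

Returning an empty list for reads without 5mC keeps the calling loop simple: such reads contribute no positions instead of aborting the whole run.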

Fu-Yilei commented 3 months ago

I've made the changes. Could you please try the MethPhaser version on the Fu-Yilei-patch-1 branch (https://github.com/treangenlab/methphaser/tree/Fu-Yilei-patch-1)? Alternatively, if you can provide an example BAM file, I can run the test myself.

Wshengquan commented 3 months ago

subset.zip This is a subset of my data. Do I need to reinstall MethPhaser?

Fu-Yilei commented 3 months ago

Yeah, you need to clone the version in the branch I provided.

Wshengquan commented 3 months ago

Thank you very much for your help. I'll wait for the results of your test with the subset file.

Wshengquan commented 3 months ago

I used the new version you changed yesterday, but an error is still reported, although it seems to be a different one:

/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/meth_phaser_parallel:235: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas
  phased_df_chr.get_group(chromosome).iterrows()
/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import require
Traceback (most recent call last):
  File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 1477, in <module>
    main(sys.argv[1:])
  File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 1443, in main
    ) = get_assignment_max(
        ^^^^^^^^^^^^^^^^^^^
  File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 910, in get_assignment_max
    assignment_df = get_base_modification_list_snp_block(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 605, in get_base_modification_list_snp_block
    for i in mm[methylation_identifier]:
             ~~^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: ('C', 1, 'm')

Fu-Yilei commented 3 months ago

Sorry, I forgot to change one spot in the code. It should be fine now :)

Wshengquan commented 3 months ago

I tried again, but I got an error:

/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/meth_phaser_parallel:235: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass (name,) instead of name to silence this warning.
  phased_df_chr.get_group(chromosome).iterrows()
/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import require
Traceback (most recent call last):
  File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 1471, in <module>
    main(sys.argv[1:])
  File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 1437, in main
    ) = get_assignment_max(
        ^^^^^^^^^^^^^^^^^^^
  File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 898, in get_assignment_max
    base_modification_list = get_base_modification_dictionary(  # build the dictionary with snp phased reads
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 243, in get_base_modification_dictionary
    for i in mm[methylation_identifier]:  # Remora only output one type of score: c 1 m/c 0 m, but this part can be improved for other methlyation callers
             ~~^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: ('C', 1, 'm')

Fu-Yilei commented 3 months ago

Sorry about the back and forth. Was this the program from this patch? https://github.com/treangenlab/methphaser/tree/Fu-Yilei-patch-1 Based on the line numbers in the traceback, it seems like you are using the main branch. Alternatively, you can send me the input you are using for this program; it is hard to debug with only subsampled reads.

Wshengquan commented 3 months ago

I did reinstall it from here: https://github.com/treangenlab/methphaser/tree/Fu-Yilei-patch-1 My input files are too large to upload via GitHub, even after compression. How should I send you my input files?

Fu-Yilei commented 3 months ago

You could limit your VCF, GTF and BAM to the first 3 phase blocks on chr1. I think those should be sufficient. Thanks.

Wshengquan commented 3 months ago

“you could limit your vcf, gtf and bam to the first 3 phaseblock on chr1. I think those could be sufficient.” I'm sorry, how do I do this? This is my first time working with haplotype-related analysis, and I really don't know how to do it.

Fu-Yilei commented 3 months ago

Hey, no worries! For the GTF file, keep only the first 3 lines. For the BAM file, use samtools view and specify the region chr1:0-x (x = the end of the 3rd phase block). For the VCF file, use bcftools view with chr1:0-x too.

I have put the links to the samtools and bcftools documentation here: https://samtools.github.io/bcftools/bcftools.html#view https://www.htslib.org/doc/samtools-view.html
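As a rough illustration of the steps above, the region string can be derived from the first three lines of the phase-block GTF and then passed to `samtools view` and `bcftools view`. This is only a sketch under assumptions: it expects the standard tab-separated GTF layout (chromosome in column 1, start in column 4, end in column 5), the helper name `region_for_first_blocks`, the example GTF lines, and the file names `input.bam`, `input.vcf.gz`, `subset.bam` and `subset.vcf` are all hypothetical, and both tools need indexed inputs for region queries. It uses a 1-based start (`:1-x`) rather than `:0-x`.

```python
def region_for_first_blocks(gtf_lines, n_blocks=3):
    """Build a 'chrom:1-end' region covering the first n_blocks phase blocks.

    Hypothetical helper; assumes standard GTF columns
    (seqname, source, feature, start, end, ...).
    """
    blocks = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        blocks.append((fields[0], int(fields[3]), int(fields[4])))
        if len(blocks) == n_blocks:
            break
    if not blocks:
        raise ValueError("no phase blocks found in the GTF lines")
    chrom = blocks[0][0]
    # Only consider blocks on the same chromosome as the first one.
    end = max(b_end for b_chrom, _, b_end in blocks if b_chrom == chrom)
    return f"{chrom}:1-{end}"


# Made-up phase blocks on chromosome "1"; coordinates are illustrative only.
example_gtf = [
    '1\tPhasing\texon\t1001\t90000\t.\t+\t.\tgene_id "b1";\n',
    '1\tPhasing\texon\t95000\t180000\t.\t+\t.\tgene_id "b2";\n',
    '1\tPhasing\texon\t200000\t300000\t.\t+\t.\tgene_id "b3";\n',
    '1\tPhasing\texon\t310000\t400000\t.\t+\t.\tgene_id "b4";\n',
]
region = region_for_first_blocks(example_gtf)

# Print the commands instead of running them, so nothing is executed by accident.
print(f"samtools view -b -o subset.bam input.bam {region}")
print(f"bcftools view -r {region} -o subset.vcf input.vcf.gz")
```

The chromosome name is taken from the GTF itself, so the same sketch works whether the reference uses `chr1` or `1`.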

Wshengquan commented 3 months ago

Thank you very much for your help. train.gz I subset the files according to your suggestions, but I kept the first chromosome in the GTF file. The reference genome I used was a version of the pig genome downloaded from Ensembl.

Fu-Yilei commented 3 months ago

Hey, sorry for the delay, but I still need a week or so to actually look into this issue. I suspect that some reads have only 6mA or 5hmC calls and no 5mC calls, which triggers the bug. I don't have much time to debug this right now, but I will look into it next week. Thanks!

DHmeduni commented 3 months ago

Question in this vein... can MethPhaser process C+h tag info, or would this also cause a hang-up or poor phasing?

Fu-Yilei commented 3 months ago

MethPhaser ignores C+h because it is not very accurate in some basecaller versions. As long as you have C+m MethPhaser can do phasing.

DHmeduni commented 2 months ago

Hi, As long as I have your attention, I've noticed that MethPhaser works better or worse depending on the use case. I'm assuming this most likely has to do with the underlying heterozygosity in methylation between the alleles. Do you have any information on which regions work better or worse? Best regards, Dvid


Fu-Yilei commented 2 months ago

Sorry, this depends. First, the methylation landscape of the human genome is still not fully characterized. Discovering it would require a large population-scale analysis, so I cannot tell you which regions usually have more heterozygosity and which do not. Second, the input sample type also matters a lot. For example, in our paper we included a blood sample with poor-quality ONT reads, and the improvement there is not huge because the SNP phasing is already poor. Happy to chat more if you like.

I think you can understand this as: when you have a reasonable SNP phased genome and the gap is not too large, MethPhaser can come in and help

DHmeduni commented 2 months ago

Hi, sure, I have a bunch of questions and would love to learn more about the program and maybe some more insights you have.

Wshengquan commented 2 months ago

Long time no see. I have been testing other pipelines recently, so I am very sorry for not replying sooner. May I ask whether the program has run successfully with the data I gave you?