Open Malabady opened 1 month ago
Hi, the starting tree requires only the topology and fossil calibrations, not branch lengths. Which species you put in the tree depends on the phylogenetic location of your WGD: apart from your focal species, no species in the tree should share the WGD to be dated.
Thank you for the quick reply. I need a bit more clarification on how to modify this tree for my species. A couple more questions:
Are ap1 and ap2 the upper and lower CI of the peak from the anchor Ks distribution in Aquilegia_coerulea.tsv.ks.tsv? If not, what are they?
17 1
((((Potamogeton_acutifolius,(Spirodela_intermedia,Amorphophallus_konjac)),(Acanthochlamys_bracteata,(Dioscorea_alata,Dioscorea_rotundata))'>0.5600<1.2863')'>0.8360<1.2863',(Acorus_americanus,Acorus_tatarinowii))'>0.8360<1.2863',((((Tetracentron_sinense,Trochodendron_aralioides),(Buxus_austroyunnanensis,Buxus_sinica))'>1.1080<1.2863',(Nelumbo_nucifera,(Telopea_speciosissima,Protea_cynaroides)))'>1.1080<1.2863',(my_species_ap1,my_species_ap2))'>1.1080<1.2863')'>1.2720<2.4720';
In the above example, I added my species. Are the numbers that come after it the peak mode and the lower and upper CI? If not, what are they, and how can I get them for my species?
Thank you,
Let's make it clear step by step. First, have you ever used mcmctree? Could you please show me the starting tree you used the last time you ran mcmctree?
So far in the analysis, I have only used a species tree (4species.nwk) to compute the rate-corrected Ks distribution of my species (Spu.cds). The species tree is below; it includes divergence times.
((Sly.cds:113.97482000,(Spu.cds:97.85759000,Dlo.cds:97.85759000)'14':16.11723000)'13':11.14478000,Cfo.cds:125.11960000);
Here are the commands:
wgd dmd --globalmrbh Spu.cds Dlo.cds Sly.cds Cfo.cds -oi -f Spu.cds -o wgd_globalmrbh -n 12
wgd ksd wgd_globalmrbh/global_MRBH.tsv Spu.cds Dlo.cds Sly.cds Cfo.cds -o wgd_globalmrbh_ks -n 36
wgd viz -d wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv -fa Spu.cds \
-epk ../whole-panarome/wgd_ksd/Spu.cds.tsv.ks.tsv \
-ap ../whole-panarome/wgd_syn/iadhore-out/anchorpoints.txt \
-sp 4speices.nwk -o wgd_viz_mixed_Ks --plotelmm --plotapgmm --reweight --plotkde --classic
Note that for rate correction the species tree requires only the topology. Did you use mcmctree to calculate the divergence times?
I understood that about the rate correction, but does including divergence times cause any harm or misinterpretation by the algorithm? Do I need to redo the correction with a topology-only species tree?
No, I didn't use mcmctree to calculate the divergence times. I obtained them directly from http://www.timetree.org/, then compared them to what is known in the literature. They seemed accurate to me.
The extra divergence times in the species tree are simply ignored in the rate correction analysis, so you don't need to redo this part of the analysis. I strongly suggest glancing through the mcmctree tutorial before you go any deeper into molecular dating; it will help you better understand the species tree issue that's unclear to you.
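If a topology-only version of the rate-correction tree is still wanted for tidiness, it can be derived from the existing Newick string. The following is only a minimal sketch using regular expressions; the function name strip_branch_lengths is made up for illustration and is not part of wgd. It also drops quoted node labels, so it should not be applied to a dating tree, where the quoted labels carry the fossil calibrations.

```python
import re

def strip_branch_lengths(newick: str) -> str:
    """Return the Newick string without branch lengths or quoted node labels."""
    # Remove ':<number>' branch lengths.
    no_lengths = re.sub(r":[0-9.eE+-]+", "", newick)
    # Remove quoted internal labels such as '14' that follow a closing parenthesis.
    return re.sub(r"\)'[^']*'", ")", no_lengths)

# The four-species tree posted above (hypothetical helper, real tree string).
tree = ("((Sly.cds:113.97482000,(Spu.cds:97.85759000,Dlo.cds:97.85759000)'14'"
        ":16.11723000)'13':11.14478000,Cfo.cds:125.11960000);")
topology = strip_branch_lengths(tree)
print(topology)
```

The tip names are left untouched, so they still match the sequence file names passed to wgd.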
Hey, would you mind keeping this open a bit longer? I haven’t completed this part yet. I just ran into computer problems and am trying to fix them first. I still haven’t figured out the tree part.
From: heche-psb. Sent: Wednesday, June 12, 2024 1:58:49 AM. To: heche-psb/wgd. Cc: Magdy S Alabady. Subject: Re: [heche-psb/wgd] RE: dating tree (Issue #36)
Closed #36 (https://github.com/heche-psb/wgd/issues/36) as completed.
Hi, I am getting the following error in the dating step.
$ wgd focus --protcocdating --aamodel lg wgd_dmd_ortho/merge_focus_ap.tsv -sp dating_tree.nw -o wgd_dating \
> -d mcmctree -ds 'burnin = 2000' -ds 'sampfreq = 1000' -ds 'nsample = 20000' \
> Potamogeton_acutifolius Spirodela_intermedia Amorphophallus_konjac Acanthochlamys_bracteata Dioscorea_alata \
> Dioscorea_rotundata Acorus_americanus Acorus_tatarinowii Tetracentron_sinense Trochodendron_aralioides \
> Buxus_austroyunnanensis Buxus_sinica Nelumbo_nucifera Telopea_speciosissima Protea_cynaroides Sarracenia_purpurea
2024-06-16 12:50:03 INFO This is wgd v2.0.38 cli.py:34
INFO Checking cores and threads... core.py:35
INFO The number of logical CPUs/Hyper Threading in the system: 32 core.py:36
INFO The number of physical cores in the system: 32 core.py:37
INFO The number of actually usable CPUs in the system: 12 core.py:38
INFO Checking memory... core.py:40
INFO Total physical memory: 125.5488 GB core.py:41
INFO Available memory: 111.9655 GB core.py:42
INFO Free memory: 100.7363 GB core.py:43
2024-06-16 12:54:14 INFO tmpdir = wgdtmp_c20d4cc5-66ec-4711-b1e8-4bd3415fc045 for Potamogeton_acutifolius cli.py:240
2024-06-16 12:54:15 INFO tmpdir = wgdtmp_57b79f1f-96e9-4d65-98b9-32c6a42e3281 for Spirodela_intermedia cli.py:240
INFO tmpdir = wgdtmp_788ee945-32b9-4885-a09d-e0d7d6751eef for Amorphophallus_konjac cli.py:240
INFO tmpdir = wgdtmp_3be61e0e-9695-4ee7-aa47-5069e9b25b53 for Acanthochlamys_bracteata cli.py:240
INFO tmpdir = wgdtmp_17914199-b497-4a57-b166-59627eed526a for Dioscorea_alata cli.py:240
INFO tmpdir = wgdtmp_b2218966-243d-4527-adba-ecad777aed52 for Dioscorea_rotundata cli.py:240
INFO tmpdir = wgdtmp_18079ca0-0b92-4005-8aab-824614b1a798 for Acorus_americanus cli.py:240
INFO tmpdir = wgdtmp_78b0cd3d-26c4-439f-8476-3881570afbcf for Acorus_tatarinowii cli.py:240
INFO tmpdir = wgdtmp_33a5ea42-0bff-4f35-a28b-92837d428cb2 for Tetracentron_sinense cli.py:240
INFO tmpdir = wgdtmp_3505e2c1-c0ec-44fe-ba4b-3f6651222dd5 for Trochodendron_aralioides cli.py:240
INFO tmpdir = wgdtmp_e3e0d923-10aa-4136-8123-04b2e63a2e8a for Buxus_austroyunnanensis cli.py:240
INFO tmpdir = wgdtmp_d40b87a5-294e-433c-a007-100ce950d846 for Buxus_sinica cli.py:240
INFO tmpdir = wgdtmp_ad8f14e2-12d4-46fe-b304-3458873ff9a3 for Nelumbo_nucifera cli.py:240
INFO tmpdir = wgdtmp_c9ae3d52-3a35-466e-999e-9019c453b2a9 for Telopea_speciosissima cli.py:240
INFO tmpdir = wgdtmp_07e310d3-4ae9-467e-a017-90c431c4c37b for Protea_cynaroides cli.py:240
INFO tmpdir = wgdtmp_c03a894f-7178-4ca7-a7bd-6e37bd5a8a87 for Sarracenia_purpurea cli.py:240
INFO 4 threads are used for 262 gene families cli.py:242
Note that adding threads can significantly accelerate the analysis
INFO Only implement protein concatenation dating via mcmctree cli.py:244
2024-06-16 12:59:28 INFO Running mcmctree using Hessian matrix of LG+Gamma for protein model core.py:1048
Traceback (most recent call last):
File "~/.conda/envs/WGD/bin/wgd", line 8, in <module>
sys.exit(cli())
File "~/.conda/envs/WGD/lib/python3.8/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "~/.conda/envs/WGD/lib/python3.8/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "~/.conda/envs/WGD/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "~/.conda/envs/WGD/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "~/.conda/envs/WGD/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "~/.conda/envs/WGD/lib/python3.8/site-packages/cli.py", line 223, in focus
_focus(**kwargs)
File "~/.conda/envs/WGD/lib/python3.8/site-packages/cli.py", line 247, in _focus
Run_MCMCTREE_concprot(Concat_paln,Concat_palnf,tmpdir,outdir,speciestree,datingset,aamodel,slist,nthreads)
File "~/.conda/envs/WGD/lib/python3.8/site-packages/wgd/core.py", line 1054, in Run_MCMCTREE_concprot
McMctree.run_mcmctree(CI_table,PM_table,wgd_mrca)
File "~/.conda/envs/WGD/lib/python3.8/site-packages/wgd/mcmctree.py", line 251, in run_mcmctree
_run_mcmctree('mcmctree.ctrl')
File "~/.conda/envs/WGD/lib/python3.8/site-packages/wgd/mcmctree.py", line 119, in _run_mcmctree
sp.run(['mcmctree', control_file], stdout=sp.PIPE)
File "~/.conda/envs/WGD/lib/python3.8/subprocess.py", line 489, in run
with Popen(*popenargs, **kwargs) as process:
File "~/.conda/envs/WGD/lib/python3.8/subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "~/.conda/envs/WGD/lib/python3.8/subprocess.py", line 1702, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'mcmctree'
I can see 'mcmctree' in the output file structure.
Any idea how to fix this error?
Here is the wgd_dating structure:
13:28:17 $ ls wgd_dating/
Concatenated.paln GF00000029.paln GF00000058.pep GF00000088.paln GF00000117.pep GF00000147.paln GF00000176.pep GF00000206.paln GF00000235.pep
Concatenated.paln.paml GF00000029.pep GF00000059.paln GF00000088.pep GF00000118.paln GF00000147.pep GF00000177.paln GF00000206.pep GF00000236.paln
G2S.Map GF00000030.paln GF00000059.pep GF00000089.paln GF00000118.pep GF00000148.paln GF00000177.pep GF00000207.paln GF00000236.pep
GF00000001.paln GF00000030.pep GF00000060.paln GF00000089.pep GF00000119.paln GF00000148.pep GF00000178.paln GF00000207.pep GF00000237.paln
GF00000001.pep GF00000031.paln GF00000060.pep GF00000090.paln GF00000119.pep GF00000149.paln GF00000178.pep GF00000208.paln GF00000237.pep
GF00000002.paln GF00000031.pep GF00000061.paln GF00000090.pep GF00000120.paln GF00000149.pep GF00000179.paln GF00000208.pep GF00000238.paln
GF00000002.pep GF00000032.paln GF00000061.pep GF00000091.paln GF00000120.pep GF00000150.paln GF00000179.pep GF00000209.paln GF00000238.pep
GF00000003.paln GF00000032.pep GF00000062.paln GF00000091.pep GF00000121.paln GF00000150.pep GF00000180.paln GF00000209.pep GF00000239.paln
GF00000003.pep GF00000033.paln GF00000062.pep GF00000092.paln GF00000121.pep GF00000151.paln GF00000180.pep GF00000210.paln GF00000239.pep
GF00000004.paln GF00000033.pep GF00000063.paln GF00000092.pep GF00000122.paln GF00000151.pep GF00000181.paln GF00000210.pep GF00000240.paln
GF00000004.pep GF00000034.paln GF00000063.pep GF00000093.paln GF00000122.pep GF00000152.paln GF00000181.pep GF00000211.paln GF00000240.pep
GF00000005.paln GF00000034.pep GF00000064.paln GF00000093.pep GF00000123.paln GF00000152.pep GF00000182.paln GF00000211.pep GF00000241.paln
GF00000005.pep GF00000035.paln GF00000064.pep GF00000094.paln GF00000123.pep GF00000153.paln GF00000182.pep GF00000212.paln GF00000241.pep
GF00000006.paln GF00000035.pep GF00000065.paln GF00000094.pep GF00000124.paln GF00000153.pep GF00000183.paln GF00000212.pep GF00000242.paln
GF00000006.pep GF00000036.paln GF00000065.pep GF00000095.paln GF00000124.pep GF00000154.paln GF00000183.pep GF00000213.paln GF00000242.pep
GF00000007.paln GF00000036.pep GF00000066.paln GF00000095.pep GF00000125.paln GF00000154.pep GF00000184.paln GF00000213.pep GF00000243.paln
GF00000007.pep GF00000037.paln GF00000066.pep GF00000096.paln GF00000125.pep GF00000155.paln GF00000184.pep GF00000214.paln GF00000243.pep
GF00000008.paln GF00000037.pep GF00000067.paln GF00000096.pep GF00000126.paln GF00000155.pep GF00000185.paln GF00000214.pep GF00000244.paln
GF00000008.pep GF00000038.paln GF00000067.pep GF00000097.paln GF00000126.pep GF00000156.paln GF00000185.pep GF00000215.paln GF00000244.pep
GF00000009.paln GF00000038.pep GF00000068.paln GF00000097.pep GF00000127.paln GF00000156.pep GF00000186.paln GF00000215.pep GF00000245.paln
GF00000009.pep GF00000039.paln GF00000068.pep GF00000098.paln GF00000127.pep GF00000157.paln GF00000186.pep GF00000216.paln GF00000245.pep
GF00000010.paln GF00000039.pep GF00000069.paln GF00000098.pep GF00000128.paln GF00000157.pep GF00000187.paln GF00000216.pep GF00000246.paln
GF00000010.pep GF00000040.paln GF00000069.pep GF00000099.paln GF00000128.pep GF00000158.paln GF00000187.pep GF00000217.paln GF00000246.pep
GF00000011.paln GF00000040.pep GF00000070.paln GF00000099.pep GF00000129.paln GF00000158.pep GF00000188.paln GF00000217.pep GF00000247.paln
GF00000011.pep GF00000041.paln GF00000070.pep GF00000100.paln GF00000129.pep GF00000159.paln GF00000188.pep GF00000218.paln GF00000247.pep
GF00000012.paln GF00000041.pep GF00000071.paln GF00000100.pep GF00000130.paln GF00000159.pep GF00000189.paln GF00000218.pep GF00000248.paln
GF00000012.pep GF00000042.paln GF00000071.pep GF00000101.paln GF00000130.pep GF00000160.paln GF00000189.pep GF00000219.paln GF00000248.pep
GF00000013.paln GF00000042.pep GF00000072.paln GF00000101.pep GF00000131.paln GF00000160.pep GF00000190.paln GF00000219.pep GF00000249.paln
GF00000013.pep GF00000043.paln GF00000072.pep GF00000102.paln GF00000131.pep GF00000161.paln GF00000190.pep GF00000220.paln GF00000249.pep
GF00000014.paln GF00000043.pep GF00000073.paln GF00000102.pep GF00000132.paln GF00000161.pep GF00000191.paln GF00000220.pep GF00000250.paln
GF00000014.pep GF00000044.paln GF00000073.pep GF00000103.paln GF00000132.pep GF00000162.paln GF00000191.pep GF00000221.paln GF00000250.pep
GF00000015.paln GF00000044.pep GF00000074.paln GF00000103.pep GF00000133.paln GF00000162.pep GF00000192.paln GF00000221.pep GF00000251.paln
GF00000015.pep GF00000045.paln GF00000074.pep GF00000104.paln GF00000133.pep GF00000163.paln GF00000192.pep GF00000222.paln GF00000251.pep
GF00000016.paln GF00000045.pep GF00000075.paln GF00000104.pep GF00000134.paln GF00000163.pep GF00000193.paln GF00000222.pep GF00000252.paln
GF00000016.pep GF00000046.paln GF00000075.pep GF00000105.paln GF00000134.pep GF00000164.paln GF00000193.pep GF00000223.paln GF00000252.pep
GF00000017.paln GF00000046.pep GF00000076.paln GF00000105.pep GF00000135.paln GF00000164.pep GF00000194.paln GF00000223.pep GF00000253.paln
GF00000017.pep GF00000047.paln GF00000076.pep GF00000106.paln GF00000135.pep GF00000165.paln GF00000194.pep GF00000224.paln GF00000253.pep
GF00000018.paln GF00000047.pep GF00000077.paln GF00000106.pep GF00000136.paln GF00000165.pep GF00000195.paln GF00000224.pep GF00000254.paln
GF00000018.pep GF00000048.paln GF00000077.pep GF00000107.paln GF00000136.pep GF00000166.paln GF00000195.pep GF00000225.paln GF00000254.pep
GF00000019.paln GF00000048.pep GF00000078.paln GF00000107.pep GF00000137.paln GF00000166.pep GF00000196.paln GF00000225.pep GF00000255.paln
GF00000019.pep GF00000049.paln GF00000078.pep GF00000108.paln GF00000137.pep GF00000167.paln GF00000196.pep GF00000226.paln GF00000255.pep
GF00000020.paln GF00000049.pep GF00000079.paln GF00000108.pep GF00000138.paln GF00000167.pep GF00000197.paln GF00000226.pep GF00000256.paln
GF00000020.pep GF00000050.paln GF00000079.pep GF00000109.paln GF00000138.pep GF00000168.paln GF00000197.pep GF00000227.paln GF00000256.pep
GF00000021.paln GF00000050.pep GF00000080.paln GF00000109.pep GF00000139.paln GF00000168.pep GF00000198.paln GF00000227.pep GF00000257.paln
GF00000021.pep GF00000051.paln GF00000080.pep GF00000110.paln GF00000139.pep GF00000169.paln GF00000198.pep GF00000228.paln GF00000257.pep
GF00000022.paln GF00000051.pep GF00000081.paln GF00000110.pep GF00000140.paln GF00000169.pep GF00000199.paln GF00000228.pep GF00000258.paln
GF00000022.pep GF00000052.paln GF00000081.pep GF00000111.paln GF00000140.pep GF00000170.paln GF00000199.pep GF00000229.paln GF00000258.pep
GF00000023.paln GF00000052.pep GF00000082.paln GF00000111.pep GF00000141.paln GF00000170.pep GF00000200.paln GF00000229.pep GF00000259.paln
GF00000023.pep GF00000053.paln GF00000082.pep GF00000112.paln GF00000141.pep GF00000171.paln GF00000200.pep GF00000230.paln GF00000259.pep
GF00000024.paln GF00000053.pep GF00000083.paln GF00000112.pep GF00000142.paln GF00000171.pep GF00000201.paln GF00000230.pep GF00000260.paln
GF00000024.pep GF00000054.paln GF00000083.pep GF00000113.paln GF00000142.pep GF00000172.paln GF00000201.pep GF00000231.paln GF00000260.pep
GF00000025.paln GF00000054.pep GF00000084.paln GF00000113.pep GF00000143.paln GF00000172.pep GF00000202.paln GF00000231.pep GF00000261.paln
GF00000025.pep GF00000055.paln GF00000084.pep GF00000114.paln GF00000143.pep GF00000173.paln GF00000202.pep GF00000232.paln GF00000261.pep
GF00000026.paln GF00000055.pep GF00000085.paln GF00000114.pep GF00000144.paln GF00000173.pep GF00000203.paln GF00000232.pep GF00000262.paln
GF00000026.pep GF00000056.paln GF00000085.pep GF00000115.paln GF00000144.pep GF00000174.paln GF00000203.pep GF00000233.paln GF00000262.pep
GF00000027.paln GF00000056.pep GF00000086.paln GF00000115.pep GF00000145.paln GF00000174.pep GF00000204.paln GF00000233.pep mcmctree
GF00000027.pep GF00000057.paln GF00000086.pep GF00000116.paln GF00000145.pep GF00000175.paln GF00000204.pep GF00000234.paln
GF00000028.paln GF00000057.pep GF00000087.paln GF00000116.pep GF00000146.paln GF00000175.pep GF00000205.paln GF00000234.pep
GF00000028.pep GF00000058.paln GF00000087.pep GF00000117.paln GF00000146.pep GF00000176.paln GF00000205.pep GF00000235.paln
Hi, it seems that mcmctree was not properly called in your run. What do you get if you type "mcmctree" and press Enter?
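For context, the FileNotFoundError at the end of the traceback is raised by Python's subprocess module when the executable named mcmctree cannot be found on PATH; the mcmctree entry inside wgd_dating is output and does not count. The same check can be sketched from Python with the standard library. The helper name find_tool is made up for illustration.

```python
import shutil

def find_tool(name: str):
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)

path = find_tool("mcmctree")
if path is None:
    print("mcmctree not found on PATH; check the PAML installation")
else:
    print(f"mcmctree found at {path}")
```

Running this inside the same conda environment that runs wgd matters, since PATH can differ between environments.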
Hey, you're right, it was a PAML installation issue. Once I fixed it, the error went away. I still can't figure out how to modify the dating tree you provided (below) to work with my species.
17 1
((((Potamogeton_acutifolius,(Spirodela_intermedia,Amorphophallus_konjac)),(Acanthochlamys_bracteata,(Dioscorea_alata,Dioscorea_rotundata))'>0.5600<1.2863')'>0.8360<1.2863',(Acorus_americanus,Acorus_tatarinowii))'>0.8360<1.2863',((((Tetracentron_sinense,Trochodendron_aralioides),(Buxus_austroyunnanensis,Buxus_sinica))'>1.1080<1.2863',(Nelumbo_nucifera,(Telopea_speciosissima,Protea_cynaroides)))'>1.1080<1.2863',(Aquilegia_coerulea_ap1,Aquilegia_coerulea_ap2))'>1.1080<1.2863')'>1.2720<2.4720';
So I used it as is, except for replacing Aquilegia_coerulea_ap1,Aquilegia_coerulea_ap2 with MySpecies_ap1,MySpecies_ap2. I got a reasonable WGD peak, which seems consistent with earlier evidence, but the range between the low and high 90% CI is a bit large, which worries me.
"Posterior mean for the age of wgd is 81.0112 million years ago from Concatenated peptide alignment and 95% credibility intervals (CI) is 53.2662-108.8130 million years."
From the anchor Ks distribution, the mode is 0.52 and the 95% CI is 0.23-2.19.
From anchor pairs of duplications, the aggregated duplicate age is at Ks = 0.72.
From non-anchor pairs of duplications, the aggregated duplicate age is at Ks = 0.51.
When I plotted the WGD age distribution using the following command, the median, posterior mean, and mode were 81, 80, and 84 Mya. The 95% CI is 56.61-103 Mya.
python $PATH/postplot.py postdis dates.txt --percentile 90 --title "WGD date" --hpd -o "WGD_date.svg"
What do you think?
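As a cross-check on these summary numbers, the mean, median and a highest-posterior-density (HPD) interval can be computed directly from posterior samples. This is only a sketch with made-up sample values; it is not how postplot.py necessarily computes its interval, and the function name hpd_interval is hypothetical. Note also that the command above passes --percentile 90, so the plotted interval may be a 90% one, which would be expected to be narrower than the 95% interval reported by wgd.

```python
import statistics

def hpd_interval(samples, mass=0.90):
    """Narrowest interval that contains the fraction `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, int(round(mass * n)))  # number of samples inside the interval
    # Slide a window of k consecutive samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]

# Made-up posterior ages in Myr, for illustration only.
dates = [70, 75, 78, 80, 81, 82, 84, 85, 90, 110]
print("mean:", statistics.mean(dates))
print("median:", statistics.median(dates))
print("90% HPD:", hpd_interval(dates, 0.90))
```

With real output, the list would be replaced by the sampled WGD ages, for example one value per line read from a file such as dates.txt (the file layout is an assumption here).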
There are four resulting subfolders from wgd peak, among which you can usually choose the peak range from AnchorKs_FindPeak, AnchorKs_GMM or SegmentGuideKs_GMM. If the one from AnchorKs_FindPeak is too wide, you can always try the other two. Besides, a broad 95% HPD CI reflects the constraint from the fossil calibrations. The requirement for the starting tree topology is that your focal species is the only species in the tree that has experienced the WGD event to be dated. If you can find and use fossil calibrations with narrower bounds, that will ease your concern too.
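On the earlier question about the numbers in the tree: in the MCMCTree tree format, a quoted label such as '>0.5600<1.2863' is a fossil calibration giving a lower and an upper age bound for that node, not a Ks peak summary. The sketch below lists the bounds found in a tree string. It assumes a time unit of 100 Myr, which is a common MCMCTree convention and looks consistent with the dates reported in this thread, but that unit is an assumption; the function name and the shortened example tree are made up for illustration.

```python
import re

def calibration_bounds(tree: str, unit_myr: float = 100.0):
    """Return (lower, upper) bounds in Myr for every '>a<b' label in the tree."""
    pairs = re.findall(r"'>([0-9.]+)<([0-9.]+)'", tree)
    # unit_myr is an assumed time unit (100 Myr per unit of tree time).
    return [(float(lo) * unit_myr, float(hi) * unit_myr) for lo, hi in pairs]

# Shortened, hypothetical tree reusing two calibration labels from the thread.
tree = "((A,B)'>0.5600<1.2863',(C_ap1,C_ap2))'>1.2720<2.4720';"
for lower, upper in calibration_bounds(tree):
    print(f"{lower:.2f}-{upper:.2f} Myr")
```

Listing the bounds this way makes it easier to see which calibrations are wide and might be replaced by narrower ones.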
Hello,
You're providing the following dating tree:
I understand that I need to replace the "Aquilegia_coerulea" branches with my species, but what about the branch lengths? Do I need to generate a new species tree including all those species plus my species?
What if I used the same species that I used in the orthology analysis?
Thanks
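Going by the first reply in this thread, the starting tree needs only the topology and the fossil calibrations, so no branch lengths have to be added when the focal tips are swapped. A minimal sketch of that swap follows, with a check of the tip count, since the first line of an MCMCTree tree file ("17 1" in the example) gives the number of species and the number of trees. The helper names and the shortened template are made up for illustration, and Sarracenia_purpurea is used only because it appears in the command posted earlier.

```python
import re

def swap_focal_species(tree: str, old: str, new: str) -> str:
    """Rename the two focal tips, e.g. old_ap1 -> new_ap1 and old_ap2 -> new_ap2."""
    for suffix in ("_ap1", "_ap2"):
        tree = tree.replace(old + suffix, new + suffix)
    return tree

def count_tips(tree: str) -> int:
    """Count tip labels, i.e. names that directly follow '(' or ','."""
    return len(re.findall(r"[(,]([A-Za-z][A-Za-z0-9_.]*)", tree))

# Shortened, hypothetical template; the real tree has 17 tips.
template = ("((Nelumbo_nucifera,(Telopea_speciosissima,Protea_cynaroides)),"
            "(Aquilegia_coerulea_ap1,Aquilegia_coerulea_ap2))'>1.1080<1.2863';")
new_tree = swap_focal_species(template, "Aquilegia_coerulea", "Sarracenia_purpurea")
print(count_tips(new_tree), 1)  # header line: number of tips, number of trees
print(new_tree)
```

Because the swap is one-for-one, the tip count and therefore the header line stay the same as in the template.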