Closed ilaydagulmez closed 2 weeks ago
Hi, the species names in your dating_tree.nw
file should be in accordance with the cds file names. Is it the case in your run?
Hi, do I prepare this file manually? It did not appear as output in the previous steps.
Hi, I think you have all the seq files already. For instance, assuming that your dating_tree.nw
looks like
5 1
(((sp1,sp2)'>0.5600<1.2863',sp3)'>0.8360<1.2863',(sp4_ap1,sp4_ap2))'>0.8360<1.2863';
you accordingly need to provide four seq files named exactly as sp1
,sp2
,sp3
and sp4
.
Hi, I have three species now for this step and I wrote like this:
3 1
((8_ragtag_polished3_CDS.fasta,1_ragtag_polished3_CDS.fasta)'>0.5600<1.2863',9_ragtag_polished3_CDS)'>0.8360<1.2863';
But I'm not sure about this because got the same error:
File "/truba/home/tbag77/miniconda3/envs/py38/lib/python3.8/site-packages/Bio/Align/__init__.py", line 593, in _append
raise ValueError("Sequences must all be the same length")
Thanks
Hi, if you're doing WGD dating, you need to manually add postfix "_ap1" and "_ap2" to the focal species as the dating_tree.nw
I show above.
Hi, sorry for this. I added _ap1 and _ap2 like this:
3 1
(((8_ragtag_polished3_CDS.fasta,1_ragtag_polished3_CDS.fasta)'>0.5600<1.2863', (9_ragtag_polished3_CDS_ap1, 9_ragtag_polished3_CDS_ap2))'>0.8360<1.2863';
But got the same error again.
Hi, could you also show the column names of your merge_focus_ap.tsv
file? They should also accord with the seq file names.
Hi, here is the file and I think I could change my species places in the dating_tree.nw file.
Hi, based on your merge_focus_ap.tsv
file, I think you should set 8_ragtag_polished3_CDS.fasta
to be 8_ragtag_polished3_CDS.fasta_ap1
and 8_ragtag_polished3_CDS.fasta_ap2
.
Hi, even I changed my commands like this (((1_ragtag_polished3_CDS.fasta,9_ragtag_polished3_CDS.fasta)'>0.5600<1.2863', (8_ragtag_polished3_CDS_ap1, 8_ragtag_polished3_CDS_ap2))'>0.8360<1.2863';
I got the same error again.
Hi, could you share your full log of error?
Hi, thanks for your time. Here is the log:
Hi, it seems that you have issues with your gene ids that different species could have the same gene id. Could you change the gene ids for each species to distinguish them? For instance, do the following to your seq files
sed -i "s/>/>ragtag8_/g" 8_ragtag_polished3_CDS.fasta
sed -i "s/>/>ragtag9_/g" 9_ragtag_polished3_CDS.fasta
sed -i "s/>/>ragtag1_/g" 1_ragtag_polished3_CDS.fasta
and in a python session do the following to merge_focus_ap.tsv
import pandas as pd
df = pd.read_csv("merge_focus_ap.tsv", header=0, index_col=0, sep='\t')
for column in df.columns:
df[column] = df[column].apply(lambda x:"ragtag"+column[0]+"_"+x)
df.to_csv("new_merge_focus_ap.tsv", header=True, index=True, sep='\t')
Hi, thanks for your advice it works for the error but now mcmc.txt or dates.txt file wouldn't created even mcmc file was created.
Maybe this fail could explain:
Figtree = Phylo.read('FigTree.tre','nexus')
File "/truba/home/tbag77/miniconda3/envs/py38/lib/python3.8/site-packages/Bio/Phylo/_io.py", line 62, in read
tree = next(tree_gen)
File "/truba/home/tbag77/miniconda3/envs/py38/lib/python3.8/site-packages/Bio/Phylo/_io.py", line 49, in parse
with File.as_handle(file, "r") as fp:
File "/truba/home/tbag77/miniconda3/envs/py38/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/truba/home/tbag77/miniconda3/envs/py38/lib/python3.8/site-packages/Bio/File.py", line 120, in as_handle
with open(handleish, mode, **kwargs) as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'FigTree.tre'
Hi, it shows that the mcmctree
was not properly running. Could you go into that last_wgd_dating/mcmctree/Concatenated/pep
folder and check the file mcmctree.out
and try to run mcmctree mcmctree.ctrl
directly to inspect what you get? It could be that mcmctree
can't deal with 3-species tree properly and it's suggested using more species for instance 17-19 species and sufficient fossil calibrations to improve the accurancy and precision of the molecular dating analysis.
Hi, thanks for the advice. I shared out file, maybe I should add some other species to list for this step like your opinion about the numbers of species.
Hi, could you do cat mcmctree.out
and share the running log of mcmctree mcmctree.ctrl
? The determinant of the final WGD date is the species tree topology and fossil calibrations, which deserve dedicated investigation.
Hi, sorry but I don't understand the second step. You mean, should I running mcmctree
independent with the mcmctree.ctrl
file which I obtained from focus
? If is it right, here is the log:
It's just for the purpose of debugging. Normally it's not needed but you apparently ran into an error caused by mcmctree
. So it's better to run it manually to inspect the issue. I guess you might need to resort to the manual of mcmctree
to pinpoint the issue, which I think has something to do with the 3-species tree.
Thank you so much for your help, all issues, all advices. I will write an e-mail to you soon, and now I close the issue. Thanks again! Best
Hi, I have a problem with last step. I have 3 different cds file and when I run this command which I gave below, I got the error like this "ValueError: Sequences must all be the same length"
command:
wgd focus --protcocdating --aamodel lg ./wgd_dmd_ortho/merge_focus_ap.tsv -sp dating_tree.nw -o last_wgd_dating -d mcmctree -ds 'burnin = 2000' -ds 'sampfreq = 1000' -ds 'nsample = 20000' 8_ragtag_polished3_CDS.fasta 1_ragtag_polished3_CDS.fasta 9_ragtag_polished3_CDS.fasta
Any suggestion? Thanks!