heche-psb / wgd

wgd v2: a suite of tools to uncover and date ancient polyploidy and whole-genome duplication
https://wgdv2.readthedocs.io/en/latest/
GNU General Public License v3.0
25 stars 0 forks source link

wgd focus error #43

Closed ilaydagulmez closed 2 weeks ago

ilaydagulmez commented 3 weeks ago

Hi, I have a problem with last step. I have 3 different cds file and when I run this command which I gave below, I got the error like this "ValueError: Sequences must all be the same length"

command:

wgd focus --protcocdating --aamodel lg ./wgd_dmd_ortho/merge_focus_ap.tsv -sp dating_tree.nw -o last_wgd_dating -d mcmctree -ds 'burnin = 2000' -ds 'sampfreq = 1000' -ds 'nsample = 20000' 8_ragtag_polished3_CDS.fasta 1_ragtag_polished3_CDS.fasta 9_ragtag_polished3_CDS.fasta

Any suggestion? Thanks!

heche-psb commented 2 weeks ago

Hi, the species names in your dating_tree.nw file should be in accordance with the cds file names. Is it the case in your run?

ilaydagulmez commented 2 weeks ago

Hi, do I prepare this file manually? It did not appear as output in the previous steps.

heche-psb commented 2 weeks ago

Hi, I think you have all the seq files already. For instance, assuming that your dating_tree.nw looks like

5 1
(((sp1,sp2)'>0.5600<1.2863',sp3)'>0.8360<1.2863',(sp4_ap1,sp4_ap2))'>0.8360<1.2863';

you accordingly need to provide four seq files named exactly as sp1,sp2,sp3 and sp4.

ilaydagulmez commented 2 weeks ago

Hi, I have three species now for this step and I wrote like this:

3 1
((8_ragtag_polished3_CDS.fasta,1_ragtag_polished3_CDS.fasta)'>0.5600<1.2863',9_ragtag_polished3_CDS)'>0.8360<1.2863';

But I'm not sure about this because got the same error:

  File "/truba/home/tbag77/miniconda3/envs/py38/lib/python3.8/site-packages/Bio/Align/__init__.py", line 593, in _append
    raise ValueError("Sequences must all be the same length")

Thanks

heche-psb commented 2 weeks ago

Hi, if you're doing WGD dating, you need to manually add postfix "_ap1" and "_ap2" to the focal species as the dating_tree.nw I show above.

ilaydagulmez commented 2 weeks ago

Hi, sorry for this. I added _ap1 and _ap2 like this:

3 1
(((8_ragtag_polished3_CDS.fasta,1_ragtag_polished3_CDS.fasta)'>0.5600<1.2863', (9_ragtag_polished3_CDS_ap1, 9_ragtag_polished3_CDS_ap2))'>0.8360<1.2863';

But got the same error again.

heche-psb commented 2 weeks ago

Hi, could you also show the column names of your merge_focus_ap.tsv file? They should also accord with the seq file names.

ilaydagulmez commented 2 weeks ago

Hi, here is the file and I think I could change my species places in the dating_tree.nw file.

Ekran Resmi 2024-09-18 11 08 03
heche-psb commented 2 weeks ago

Hi, based on your merge_focus_ap.tsv file, I think you should set 8_ragtag_polished3_CDS.fasta to be 8_ragtag_polished3_CDS.fasta_ap1 and 8_ragtag_polished3_CDS.fasta_ap2.

ilaydagulmez commented 2 weeks ago

Hi, even I changed my commands like this (((1_ragtag_polished3_CDS.fasta,9_ragtag_polished3_CDS.fasta)'>0.5600<1.2863', (8_ragtag_polished3_CDS_ap1, 8_ragtag_polished3_CDS_ap2))'>0.8360<1.2863';

I got the same error again.

heche-psb commented 2 weeks ago

Hi, could you share your full log of error?

ilaydagulmez commented 2 weeks ago

Hi, thanks for your time. Here is the log:

slurm-1667557.txt

heche-psb commented 2 weeks ago

Hi, it seems that you have issues with your gene ids that different species could have the same gene id. Could you change the gene ids for each species to distinguish them? For instance, do the following to your seq files

sed -i "s/>/>ragtag8_/g" 8_ragtag_polished3_CDS.fasta
sed -i "s/>/>ragtag9_/g" 9_ragtag_polished3_CDS.fasta
sed -i "s/>/>ragtag1_/g" 1_ragtag_polished3_CDS.fasta

and in a python session do the following to merge_focus_ap.tsv

import pandas as pd
df = pd.read_csv("merge_focus_ap.tsv", header=0, index_col=0, sep='\t')
for column in df.columns:
      df[column] = df[column].apply(lambda x:"ragtag"+column[0]+"_"+x)
df.to_csv("new_merge_focus_ap.tsv", header=True, index=True, sep='\t')
ilaydagulmez commented 2 weeks ago

Hi, thanks for your advice it works for the error but now mcmc.txt or dates.txt file wouldn't created even mcmc file was created.

Maybe this fail could explain:

   Figtree = Phylo.read('FigTree.tre','nexus')
  File "/truba/home/tbag77/miniconda3/envs/py38/lib/python3.8/site-packages/Bio/Phylo/_io.py", line 62, in read
    tree = next(tree_gen)
  File "/truba/home/tbag77/miniconda3/envs/py38/lib/python3.8/site-packages/Bio/Phylo/_io.py", line 49, in parse
    with File.as_handle(file, "r") as fp:
  File "/truba/home/tbag77/miniconda3/envs/py38/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/truba/home/tbag77/miniconda3/envs/py38/lib/python3.8/site-packages/Bio/File.py", line 120, in as_handle
    with open(handleish, mode, **kwargs) as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'FigTree.tre'
heche-psb commented 2 weeks ago

Hi, it shows that the mcmctree was not properly running. Could you go into that last_wgd_dating/mcmctree/Concatenated/pep folder and check the file mcmctree.out and try to run mcmctree mcmctree.ctrl directly to inspect what you get? It could be that mcmctree can't deal with 3-species tree properly and it's suggested using more species for instance 17-19 species and sufficient fossil calibrations to improve the accurancy and precision of the molecular dating analysis.

ilaydagulmez commented 2 weeks ago

Hi, thanks for the advice. I shared out file, maybe I should add some other species to list for this step like your opinion about the numbers of species.

Ekran Resmi 2024-09-18 12 45 27
heche-psb commented 2 weeks ago

Hi, could you do cat mcmctree.out and share the running log of mcmctree mcmctree.ctrl? The determinant of the final WGD date is the species tree topology and fossil calibrations, which deserve dedicated investigation.

ilaydagulmez commented 2 weeks ago

Hi, sorry but I don't understand the second step. You mean, should I running mcmctree independent with the mcmctree.ctrl file which I obtained from focus? If is it right, here is the log:

Ekran Resmi 2024-09-18 13 36 44
heche-psb commented 2 weeks ago

It's just for the purpose of debugging. Normally it's not needed but you apparently ran into an error caused by mcmctree. So it's better to run it manually to inspect the issue. I guess you might need to resort to the manual of mcmctree to pinpoint the issue, which I think has something to do with the 3-species tree.

ilaydagulmez commented 2 weeks ago

Thank you so much for your help, all issues, all advices. I will write an e-mail to you soon, and now I close the issue. Thanks again! Best