heche-psb / wgd

wgd v2: a suite of tools to uncover and date ancient polyploidy and whole-genome duplication
https://wgdv2.readthedocs.io/en/latest/
GNU General Public License v3.0

Issue with the spair flag #28

Closed niWdooG closed 2 months ago

niWdooG commented 2 months ago

Hi,

I'm trying to run wgd ksd with the spair flag:

wgd ksd global_MRBH.tsv *.fa --extraparanomeks SalCuc.fa.tsv.ks.tsv -sp speciestree.nw --reweight -o wgd_globalmrbh_ks_new --spair "SalCuc.fa;SalCuc.fa" --spair "SalCuc.fa;AzoFil.fa" --spair "SalCuc.fa;CerRic.fa" --spair "SalCuc.fa;AdiCap.fa" --spair "SalCuc.fa;AlsSpi.fa" --spair "SalCuc.fa;CibBar.fa" --spair "SalCuc.fa;DicPed.fa" --plotkde

and I get:

Error: Invalid value for '[SEQUENCES]...': Path ' ' does not exist.

The species tree file matches the --spair option. What could be the issue?

The dataset to reproduce

Best, Evgenii

heche-psb commented 2 months ago

Hi, first a quick solution for the rate correction analysis using the command below.

(ENV_wgd) $ wgd ksd global_MRBH.tsv AzoFil.fa SalCuc.fa DicPed.fa CibBar.fa AdiCap.fa AlsSpi.fa CerRic.fa -fa SalCuc.fa --plotkde -o test_ksd_viz -sp speciestree.nw
09:45:54 INFO     This is wgd v2.0.30                                                                cli.py:32
09:49:14 INFO     tmpdir = wgdtmp_c2980d82-a6ee-4cde-8c2e-29c245fe2666                              cli.py:505
09:50:19 INFO     Analysing family GF00000002                                                     core.py:3057
09:50:19 INFO     Analysing family GF00000004                                                     core.py:3057
09:50:19 INFO     Analysing family GF00000001                                                     core.py:3057
09:50:19 INFO     Analysing family GF00000003                                                     core.py:3057
09:50:54 INFO     Analysing family GF00000005                                                     core.py:3057
09:51:08 INFO     Analysing family GF00000006                                                     core.py:3057
09:51:11 INFO     Analysing family GF00000007                                                     core.py:3057

heche-psb commented 2 months ago

I couldn't reproduce your error.

(ENV_wgd) $ wgd ksd global_MRBH.tsv AzoFil.fa SalCuc.fa DicPed.fa CibBar.fa AdiCap.fa AlsSpi.fa CerRic.fa --spair "SalCuc.fa;SalCuc.fa" --spair "SalCuc.fa;AzoFil.fa" --spair "SalCuc.fa;CerRic.fa" --spair "SalCuc.fa;AdiCap.fa" --spair "SalCuc.fa;AlsSpi.fa" --spair "SalCuc.fa;CibBar.fa" --spair "SalCuc.fa;DicPed.fa" --plotkde -o test_ksd_viz2
09:55:10 INFO     This is wgd v2.0.30                                                                               cli.py:32
09:57:41 INFO     tmpdir = wgdtmp_432d76d4-72fa-4f02-a3b0-278cec924971                                             cli.py:505
09:58:23 INFO     Analysing family GF00000001                                                                    core.py:3057
09:58:23 INFO     Analysing family GF00000003                                                                    core.py:3057
09:58:23 INFO     Analysing family GF00000004                                                                    core.py:3057
09:58:23 INFO     Analysing family GF00000002                                                                    core.py:3057
09:58:51 INFO     Analysing family GF00000005                                                                    core.py:3057

niWdooG commented 2 months ago

Does it work with the extraparanomeks flag?

heche-psb commented 2 months ago

Yes, it also works with the -epk flag.

(ENV_wgd) $ wgd ksd global_MRBH.tsv AzoFil.fa SalCuc.fa DicPed.fa CibBar.fa AdiCap.fa AlsSpi.fa CerRic.fa --spair "SalCuc.fa;SalCuc.fa" --spair "SalCuc.fa;AzoFil.fa" --spair "SalCuc.fa;CerRic.fa" --spair "SalCuc.fa;AdiCap.fa" --spair "SalCuc.fa;AlsSpi.fa" --spair "SalCuc.fa;CibBar.fa" --spair "SalCuc.fa;DicPed.fa" --plotkde -o test_ksd_viz2 -epk SalCuc.fa.tsv.ks.tsv
10:14:04 INFO     This is wgd v2.0.30                                                                               cli.py:32
10:17:06 INFO     tmpdir = wgdtmp_4546d35a-ce9a-4fea-b285-1e5649ddd3f9                                             cli.py:505
10:17:27 INFO     Analysing family GF00000003                                                                    core.py:3057
10:17:27 INFO     Analysing family GF00000004                                                                    core.py:3057
10:17:27 INFO     Analysing family GF00000001                                                                    core.py:3057
10:17:27 INFO     Analysing family GF00000002                                                                    core.py:3057
10:17:47 INFO     Analysing family GF00000005                                                                    core.py:3057
10:18:01 INFO     Analysing family GF00000006                                                                    core.py:3057
10:18:04 INFO     Analysing family GF00000007                                                                    core.py:3057
10:18:16 INFO     Analysing family GF00000008                                                                    core.py:3057
10:18:21 INFO     Analysing family GF00000009                                                                    core.py:3057
10:18:21 INFO     Analysing family GF00000010                                                                    core.py:3057
10:18:31 INFO     Analysing family GF00000011                                                                    core.py:3057

niWdooG commented 2 months ago

> Hi, first a quick solution for the rate correction analysis using the command below.
>
> (ENV_wgd) $ wgd ksd global_MRBH.tsv AzoFil.fa SalCuc.fa DicPed.fa CibBar.fa AdiCap.fa AlsSpi.fa CerRic.fa -fa SalCuc.fa --plotkde -o test_ksd_viz -sp speciestree.nw

It works but also gives the following error:

IndexError: index 2 is out of bounds for axis 0 with size 2

log.txt

heche-psb commented 2 months ago

Hi, sorry for the confusion. Did you provide the paranome Ks file to the -epk option? I ask because I saw this in your log:

08:10:23 INFO     Plotting the final mixed Ks distribution    ratecorrect.py:934
         ERROR    No paralogous Ks data was found             ratecorrect.py:381

niWdooG commented 2 months ago

> Yes, it also works with the -epk flag.
>
> (ENV_wgd) $ wgd ksd global_MRBH.tsv AzoFil.fa SalCuc.fa DicPed.fa CibBar.fa AdiCap.fa AlsSpi.fa CerRic.fa --spair "SalCuc.fa;SalCuc.fa" --spair "SalCuc.fa;AzoFil.fa" --spair "SalCuc.fa;CerRic.fa" --spair "SalCuc.fa;AdiCap.fa" --spair "SalCuc.fa;AlsSpi.fa" --spair "SalCuc.fa;CibBar.fa" --spair "SalCuc.fa;DicPed.fa" --plotkde -o test_ksd_viz2 -epk SalCuc.fa.tsv.ks.tsv

I'm not sure what the issue was, but I can run it now. Nonetheless, it gives:

TypeError: '<' not supported between instances of 'NoneType' and 'str'

log.txt

heche-psb commented 2 months ago

Hi, the issue is in your input Newick tree. The original tree is ((((CibBar.fa,AlsSpi.fa),(AdiCap.fa,CerRic.fa)),((AzoFil.fa,SalCuc.fa))),DicPed.fa); which wraps the (AzoFil.fa,SalCuc.fa) clade in a redundant pair of parentheses, and this makes the Phylo module behave unexpectedly. After changing your tree to ((((CibBar.fa,AlsSpi.fa),(AdiCap.fa,CerRic.fa)),(AzoFil.fa,SalCuc.fa)),DicPed.fa); you should be able to run the command below without issues:

$ wgd viz -d global_MRBH.tsv.ks.tsv --plotkde -o test_viz -epk SalCuc.fa.tsv.ks.tsv -fa SalCuc.fa -sp speciestree.nw
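
For anyone who hits the same TypeError, the tree fix described above can be sketched in a few lines of Python. This is only an illustration, not part of wgd or Bio.Phylo: the function names (parse_newick, collapse_unary, to_newick, clean_newick) are hypothetical, and the parser assumes a topology-only Newick string like the one in this thread, with no branch lengths, internal node labels, or quoted names.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested lists of leaf names."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [node()]
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
            return children
        # Otherwise read a leaf name up to the next structural character.
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]

    return node()


def collapse_unary(tree):
    """Replace every internal node that has a single child by that child."""
    if isinstance(tree, str):
        return tree
    children = [collapse_unary(child) for child in tree]
    return children[0] if len(children) == 1 else children


def to_newick(tree):
    """Serialise the nested-list tree back to Newick (without the final ';')."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


def clean_newick(text):
    """Return the Newick string with redundant parentheses removed."""
    return to_newick(collapse_unary(parse_newick(text))) + ";"


if __name__ == "__main__":
    original = ("((((CibBar.fa,AlsSpi.fa),(AdiCap.fa,CerRic.fa)),"
                "((AzoFil.fa,SalCuc.fa))),DicPed.fa);")
    print(clean_newick(original))
```

Run on the original tree from this issue, the sketch prints the corrected tree quoted above. Trees with branch lengths or support values would need a real parser instead.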