heche-psb / wgd

wgd v2: a suite of tools to uncover and date ancient polyploidy and whole-genome duplication
https://wgdv2.readthedocs.io/en/latest/
GNU General Public License v3.0

Error running wgd #6

Closed Bio1nform closed 9 months ago

Bio1nform commented 11 months ago

Hi, this is a great tool. I have used version 1 and am now working with version 2. I managed to install it with conda, but I am getting the following error:

wgd -h
Usage: wgd [OPTIONS] COMMAND [ARGS]...

  wgd v2 - Copyright (C) 2023-2024 Hengchi Chen
  Contact: heche@psb.vib-ugent.be

Options:
  -v, --verbosity [info|debug]  Verbosity level, default = info.
  -h, --help                    Show this message and exit.

Commands:
  dmd    All-vs-all diamond blastp + MCL clustering.
  focus  Multiply species RBH or c-score defined orthologous family's gene...
  ksd    Paranome and one-to-one ortholog Ks distribution inference...
  mix    Mixture modeling of Ks distributions.
  peak   Infer peak and CI of Ks distribution.
  syn    Co-linearity and anchor inference using I-ADHoRe.
  viz    Visualization of Ks distribution or synteny

wgd dmd
09:04:59 INFO This is wgd v1.2 cli.py:32
Traceback (most recent call last):
  File "/home/.conda/envs/WGD/bin/wgd", line 10, in <module>
    sys.exit(cli())
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/.conda/envs/WGD/lib/python3.6/site-packages/cli.py", line 113, in dmd
    _dmd(**kwargs)
  File "/home/.conda/envs/WGD/lib/python3.6/site-packages/cli.py", line 116, in _dmd
    from wgd.core import SequenceData, read_MultiRBH_gene_families,mrbh,ortho_infer,genes2fams,endt,segmentsaps,bsog
ModuleNotFoundError: No module named 'wgd.core'

wgd viz
09:05:19 INFO This is wgd v1.2 cli.py:32
Traceback (most recent call last):
  File "/home/.conda/envs/WGD/bin/wgd", line 10, in <module>
    sys.exit(cli())
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/.conda/envs/WGD/lib/python3.6/site-packages/cli.py", line 533, in viz
    _viz(**kwargs)
  File "/home/.conda/envs/WGD/lib/python3.6/site-packages/cli.py", line 536, in _viz
    from wgd.viz import elmm_plot, apply_filters, multi_sp_plot, default_plot,all_dotplots,filter_by_minlength,dotplotunitgene,dotplotingene,filter_mingenumber
ImportError: cannot import name 'elmm_plot'

Any help would be great. Thanks
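
The log banner above reports "wgd v1.2" and click is loaded from /home/.local, while cli.py comes from the conda environment, so it may be that an older user-level install is shadowing the new one. A minimal sketch for checking where Python would import each module from, run with the interpreter of the conda environment; the helper name `describe_module` is hypothetical and not part of wgd:

```python
import importlib.util
import sys


def describe_module(name):
    """Return the file a module would be imported from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError, AttributeError):
        # Raised e.g. when the parent package of a dotted name is missing or broken.
        return None
    if spec is None:
        return None
    return spec.origin


# Print the origin of the modules that appear in the tracebacks.
for name in ("wgd", "wgd.core", "wgd.viz", "click"):
    print(f"{name:10s} -> {describe_module(name)}")
print("interpreter:", sys.executable)
```

If `wgd` resolves to a path outside the conda environment, or `wgd.core` prints `None`, the environment is probably mixing two installations.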

Bio1nform commented 9 months ago

I installed v2.0.23 with conda.

wgd ksd wgd_dmd/Aquilegia_coerulea.tsv Aquilegia_coerulea -o wgd_ksd

There are still some errors.

19:28:57 INFO This is wgd v2.0.23 cli.py:32
19:29:08 INFO tmpdir = wgdtmp_d51006c1-8f52-4211-9f39-423941314e0c cli.py:483
19:29:13 INFO Analysing family GF00000001 core.py:2873
19:29:13 INFO Analysing family GF00000002 core.py:2873
19:29:13 INFO Analysing family GF00000003 core.py:2873
19:29:13 INFO Analysing family GF00000004 core.py:2873
19:29:27 WARNING Stripped alignment length == 0 for GF00000004 codeml.py:225
         INFO Analysing family GF00000005 core.py:2873
19:29:36 WARNING Stripped alignment length == 0 for GF00000002 codeml.py:225
         INFO Analysing family GF00000006 core.py:2873
19:29:39 WARNING No codeml result for GF00000003 due to no resolved nucleotides codeml.py:234

19:55:59 WARNING No codeml result for GF00006547 due to no resolved nucleotides codeml.py:234
19:56:16 INFO Saving to wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv cli.py:493
19:56:18 INFO Making plots cli.py:495
         INFO No valid Ks values for plotting cli.py:497

heche-psb commented 9 months ago

I installed v2.0.23 from PyPI.

wgd ksd wgd_dmd/Aquilegia_coerulea.tsv Aquilegia_coerulea -o wgd_ksd

15:20:53 INFO This is wgd v2.0.23 cli.py:32
15:21:10 INFO tmpdir = wgdtmp_adab2f7f-fb6b-407e-8083-5643b9b4a9fc cli.py:483
15:21:14 INFO Analysing family GF00000001 core.py:2873
15:21:14 INFO Analysing family GF00000002 core.py:2873
15:21:14 INFO Analysing family GF00000003 core.py:2873
15:21:14 INFO Analysing family GF00000004 core.py:2873
15:21:15 INFO Analysing family GF00000005 core.py:2873

Now I get the following error: error.txt

This error suggests that something went wrong with the alignment of GF00000001. Could you find the tmp dir for this family and share the GF00000001.cdsaln, GF00000001.codeml, GF00000001.ctrl and pro.aln files with me?

heche-psb commented 9 months ago

Conda does not install PAML v4.9j automatically; it installs some other version instead. Could you double-check the PAML version in your conda environment for wgd?

Bio1nform commented 9 months ago

These are the only files that are present in GF00000001.

pro.aln.txt pro.fasta.txt

heche-psb commented 9 months ago

Running MAFFT manually on your pro.fasta.txt works without problems, so it seems something went wrong during the MAFFT step of your run. Does MAFFT work properly in your environment on a huge family like GF00000001? One possibility is that the job was not given enough CPU.
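
For anyone who wants to repeat that manual check, here is a minimal sketch of running MAFFT on a protein FASTA file from Python. `--auto` and `--thread` are standard MAFFT options; the function names, the thread count and the file names are placeholders, and this is not necessarily how wgd invokes MAFFT internally:

```python
import shutil
import subprocess


def build_mafft_cmd(fasta_path, threads=4):
    """Build a MAFFT command line that picks a strategy automatically."""
    return ["mafft", "--auto", "--thread", str(threads), fasta_path]


def run_mafft(fasta_path, out_path, threads=4):
    """Align fasta_path with MAFFT and write the alignment to out_path."""
    if shutil.which("mafft") is None:
        raise FileNotFoundError("mafft was not found on PATH")
    cmd = build_mafft_cmd(fasta_path, threads)
    # MAFFT writes the alignment to stdout and progress messages to stderr.
    with open(out_path, "w") as out:
        result = subprocess.run(cmd, stdout=out, stderr=subprocess.PIPE, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"mafft failed:\n{result.stderr}")
    return out_path


# Example call (placeholder file names):
# run_mafft("pro.fasta", "pro.manual.aln", threads=8)
```

If the manual run succeeds but the alignment produced during `wgd ksd` is empty or truncated, resource limits on the job are a plausible cause.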

Bio1nform commented 9 months ago

I exported the path to PAML v4.9j:

export PATH=$PATH:/home/software/GENOMETOOLS/PAML/paml4.9j/bin

It worked with the earlier version. Here are the GF00000001 files from the conda run: GF00000001_ks.txt.csv, pro.aln.txt, pro.fasta.txt
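
One thing that may be worth checking: `export PATH=$PATH:...` appends the PAML directory to the end of PATH, so any `codeml` found in an earlier directory (for example one installed into the conda environment) would still be the one that runs. A minimal sketch that lists every `codeml` on PATH in lookup order; the helper name `find_all_on_path` is hypothetical:

```python
import os


def find_all_on_path(program, path=None):
    """Return every executable called `program` on PATH, in lookup order."""
    path = os.environ.get("PATH", "") if path is None else path
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


# The first hit is the one the shell (and any subprocess) will run.
for rank, exe in enumerate(find_all_on_path("codeml"), start=1):
    print(rank, exe)
```

If the first hit is not the v4.9j binary, prepending instead (`export PATH=/path/to/paml4.9j/bin:$PATH`) would give it precedence.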

heche-psb commented 9 months ago

I opened a new virtual environment and reinstalled v2.0.23 and wgd ksd runs fine. I can't reproduce your error. Not sure if other users had the same problem only with v2.0.23.

Bio1nform commented 9 months ago

Hi, both the PyPI and the conda versions work fine.

When i run: wgd syn -f mRNA -a Name wgd_dmd/Aquilegia_coerulea.tsv Aquilegia_coerulea.gff3 -ks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -o wgd_sync

The output figures seem a bit different.

[Images: Aquilegia_coerulea-vs-Aquilegia_coerulea_Ks dot_unit_gene; Aquilegia_coerulea-vs-Aquilegia_coerulea dot; Aquilegia_coerulea-vs-Aquilegia_coerulea dot_unit_gene; Aquilegia_coerulea-vs-Aquilegia_coerulea_Ks dot]

heche-psb commented 9 months ago

Two types of dot plot are produced: one in units of genes (number of genes) and one in units of bases (number of bases). The file name should contain this piece of information.

Bio1nform commented 9 months ago

I am not getting this figure. image

Bio1nform commented 9 months ago

wgd viz -d wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv --extraparanomeks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -sp speciestree.nw --reweight -ap wgd_sync/iadhore-out/anchorpoints.txt -o wgd_viz_mixed_Ks_elmm --spair "Aquilegia_coerulea;Protea_cynaroides" --spair "Aquilegia_coerulea;Vitis_vinifera" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Aquilegia_coerulea;Aquilegia_coerulea" --gsmap wgd_globalmrbh_ks/gene_species.map --plotkde --plotelmm

From website. image

In my output the peaks are smaller. image

heche-psb commented 9 months ago

It's a simple dot plot in an Oxford grid. The gray dots are homologous gene pairs, while the red dots are anchor pairs. The transparency of the dots can be set manually.

heche-psb commented 9 months ago

Both node-averaged and node-weighted plots will be produced. Could you show both?

Bio1nform commented 9 months ago

Aquilegia_coerulea_Corrected.ksd.weighted.svg

image

image

image

heche-psb commented 9 months ago

I think it might be linked to the data we used. What is the source of the Aquilegia coerulea CDS file you are using?

Bio1nform commented 9 months ago

I used CDS from phytozome (https://phytozome-next.jgi.doe.gov/info/Acoerulea_v3_1)

The NCBI version of Aquilegia coerulea CDS has several genes with duplicate names. wgd cannot handle duplicated names.

>Aqcoe0131s0003.1
ATGTATATTAAATATGTCACAACCAAAAAAAACTATTGTACTGTTACATATATGCAGGGGGGTACATACAGTATACAAGG
ACGAATCCAGGGGGTGCACGGTGCAACCGCACCCCCAAAATTTGAAATTTTCATTATTTTCCCTATGTTTTTTTGTACAT
ATATCATTATTTCCCTATGCTTTTTGCACGTATATAAAAAATTTAGCTTAAATATGTAG
>Aqcoe0131s0002.1
ATGGTAGATATTACAATTTCTAGGGCACGTTGGACGGAATCAAGATCAAAACTCAAAAAAGATACTATACGACCTTTAAT
TACTCTTTCAGAGCCAAATCCGTACTACATGGTGTCTTTACGCATTGGTACA
>Aqcoe1729s0001.1
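
Since duplicated names stop wgd from running, it can help to list them before starting. A minimal sketch that reports FASTA identifiers occurring more than once; the function names and the toy records are hypothetical, and the identifier is taken to be the first whitespace-delimited token after ">":

```python
from collections import Counter


def fasta_ids(lines):
    """Yield the identifier (first token after '>') of every FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def duplicated_ids(lines):
    """Return {identifier: count} for identifiers that occur more than once."""
    counts = Counter(fasta_ids(lines))
    return {name: n for name, n in counts.items() if n > 1}


# Toy input with one repeated name (hypothetical identifiers).
toy = [">geneA.1 some description", "ATGAAATGA", ">geneB.1", "ATGTGA", ">geneA.1", "ATGCCCTGA"]
print(duplicated_ids(toy))

# For a real file:
# with open("cds.fasta") as handle:
#     print(duplicated_ids(handle))
```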

heche-psb commented 9 months ago

Are you using this file Acoerulea_322_v3.1.cds_primaryTranscriptOnly.fa.gz ?

Bio1nform commented 9 months ago

wgd syn -f mRNA -a Name wgd_dmd/Aquilegia_coerulea.tsv Aquilegia_coerulea.gff3 -ks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -o wgd_sync

I used Acoerulea_322_v3.1.cds.fa.gz (43550 sequences) so that it matches the mRNA features in the .gff3 file (43550).

heche-psb commented 9 months ago

In principle, only one sequence per gene should be used to construct the whole paranome, so that each split in the tree represents one gene duplication event. If you use all alternative CDSs instead of only the primary ones, how do you interpret the biological meaning of each bipartition in the tree?
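
As an illustration of keeping one sequence per gene, here is a minimal sketch that retains the longest transcript of each gene. It assumes transcript identifiers have the form `gene.N`, which is an assumption about the naming scheme; the function names and toy records are hypothetical, and "longest" is not guaranteed to coincide with the primary transcript chosen in a primaryTranscriptOnly file:

```python
def read_fasta(lines):
    """Read FASTA lines into {identifier: sequence}; later duplicates overwrite earlier ones."""
    records = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else None
            if current is not None:
                records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def longest_per_gene(records):
    """Map each gene (identifier minus its final '.N' suffix) to its longest transcript."""
    best = {}
    for transcript, seq in records.items():
        gene = transcript.rsplit(".", 1)[0]
        if gene not in best or len(seq) > len(records[best[gene]]):
            best[gene] = transcript
    return best


# Toy records with hypothetical identifiers.
toy_records = read_fasta([">geneA.1", "ATGAAATGA", ">geneA.2", "ATGAAACCCTGA", ">geneB.1", "ATGTGA"])
print(longest_per_gene(toy_records))
```

Keeping the transcript identifiers unchanged means they can still be matched to mRNA features in the annotation.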

Bio1nform commented 9 months ago

The issue is that the gene names in Acoerulea_322_v3.1.cds_primaryTranscriptOnly.fa.gz do not match those in the gff3 file.

Here is the gff3 file:

##gff-version 3
##annot-version v3.1
##species Aquilegia coerulea
Chr_01 phytozomev11 gene 2657 4987 . + . ID=Aqcoe1G000100.v3.1;Name=Aqcoe1G000100;ancestorIdentifier=Aquca_009_00001.v1.1
Chr_01 phytozomev11 mRNA 2657 4987 . + . ID=Aqcoe1G000100.1.v3.1;Name=Aqcoe1G000100.1;pacid=33083967;longest=1;ancestorIdentifier=Aquca_009_00001.1.v1.1;Parent=Aqcoe1G000100.v3.1
Chr_01 phytozomev11 five_prime_UTR 2657 2841 . + . ID=Aqcoe1G000100.1.v3.1.five_prime_UTR.1;Parent=Aqcoe1G000100.1.v3.1;pacid=33083967
Chr_01 phytozomev11 five_prime_UTR 4435 4439 . + . ID=Aqcoe1G000100.1.v3.1.five_prime_UTR.2;Parent=Aqcoe1G000100.1.v3.1;pacid=33083967
Chr_01 phytozomev11 CDS 4440 4691 . + 0 ID=Aqcoe1G000100.1.v3.1.CDS.1;Parent=Aqcoe1G000100.1.v3.1;pacid=33083967
Chr_01 phytozomev11 three_prime_UTR 4692 4987 . + . ID=Aqcoe1G000100.1.v3.1.three_prime_UTR.1;Parent=Aqcoe1G000100.1.v3.1;pacid=33083967
Chr_01 phytozomev11 gene 3331 3855 . + . ID=Aqcoe1G000200.v3.1;Name=Aqcoe1G000200
Chr_01 phytozomev11 mRNA 3331 3855 . + . ID=Aqcoe1G000200.1.v3.1;Name=Aqcoe1G000200.1;pacid=33082500;longest=1;Parent=Aqcoe1G000200.v3.1
Chr_01 phytozomev11 five_prime_UTR 3331 3563 . + . ID=Aqcoe1G000200.1.v3.1.five_prime_UTR.1;Parent=Aqcoe1G000200.1.v3.1;pacid=33082500
Chr_01 phytozomev11 CDS 3564 3812 . + 0 ID=Aqcoe1G000200.1.v3.1.CDS.1;Parent=Aqcoe1G000200.1.v3.1;pacid=33082500
Chr_01 phytozomev11 three_prime_UTR 3813 3855 . + . ID=Aqcoe1G000200.1.v3.1.three_prime_UTR.1;Parent=Aqcoe1G000200.1.v3.1;pacid=33082500

Here are the sequences:

>Aqcoe1G000100.1
ATGAACATGGGGGACCCATCTAAACTACATGTTAAGGTCAGATTCTGCCTTGCATCAGAACTCTATTGTTGTGTCGATAC
GAGCAAAGGTGCTTTATCTGAACGGCTGGTTTCAATTAAAGAGGAAAGTATGTGCATACTCAAAGATTTTATCACCAAAC
ACAATGTTCCCACTGACATCCCTGAAGAACTTTCTGAAGCTTCTGAAGACGATGACGAAGTCTCTGAGAATCCTCCTAAG
AAACGAAAATGA
>Aqcoe1G000200.1
ATGTGTGGCATTGTGTGCGCATTAGGATTCATTCCTTCTGGGGGCACATTACCAGAACATAAATGGTTTTTCGAATTTGA
CTCCAGCTCCCACTCTTCTAGCTCAGAAACTAAATTGCTGAGTTTTCTTAAATCTTTGGAGCTCCCTGCATCCTCAATTA
GCATTCCACCCAATGGTGGTTGTTGTGTCATAAAAGGAACTTCAGGAGTTGAATGGGAAGCAAATATATTTAATTGTTCA
CTTGGTTGA

I need to remove the .1 at the end of each FASTA header. Also, if any names are duplicated, wgd won't work, so I would have to remove the sequences with duplicated names before running it.

heche-psb commented 9 months ago

Duplicated gene names are normally not allowed, since each gene should have a unique name. Could you use -f mRNA and -a Name for extracting the gene names?
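
To see which names a given feature/attribute combination yields, here is a minimal sketch that pulls one attribute from every GFF3 line of a chosen feature type. The two example lines are the gene and mRNA records quoted earlier in this thread, written tab-separated as the GFF3 format expects; the function names are hypothetical and this is not wgd's own parser:

```python
def parse_attributes(column9):
    """Split the GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def feature_names(gff_lines, feature="mRNA", attribute="Name"):
    """Collect `attribute` from every line whose third column equals `feature`."""
    names = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        value = parse_attributes(cols[8]).get(attribute)
        if value is not None:
            names.append(value)
    return names


gff = [
    "##gff-version 3",
    "\t".join(["Chr_01", "phytozomev11", "gene", "2657", "4987", ".", "+", ".",
               "ID=Aqcoe1G000100.v3.1;Name=Aqcoe1G000100;ancestorIdentifier=Aquca_009_00001.v1.1"]),
    "\t".join(["Chr_01", "phytozomev11", "mRNA", "2657", "4987", ".", "+", ".",
               "ID=Aqcoe1G000100.1.v3.1;Name=Aqcoe1G000100.1;pacid=33083967;longest=1;"
               "ancestorIdentifier=Aquca_009_00001.1.v1.1;Parent=Aqcoe1G000100.v3.1"]),
]

print(feature_names(gff, "mRNA", "Name"))  # matches the FASTA header Aqcoe1G000100.1
print(feature_names(gff, "mRNA", "ID"))    # carries an extra .v3.1 suffix
print(feature_names(gff, "gene", "Name"))  # lacks the .1 transcript suffix
```

With these records, only the mRNA/Name combination reproduces the FASTA headers exactly, so the headers would not need to be edited.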

Bio1nform commented 9 months ago

I used -f mRNA -a Name and it works.

The number of mRNAs and the number of genes do not match: genes (Acoerulea_322_v3.1.cds_primaryTranscriptOnly.fa.gz): 30023; mRNAs (Acoerulea_322_v3.1.cds.fa.gz): 43550.

The gff3 file likewise contains 30023 genes and 43550 mRNAs.

heche-psb commented 9 months ago

OK, but now we know the difference between our results comes from the CDS data we used.

Bio1nform commented 9 months ago

I am getting this error. Could you please take a look at it? I do not know what went wrong.

wgd ksd wgd_globalmrbh

/home/.conda/envs/wgd223_38/lib/python3.8/site-packages/Bio/Seq.py:2855: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  warnings.warn(
Traceback (most recent call last):
  File "/home/.conda/envs/wgd223_38/bin/wgd", line 10, in <module>
    sys.exit(cli())
  File "/home/.conda/envs/wgd223_38/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/.conda/envs/wgd223_38/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/.conda/envs/wgd223_38/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/.conda/envs/wgd223_38/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/.conda/envs/wgd223_38/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/.conda/envs/wgd223_38/lib/python3.8/site-packages/cli.py", line 464, in ksd
    _ksd(**kwargs)
  File "/home/.conda/envs/wgd223_38/lib/python3.8/site-packages/cli.py", line 506, in _ksd
    multi_sp_plot(df,spair,spgenemap,outdir,onlyrootout,title=prefix,ylabel=ylabel,ksd=True,reweight=reweight,sptree=speciestree,extraparanomeks=extraparanomeks, ap = anchorpoints,plotkde=plotkde,plotapgmm=plotapgmm,plotelmm=plotelmm,components=components,na=True)
  File "/home/.conda/envs/wgd223_38/lib/python3.8/site-packages/wgd/viz.py", line 638, in multi_sp_plot
    kde = stats.gaussian_kde(y,weights=w,bw_method=0.1)
  File "/home/.conda/envs/wgd223_38/lib/python3.8/site-packages/scipy/stats/kde.py", line 193, in __init__
    raise ValueError("dataset input should have multiple elements.")
ValueError: dataset input should have multiple elements.
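
The final ValueError comes from scipy's `gaussian_kde`, which needs more than one data point, so at least one Ks set handed to the plotting step was empty or had a single value. A minimal standard-library sketch of a pre-check one could run on a column of Ks values before plotting; the function names and the Ks bounds are hypothetical and not part of wgd:

```python
import math


def usable_ks(values, lower=0.0, upper=5.0):
    """Keep values that parse as finite floats within (lower, upper]."""
    kept = []
    for value in values:
        try:
            x = float(value)
        except (TypeError, ValueError):
            continue  # empty strings, None, or other non-numeric entries
        if math.isfinite(x) and lower < x <= upper:
            kept.append(x)
    return kept


def enough_for_kde(values, lower=0.0, upper=5.0):
    """A kernel density estimate needs at least two data points."""
    return len(usable_ks(values, lower, upper)) > 1


print(enough_for_kde(["0.5", "nan", "", None, "7.2", 1.3]))
print(enough_for_kde(["0.5"]))
```

Running such a check per species pair can show which pair ends up without usable Ks values.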

heche-psb commented 9 months ago

What is the complete command that you used?

Bio1nform commented 9 months ago

wgd ksd wgd_globalmrbh/global_MRBH.tsv --extraparanomeks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -sp speciestree.nw --reweight -o wgd_globalmrbh_ks --spair "Aquilegia_coerulea;Protea_cynaroides" --spair "Aquilegia_coerulea;Vitis_vinifera" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Aquilegia_coerulea;Aquilegia_coerulea" --plotkde -ap wgd_syn/iadhore-out/anchorpoints.txt

heche-psb commented 9 months ago

It seems you forgot to give the input cds files.

Bio1nform commented 9 months ago

Still the same. I removed the .1 from Aqcoe1G000200.1. What do you think is the cause? ValueError: dataset input should have multiple elements.

These are the only outputs: gene_species.map and global_MRBH.tsv.ks.tsv

wgd ksd wgd_globalmrbh_G/global_MRBH.tsv --extraparanomeks wgd_ksd_G/Aquilegia_coerulea.tsv.ks.tsv -sp speciestree.nw --reweight -o wgd_globalmrbh_ks_G --spair "Aquilegia_coerulea;Protea_cynaroides" --spair "Aquilegia_coerulea;Vitis_vinifera" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Aquilegia_coerulea;Aquilegia_coerulea" Aquilegia_coerulea Protea_cynaroides Acorus_americanus Vitis_vinifera --plotkde -ap wgd_sync_G/iadhore-out/anchorpoints.txt

Bio1nform commented 9 months ago

Hi, I see that the species tree somehow affects the figure output. See the arrows in the figure.

(((Acorus_americanus,Aquilegia_coerulea),Protea_cynaroides),Vitis_vinifera); image

The original one (((Vitis_vinifera,Protea_cynaroides),Aquilegia_coerulea),Acorus_americanus); image

How did you make this species tree file? Did it come from an external source? If so, which genes did you use as input to create it?

Thanks

heche-psb commented 9 months ago

The relationship between the outgroup and ingroup species determines the result of the substitution rate correction. If you change the species tree in a way that alters this relationship, the result will change. I followed APG IV.
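
To make the difference between the two trees in this thread concrete, here is a minimal sketch that parses a Newick string without branch lengths and reports the sister clade of a focal species. The parser and function names are hypothetical and only handle plain topologies like the ones posted above:

```python
def parse_newick(text):
    """Parse a Newick topology (no branch lengths) into nested tuples of leaf names."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1  # skip ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos].strip()

    return parse_node()


def leaves(node):
    """Return all leaf names below a node."""
    if isinstance(node, str):
        return [node]
    found = []
    for child in node:
        found.extend(leaves(child))
    return found


def sister_of(node, focal):
    """Return the sorted leaves of the clade(s) sharing a direct parent with `focal`."""
    if isinstance(node, str):
        return None
    if focal in node:
        return sorted(leaf for child in node if child != focal for leaf in leaves(child))
    for child in node:
        result = sister_of(child, focal)
        if result is not None:
            return result
    return None


modified = "(((Acorus_americanus,Aquilegia_coerulea),Protea_cynaroides),Vitis_vinifera);"
original = "(((Vitis_vinifera,Protea_cynaroides),Aquilegia_coerulea),Acorus_americanus);"
print(sister_of(parse_newick(modified), "Aquilegia_coerulea"))
print(sister_of(parse_newick(original), "Aquilegia_coerulea"))
```

In the modified tree Acorus_americanus is sister to Aquilegia_coerulea, whereas in the original tree it is the earliest-diverging lineage and the sister clade consists of Vitis_vinifera and Protea_cynaroides.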

Bio1nform commented 9 months ago

Sorry for the naive question, but how did you get the tree file? Do I download it from APG IV? The APG IV tree is huge.

Angiosperm_Phylogeny_Poster_1500px

Thanks

heche-psb commented 9 months ago

I used the updated information on the APG IV website.

Bio1nform commented 9 months ago

wgd ksd wgd_globalmrbh/global_MRBH.tsv --extraparanomeks wgd_ksd/Hap1.tsv.ks.tsv -sp speciestree.nw --reweight -o wgd_globalmrbh_ks3 --spair "Hap1;Malus_domestica" --spair "Hap1;Araport_thalania11" --spair "Hap1;Vitis_vinifera" --spair "Hap1;Oryza_sativaJ" --spair "Hap1;Hap1" Hap1 Malus_domestica Araport_thalania11 Vitis_vinifera Oryza_sativaJ --plotkde -ap wgd_syn/iadhore-out/anchorpoints.txt

image

I cannot see the other peaks. What could be the reason? Thanks