heche-psb / wgd

wgd v2: a suite of tools to uncover and date ancient polyploidy and whole-genome duplication
https://wgdv2.readthedocs.io/en/latest/
GNU General Public License v3.0

weird results using test data in WGD ksd analysis #17

Closed HyeonseonPark closed 2 months ago

HyeonseonPark commented 6 months ago

Hello, I'm trying to analyze a WGD event using your software.

I installed the software with pip in a Conda environment with Python 3.8 and numpy 1.19.0. Using ugi1000.fasta from the test/data directory, I ran wgd dmd and wgd ksd. However, the ks.tsv file I got has a different format from ath.ks.tsv.

Screenshot (53)

What did I do wrong?

heche-psb commented 6 months ago

Hi, could you share your full command with me? Thanks.

HyeonseonPark commented 6 months ago

I followed the commands in the recipes in the wgd v2 Read the Docs documentation. I want to construct a Ks distribution to infer WGD events and estimate species divergence time, so I ran the following commands:

wgd dmd ugi1000.fasta -n 100
wgd ksd wgd_dmd/ugi1000.fasta.tsv ugi1000.fasta

Did I run the analysis incorrectly?

heche-psb commented 6 months ago

Hi, your commands are correct. The ath.ks.tsv file in test/data is actually in the wgd v1 format; I will update it to the v2 format later. I also just fixed a bug in wgd ksd. Could you please git pull the latest commit and try again? You should get a Ks result file like this:

$head -n 3 ugi1000.fasta.tsv.ks.tsv
pair    N       S       alignmentcoverage       alignmentidentity       alignmentlength dN      dN/dS   dS      family  g1      g2      l      node     node_averaged_dS_outlierexcluded        node_averaged_dS_outlierincluded        strippedalignmentlength t       weightoutlierexcluded  weightoutlierincluded    gene1   gene2
UGI.ctg12048.24868.1__UGI.ctg12712.24918.1      NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     GF00000001      ugi1000.fasta_00187     ugi1000.fasta_00237     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     UGI.ctg12048.24868.1    UGI.ctg12712.24918.1
UGI.ctg12048.24868.1__UGI.ctg12729.24921.1      NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     GF00000001      ugi1000.fasta_00187     ugi1000.fasta_00240     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     UGI.ctg12048.24868.1    UGI.ctg12729.24921.1
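
To tell quickly whether a given Ks file carries the estimate columns from the v2 header above or only the reduced set of identifier columns, one can inspect the header and count rows with a usable dS value. The following is a minimal sketch and not part of wgd: the function name `summarize_ks_table` and the inline sample row are hypothetical, and the only assumption about the file is that it is tab-separated with the column names shown in the header above.

```python
import csv
import io
import math

# Estimate columns taken from the v2 header quoted above.
ESTIMATE_COLUMNS = ["dS", "dN", "dN/dS"]


def summarize_ks_table(handle):
    """Report missing estimate columns and how many rows have a non-NaN dS."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in ESTIMATE_COLUMNS if c not in header]
    total = 0
    with_ds = 0
    for row in reader:
        total += 1
        value = row.get("dS")
        if value is None or value == "":
            continue
        try:
            if not math.isnan(float(value)):
                with_ds += 1
        except ValueError:
            # Non-numeric entries are treated as unusable.
            pass
    return {"missing_columns": missing, "rows": total, "rows_with_dS": with_ds}


# Hypothetical one-row table with identifier columns only (made-up gene names).
reduced = "pair\tfamily\tg1\tg2\tgene1\tgene2\nA__B\tGF00000001\tx_1\tx_2\tA\tB\n"
print(summarize_ks_table(io.StringIO(reduced)))
```

To check a real result, pass an open file handle instead of the inline sample, for example `open("ugi1000.fasta.tsv.ks.tsv", newline="")`. A non-empty `missing_columns` list, or `rows_with_dS` equal to zero, means the file holds no Ks estimates.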

HyeonseonPark commented 6 months ago

I re-cloned wgd v2 on my computer. However, the result still has the same output format. I have attached the log file and the result file.

wgd_ksd.zip

heche-psb commented 5 months ago

I'm not able to reproduce your error. My run log and result file are attached: test_ksd.log ugi1000 fasta tsv ksd

vidsvur commented 4 months ago

Hello, I am facing the same error as HyeonseonPark. I git pulled the latest version of wgd, but the output format is still:

pair    family  g1  g2  gene1   gene2
Pa1_05814__Pa1_18858    GF00000001  pafricana.cds_formatted.fa_05990    pafricana.cds_formatted.fa_03210    Pa1_05814   Pa1_18858
Pa1_05814__Pa1_10331    GF00000001  pafricana.cds_formatted.fa_05990    pafricana.cds_formatted.fa_06436    Pa1_05814   Pa1_10331

Note: the example above is from my own CDS file, not the test data.

heche-psb commented 4 months ago

A quick check: are you using PAML v4.9j? Other versions will probably trigger the error "WARNING No codeml result for GF00000030 due to no resolved nucleotides". Could you please install PAML v4.9j and try again?
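
To see whether a run was affected, one option is to search the wgd ksd log for that warning and list the gene families it names. The snippet below is a rough sketch under stated assumptions: only the warning text quoted above comes from this thread, the helper name `families_without_codeml` is hypothetical, and the second family ID and the INFO line in the example log are made up for illustration.

```python
import re

# Built from the warning text quoted above. Anything else on the log line
# (timestamps, prefixes) is ignored, since the exact layout is an assumption.
WARNING_RE = re.compile(r"No codeml result for (\S+)")


def families_without_codeml(log_text):
    """Return family IDs named in the warning, in order, without duplicates."""
    seen = []
    for match in WARNING_RE.finditer(log_text):
        family = match.group(1)
        if family not in seen:
            seen.append(family)
    return seen


# Illustrative log text; only the first line reproduces the quoted warning.
example_log = (
    "WARNING No codeml result for GF00000030 due to no resolved nucleotides\n"
    "INFO some other line\n"
    "WARNING No codeml result for GF00000031 due to no resolved nucleotides\n"
)
print(families_without_codeml(example_log))
```

If every family in the input appears in this list, codeml produced no estimates at all, which would be consistent with the all-NaN or reduced output reported above.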