heche-psb / wgd

wgd v2: a suite of tools to uncover and date ancient polyploidy and whole-genome duplication
https://wgdv2.readthedocs.io/en/latest/
GNU General Public License v3.0

Suggestions on the result #14

Closed: manoharbisht1998 closed this issue 2 months ago

manoharbisht1998 commented 9 months ago

Thank you for the convenient tool!

I have successfully performed the analysis; however, I am a bit confused about how to interpret the results and how to present them in a standard way.

I ran the following commands:

  1. wgd dmd --globalmrbh SPECIES_cds Zea_mays_cds Amborella_trichopoda_cds Musa_acuminata_cds --cds -n 90
  2. wgd ksd wgd_dmd/global_MRBH.tsv --extraparanomeks ../wgd_ksd/SPECIES_cds.tsv.ks.tsv -sp speciestree.nw -o wgd_globalmrbh_ks --spair "SPECIES_cds;Musa_acuminata_cds" --spair "SPECIES_cds;Amborella_trichopoda_cds" --spair "SPECIES_cds;Zea_mays_cds" --spair "SPECIES_cds;SPECIES_cds" --reweight --plotkde
  3. wgd viz -d wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv -sp speciestree.nw --extraparanomeks ../wgd_ksd/SPECIES_cds.tsv.ks.tsv --spair "SPECIES_cds;Musa_acuminata_cds" --spair "SPECIES_cds;Amborella_trichopoda_cds" --spair "SPECIES_cds;Zea_mays_cds" --spair "SPECIES_cds;SPECIES_cds" --reweight --plotkde

The results from the 2nd and 3rd commands are attached.

Why is the SPECIES_CDS paranome missing from the 2nd figure (SPECIES_cds_Corrected.ksd.averaged.pdf)? And can we use this 2nd figure to infer that SPECIES_CDS and Musa_acuminata_cds shared the same WGD event, which happened after SPECIES_CDS diverged from Zea mays and Amborella?

Thanks

Attachments: SPECIES_cds_Corrected.ksd.weighted.pdf, SPECIES_cds_Corrected.ksd.averaged.pdf

heche-psb commented 9 months ago

I guess the SPECIES_CDS paranome is missing from the 2nd figure because the safe gene IDs in the Ks file do not match the file name "SPECIES_CDS". Could you make sure that, for instance, your file name is "SPECIES_CDS" and the safe gene IDs in your Ks file look like "SPECIES_CDS_0", "SPECIES_CDS_1", "SPECIES_CDS_2", and so on?
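A quick way to act on this suggestion is to check whether every gene ID follows the "<file name>_<integer>" pattern from the example above. The sketch below assumes that pattern is exactly how wgd forms safe IDs (inferred from this reply, not verified against wgd's source); `mismatched_ids` is a hypothetical helper, and reading the IDs out of the .ks.tsv file is left out because the thread does not describe its column layout. The comparison is case-sensitive, so "SPECIES_CDS_0" does not match a file named "SPECIES_cds".

```python
from pathlib import Path


def expected_prefix(cds_path: str) -> str:
    # Assumption: the safe-ID prefix is the CDS file's base name,
    # as in the "SPECIES_CDS" -> "SPECIES_CDS_0" example in this thread.
    return Path(cds_path).name


def mismatched_ids(gene_ids, cds_path):
    """Return the IDs that do not look like '<file name>_<integer>'."""
    prefix = expected_prefix(cds_path) + "_"
    bad = []
    for gid in gene_ids:
        # Everything after the prefix must be a plain integer index.
        suffix = gid[len(prefix):] if gid.startswith(prefix) else ""
        if not suffix.isdigit():
            bad.append(gid)
    return bad


# Usage: an ID produced under a renamed file ("SPECIES_CDS1") is flagged.
print(mismatched_ids(["SPECIES_cds_0", "SPECIES_cds_12", "SPECIES_CDS1_3"],
                     "data/SPECIES_cds"))
```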

manoharbisht1998 commented 9 months ago

Thank you, I will check that. However, looking at the figure, it seems the y-axis does not scale well, because the SPECIES_CDS paranome has far more homologous pairs than the species-pair orthologs. Is that right? Also, by "safe gene IDs", do you mean that my gene IDs should start with the file name? For example, for the Musa_acuminata_cds file, should the gene IDs look like >Musa_acuminata_cds_pt00012?

heche-psb commented 9 months ago

I have already changed the y limit to 1.1 times the maximum histogram height in this repository. But it's hard to tell whether this change is better than the original one, because it might truncate the fitted curve. You may give it a try. No, your original gene IDs can have any format. The safe gene IDs are generated by the program itself. Issues might emerge when you infer the Ks distribution using the file name "SPECIES_CDS" and then run other analyses using a new file name such as "SPECIES_CDS1" or "SPECIES_CDS2". My point is that it's better to keep your CDS file name unchanged across all analyses.
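The trade-off described here can be made concrete: with the upper limit set to 1.1 times the tallest bar, any fitted curve whose peak rises above that limit gets cut off. The sketch below only restates that arithmetic; `y_limit` and `curve_is_clipped` are hypothetical names, and the bar and curve heights are made-up illustration values, not taken from wgd.

```python
def y_limit(bar_heights, factor=1.1):
    """Upper y-axis limit: factor times the tallest histogram bar."""
    return factor * max(bar_heights)


def curve_is_clipped(curve_heights, bar_heights, factor=1.1):
    """True if a fitted curve rises above the y-limit and would be cut off."""
    return max(curve_heights) > y_limit(bar_heights, factor)


# Illustration values: tallest bar is 40, so the limit is about 44.
bars = [10, 40, 25]
print(curve_is_clipped([5, 50], bars))  # curve peak 50 exceeds the limit
print(curve_is_clipped([5, 30], bars))  # curve peak 30 fits
```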

manoharbisht1998 commented 9 months ago

Actually, I have tried to limit the y-axis, but it truncated the plot. Regarding my earlier question: can we use this 2nd figure to infer that SPECIES_CDS and Musa_acuminata_cds shared the same WGD event, which happened after SPECIES_CDS diverged from Zea mays and Amborella?

heche-psb commented 9 months ago

Both the node-weighted and node-averaged Ks results can shed light on the placement of the WGD. The 2nd figure seems to contain no information about the WGD peak. If the WGD peak is indisputably older than 1.17, you can confidently claim that SPECIES_CDS and Musa_acuminata_cds shared one WGD.

manoharbisht1998 commented 9 months ago

Thank you for the clarification. By plotting the Ks distribution of the SPECIES_CDS paranome, I observed a peak at a Ks of 0.5, which is younger than the SPECIES_CDS and Musa_acuminata_cds ortholog pairs. Does this mean that SPECIES_CDS underwent a WGD that is not shared by its close relative Musa_acuminata_cds?

Thanks

heche-psb commented 9 months ago

Yes.
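The reasoning in this exchange reduces to comparing two Ks values: a paranome (WGD) peak older than the ortholog divergence suggests a WGD shared by both species, while a younger one suggests a lineage-specific WGD. The sketch below encodes just that comparison. It treats 1.17 as the SPECIES_cds and Musa_acuminata_cds ortholog Ks referred to in the reply, and the `margin` parameter is a hypothetical stand-in for "indisputably"; it is not a wgd feature and does not replace inspecting the rate-corrected plots.

```python
def place_wgd(paranome_peak_ks, ortholog_peak_ks, margin=0.0):
    """Classify a WGD relative to a speciation event by comparing Ks values.

    margin: hypothetical tolerance; peaks closer than this are 'ambiguous'.
    """
    if paranome_peak_ks > ortholog_peak_ks + margin:
        # WGD predates the divergence, so both lineages inherited it.
        return "shared"
    if paranome_peak_ks < ortholog_peak_ks - margin:
        # WGD postdates the divergence, so only one lineage carries it.
        return "lineage-specific"
    return "ambiguous"


# Values from this thread: paranome peak 0.5, ortholog Ks 1.17.
print(place_wgd(0.5, 1.17))
```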

manoharbisht1998 commented 9 months ago

Is there a way to convert the homologous-pairs plot to a density plot?

heche-psb commented 9 months ago

Yes, but I'm not sure if many users really need it or not.
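For reference, a histogram of pair counts is usually turned into a density by dividing each bar by (total pairs × bin width), so that the bars integrate to 1. This is a minimal sketch assuming equal-width bins; `to_density` is a hypothetical helper, not a wgd option. Normalizing each distribution separately removes the difference in total pair counts, but not differences in how sharp each peak is.

```python
def to_density(counts, bin_width):
    """Convert equal-width histogram counts to a density (area sums to 1)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram cannot be normalized")
    return [c / (total * bin_width) for c in counts]


# Usage: 10 pairs in three bins of width 0.5 on the Ks axis.
density = to_density([2, 6, 2], 0.5)
print(density)
print(sum(d * 0.5 for d in density))  # total area, approximately 1
```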

manoharbisht1998 commented 9 months ago

Could you please add it as an option? For my species, the homologous-pair counts are far too high for a shared scale, and this might be the case for others as well.

Thanks

manoharbisht1998 commented 9 months ago

Hi, any update on the density plot?

heche-psb commented 8 months ago

I implemented two more options in wgd viz, --adjustortho (default False) and --adjustfactor (default 0.5), with which you can adjust the height of the orthologous Ks distribution relative to the height of the paralogous Ks distribution, based on the ratio of their respective highest bars. Simply transforming into a density cannot avoid the scaling issue, because some very dense regions will rise far above the others unless everything is standardized to a specific scale (for instance, the whole paranome), which in turn is no different from the adjustment above. Thus, I prefer to let users set the relative height themselves with these two options. Another minor new option, --okalpha (default 0.5), sets the opacity of the orthologous Ks distribution in the mixed plot.
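The options are described here only in words. Below is a minimal sketch of one reading of "according to the ratio of respective highest bar": rescale the ortholog bars so that the tallest one reaches adjustfactor times the tallest paralog bar. This is an assumption about the behavior, not wgd's actual code, and `adjust_ortholog_heights` is a hypothetical name.

```python
def adjust_ortholog_heights(ortho_counts, para_counts, adjustfactor=0.5):
    """Rescale ortholog bars so their maximum equals adjustfactor * paralog maximum."""
    ortho_max = max(ortho_counts)
    para_max = max(para_counts)
    # Guard: with NumPy arrays, dividing by a zero maximum would give
    # inf/NaN instead of raising, so fail loudly here.
    if ortho_max == 0:
        raise ValueError("ortholog histogram is empty; ratio is undefined")
    scale = adjustfactor * para_max / ortho_max
    return [c * scale for c in ortho_counts]


# Usage: a paranome far taller than the orthologs, as in this thread.
print(adjust_ortholog_heights([2, 8, 4], [100, 400, 200], adjustfactor=0.5))
```

With adjustfactor 0.5 the tallest ortholog bar ends up at half the height of the tallest paralog bar, while the ortholog bars keep their shape relative to one another.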

manoharbisht1998 commented 8 months ago

Thanks for the update, but when running with these parameters it shows this error:

ValueError: array must not contain infs or NaNs

heche-psb commented 8 months ago

Can you share the full command, log and perhaps data?
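The thread ends before the cause is identified. For context, "array must not contain infs or NaNs" is the message raised by NumPy's `numpy.asarray_chkfinite`, which SciPy routines use to validate their input, so one thing worth checking before sharing data is whether any Ks values are missing, NaN, or infinite. This is a minimal stdlib sketch; `nonfinite_rows` is a hypothetical helper, and extracting the Ks column from the .ks.tsv file is left out because the thread does not describe its layout.

```python
import math


def nonfinite_rows(ks_values):
    """Return (index, value) pairs whose Ks is empty, NaN, or infinite."""
    bad = []
    for i, v in enumerate(ks_values):
        try:
            x = float(v)
        except (TypeError, ValueError):
            # Empty strings, None, or other unparsable entries.
            bad.append((i, v))
            continue
        if not math.isfinite(x):
            bad.append((i, v))
    return bad


# Usage with made-up values read as strings from a TSV column.
print(nonfinite_rows(["0.5", "nan", "1.2", "inf", ""]))
```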