mrmckain / PhyDS

Phylogenetic iDentification of Subgenomes

estimating paralogs with script #4

Open goeckeritz opened 2 years ago

goeckeritz commented 2 years ago

Hi Michael,

It's me again, back from the dead (sorry!). I am trying out your estimate_paralogs_from_trees.pl script after all, since I think my strategy in a previous post was erroneously reporting some genes twice. I have a few questions about this script and how it works. I'm testing it on only 5 trees. My ultimate goal here is to identify when a subA, subA, or subB ortholog is a direct sister to fruticosa or sweet cherry, with apple and peach as outgroups. In other words, when I run PhyDS I want it to check every subA, subA, and subB ortholog and report the relationships when they are sister to a sweet cherry or fruticosa ortholog. But first things first: I need to get my paralogs list in the correct format, which I still seem to be severely confused about.

What [I think] I'd like to have, then, is every possible combination of subA x subB, subA x subA, and subA x subB, and then tell PhyDS to --ignore apple, peach, subA, subA__, and subB so it only reports when these paralogs are sister to fruticosa or sweet cherry. Am I understanding that right?

But I can't seem to get the expected paralogs list from your script. I do:

perl estimate_paralogs_from_trees.pl --trees /directory --outgroups apple, peach --name prefix

and the output has what appears to be every combo imaginable for the genes regardless of what I specify the outgroups to be - is that normal? Should I just take lines from this file containing subA, subA___, and subB for my paralogs list?

A second question -- why is there a 3rd column in the output? Every value I have for it is unknown.

Thanks in advance for your time - I really appreciate it!

Kindly, Charity

mrmckain commented 2 years ago

Hi Charity,

The estimate_paralogs_from_trees.pl script is working as intended. This script was lifted from my PUG program, so it has options that are not used in this script. Originally, the outgroups option did nothing. I just added some lines to the script that will allow you to use the outgroups option to skip those taxa when creating putative paralogs. You can also just take the pairs you are interested in (based on taxa). Just remember that these are simply all the possible pairs of tips for your taxa (or subgenomes) without any other consideration. They are just meant to be input for PhyDS.
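To illustrate the "all possible pairs of tips" idea, here is a minimal Python sketch. It is not the Perl script itself. The tip naming convention (`taxon|gene`), the function names, and the tab-separated output with a constant `unknown` third column are all assumptions made for illustration only.

```python
from itertools import combinations


def taxon_of(tip):
    # Hypothetical naming convention: "<taxon>|<gene id>".
    return tip.split("|", 1)[0]


def putative_paralog_pairs(tips, outgroups=()):
    # Drop tips from outgroup taxa, then pair every remaining tip
    # with every other one, with no other consideration.
    skip = set(outgroups)
    kept = [t for t in tips if taxon_of(t) not in skip]
    return list(combinations(kept, 2))


def keep_pairs_for_taxa(pairs, taxa):
    # Keep only pairs in which both tips belong to the taxa of interest.
    wanted = set(taxa)
    return [(a, b) for a, b in pairs
            if taxon_of(a) in wanted and taxon_of(b) in wanted]


# Made-up tips from a single gene tree.
tips = ["apple|g1", "peach|g2", "subA|g3", "subB|g4", "fruticosa|g5"]
pairs = putative_paralog_pairs(tips, outgroups=["apple", "peach"])

for a, b in pairs:
    # Third column mirrors the placeholder value seen in the output.
    print(f"{a}\t{b}\tunknown")
```

Calling `keep_pairs_for_taxa(pairs, ["subA", "subB"])` on this toy input would leave only the subA/subB pair, which corresponds to taking just the lines of interest from the full list.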

The third column is a holdover from PUG. It does nothing for PhyDS. In PUG, you could have paralogs from different events (which you predefine), so you could see where those paralogs map on a phylogeny. This was useful if you had successive events in your phylogeny--for example, rho and sigma in Poales.

Let me know if you have more questions.

Best, Michael