veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Format for selecting branches to test on FEL #1734

Open andrearam28 opened 2 months ago

andrearam28 commented 2 months ago

Hello,

I am trying to run contrast-fel on HPC using this command:

hyphy fel --alignment nt.fasta --tree tree.nexus --branches --output nt.json

The tree is rescaled in time, with ancestral sequences reconstructed. Including ancestral sequences, there are, say, 500 names, of which I would like to use 200 as the test set. However, I have tried simply writing the branch/node names as:

Branch1 Branch2 Node001 Node003 .... or Branch1, Branch2, Node001, Node003, ....

Every time, I get the following error:

Error:'Branch1,' is not a valid choice passed to 'Choose the set of branches to test for selection' ChoiceList using redirected stdin input or keyword arguments. Valid choices are All, Internal, Leaves, Unlabeled branches in call to ChoiceList(testSet, "Choose the set of branches to test for selection", 1, NO_SKIP, selectTheseForTesting);

So, my question is: in what format should I provide my branches of interest to fel?

Thanks! Andrea

liamxg commented 2 months ago

Dear @andrearam28,

You need to label the branches in the tree.

andrearam28 commented 2 months ago

Hi @liamxg,

Thank you for your response.

I believe I do have all tips and nodes labelled; I can see them in the tree file and when I open the tree in a program like FigTree.

I have tried other options, and I am able to run contrast-fel with --branch-set "Random set of branches", for example. In the log I can also see that internal nodes were randomly picked for this analysis. However, I am having trouble picking the specific branches I would like to use as the test set. This is where I get the error message that even a single tip label "is not a valid choice passed to 'Choose sets of branches to compare'".

Is there a specific format for this? I haven't been able to find it online, and contrast-fel --help doesn't specify it either. Maybe something like a .txt file with all the branch names?

Thanks! Andrea

liamxg commented 2 months ago

Dear @andrearam28,

You can do this at http://phylotree.hyphy.org/#.

andrearam28 commented 2 months ago

Hi @liamxg,

Thank you again for your response!

Is there a faster way to do this? If I need to run this test on 1,000 different sets of >300 sequences and pick 70-100 branches, I’m sure you can agree that this might not be the most efficient approach. I already have the names of the branches I want to use as the test set. A faster, more efficient way would be extremely helpful!

Thanks again, Andrea

agselberg commented 2 months ago

Hi Andrea,

You can label your tree with LabelTree.bf from this directory. Just clone the hyphy-analyses directory. To choose your test species, you can list them in a text file as shown in the example hyphy-analyses/LabelTrees/data/data/list.txt. Then your command will look something like this:

hyphy ~/hyphy-analyses/LabelTrees/label-tree.bf --tree unlabeled.nwk --list list.txt --label "Test" --output labeled-parsimony.nwk --internal-nodes "Parsimony"

Then, when you run FEL, the value you pass to --branches should match the label you chose (in this case, Test):

hyphy fel --alignment nt.fasta --tree labeled-parsimony.nwk --branches "Test" --output nt.json
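
For many datasets, the two steps above could be scripted. The following is a minimal Python sketch, not part of HyPhy. It assumes list.txt holds one branch name per line (check the example file to confirm), and the helper names and file paths are hypothetical. It reuses only the flags shown in the commands above.

```python
def format_branch_list(names):
    """Return list-file text with one branch/node name per line (assumed layout)."""
    return "\n".join(names) + "\n"


def build_label_command(script, tree, list_path, output, label="Test"):
    """Build the label-tree.bf call as an argument list (flags taken from the thread)."""
    return [
        "hyphy", script,
        "--tree", tree,
        "--list", list_path,
        "--label", label,
        "--output", output,
        "--internal-nodes", "Parsimony",
    ]


def build_fel_command(alignment, labeled_tree, output, label="Test"):
    """Build the FEL call that tests the branches carrying `label`."""
    return [
        "hyphy", "fel",
        "--alignment", alignment,
        "--tree", labeled_tree,
        "--branches", label,
        "--output", output,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(format_branch_list(["Branch1", "Branch2", "Node001"]), end="")
    print(" ".join(build_label_command(
        "label-tree.bf", "unlabeled.nwk", "list.txt", "labeled.nwk")))
    print(" ".join(build_fel_command("nt.fasta", "labeled.nwk", "nt.json")))
```

Each returned list can be passed to subprocess.run(cmd, check=True) inside a loop over datasets, writing a fresh list file for each one.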

Let me know if you have any other questions.

Best, Avery

liamxg commented 2 months ago

Dear @agselberg,

Thanks for your help.

By the way, how can I label only the internal nodes and not the tips?

agselberg commented 2 months ago

Liam,

You can use label-mrca.bf to label the most recent common ancestors of a given pair of taxa. See the readme with an example here.

Alternatively you can use LabelTree.bf to label the branches you don't want (in this case the tips of the tree) and use "Unlabeled branches" in your command. For example with BUSTED:

hyphy BUSTED --alignment positive-test.fasta --tree positive-test.nwk --branches "Unlabeled branches" --output practice.json

Best, Avery

liamxg commented 2 months ago

Dear @agselberg,

Thanks again!

Do you mean using the following command:

hyphy label-tree.bf --tree data/unlabeled.nwk --list data/list.txt --output only.nwk --internal-nodes None

and then using:

hyphy BUSTED --alignment positive-test.fasta --tree positive-test.nwk --branches "Unlabeled branches" --output practice.json

If so, this will run BUSTED on all of the internal branches, right? But I want to label some clusters of internal branches as Alpha, some internal branches as Delta, and the remaining internal branches as background.

If I want to label the tree as shown below, what should I do?

By the way, if I want to label only the internal branches of each of Alpha, Delta, and background, what should I do? Thanks.

[image]

agselberg commented 2 months ago

Liam,

Yes, that would test the unlabeled (internal) branches like you said.

What method are you using in your analysis? BUSTED-PH?

In my own BUSTED-PH pipeline I labeled a tree with three branch sets: FOREGROUND (alpha), NUISANCE (delta), and BACKGROUND. I labeled the foreground and nuisance branches with LabelTree.bf, and for the background branches I used a short Python script to label everything else. You could do the same with your tree for alpha and delta, and label all other branches as background. See the full pipeline in the snakefile.

You could label the alpha clade:

hyphy hyphy-analyses/LabelTrees/LabelTree.bf --tree {input.tree} --list {input.partition_file} --output {output.output} --internal-nodes 'Parsimony' --label Alpha

Then the delta clades:

hyphy hyphy-analyses/LabelTrees/LabelTree.bf --tree {input.tree} --list {input.partition_file} --output {output.output} --internal-nodes 'Parsimony' --label Delta

Then label the rest as background:

python tree-remaining-bg.py {input.tree} {output.output}

I used the python script here.
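
The background-labeling script itself is not reproduced in this thread. As a rough idea of what such a step can look like, here is a hypothetical Python sketch (not the script from the pipeline) that appends {BACKGROUND} to every branch lacking a {Label} annotation. It assumes a plain Newick string in which every branch to be labeled has a length, and in which no names or comments contain a colon.

```python
import re


def label_remaining(newick, label="BACKGROUND"):
    """Append {label} to each branch that has a length but no existing {Label}.

    A branch is located by the ':' that starts its length. If the character
    just before that ':' is '}', the branch is already labeled and is skipped.
    Assumption: plain Newick, no colons inside names or comments.
    """
    return re.sub(r"(?<!\}):", "{" + label + "}:", newick)


if __name__ == "__main__":
    tree = "((A{Alpha}:0.1,B:0.2)Node1{Alpha}:0.3,C:0.4);"
    print(label_remaining(tree))
```

Branches without a length (typically the root) are left untouched by this sketch, and trees with bracketed annotations would need a real Newick parser instead of a regular expression.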

Best, Avery

liamxg commented 2 months ago

Avery,

Thanks again for your kind reply and help. I know how to label the rest as background.

I want to use relax, contrast-fel and BUSTED-PH for my analysis.

For the relax method, if FOREGROUND is Alpha, NUISANCE is Delta, and the rest is BACKGROUND,

should I set --test Alpha --reference Delta or --test Alpha --reference Delta,BACKGROUND?

For the contrast-fel method, if FOREGROUND is Alpha, NUISANCE is Delta, and the rest is BACKGROUND,

should I set --branch-set Delta --branch-set Alpha or --branch-set Delta --branch-set Alpha --branch-set BACKGROUND?

For the BUSTED-PH method, if FOREGROUND is Alpha, NUISANCE is Delta, and the rest is BACKGROUND, your parameters are:

{HYPHY} {BUSTED_PH_BF} --alignment {input.seq} --tree {input.tree} --srv Yes --starting-points 10 --branches FOREGROUND --comparison BACKGROUND --output {output.json} ENV='TOLERATE_NUMERICAL_ERRORS=1'

Why is there no information about NUISANCE?

Hope you can help me out, thanks.

Best, Liam

agselberg commented 2 months ago

Liam,

In busted-ph you can only compare one branch set to one other branch set, but extra "nuisance" branches are allowed to keep the tree topology. In your study you may have to repeat busted-ph if you want to do multiple comparisons (alpha vs delta, alpha vs background, delta vs background).

While busted-ph does not require that you label the nuisance branches, in my pipeline I chose to first label everything except the background so that I could then label the remaining branches as background. However, as long as you have a labeled tree that matches the --branches and --comparison labels, the method you use to label your tree is not important.

Additionally, I'm not sure contrast-fel allows for the extra nuisance branches; you may have to prune your tree down to leave only two sets of branches for that analysis.
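
The repeated pairwise busted-ph runs mentioned above could be generated rather than typed by hand. This is a minimal Python sketch under stated assumptions: the function name, file names, and output naming scheme are hypothetical, and only the --alignment, --tree, --branches, --comparison, and --output flags quoted earlier in this thread are used (other options such as --srv are left out). Which label goes on the --branches side of each pair is a choice; here it is simply the one listed first.

```python
from itertools import combinations


def busted_ph_commands(busted_ph_bf, alignment, tree, labels, out_dir="."):
    """Build one BUSTED-PH argument list per unordered pair of branch labels."""
    commands = []
    for test, comparison in combinations(labels, 2):
        # Hypothetical naming scheme for the per-comparison JSON output.
        output = f"{out_dir}/{test}_vs_{comparison}.json"
        commands.append([
            "hyphy", busted_ph_bf,
            "--alignment", alignment,
            "--tree", tree,
            "--branches", test,
            "--comparison", comparison,
            "--output", output,
        ])
    return commands


if __name__ == "__main__":
    # Placeholder paths; substitute your own script location and data files.
    for cmd in busted_ph_commands(
        "BUSTED-PH.bf", "aln.fasta", "labeled.nwk",
        ["Alpha", "Delta", "BACKGROUND"],
    ):
        print(" ".join(cmd))
```

With three labels this yields three commands (Alpha vs Delta, Alpha vs BACKGROUND, Delta vs BACKGROUND), each of which can be handed to subprocess.run or a job scheduler.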

Best, Avery

liamxg commented 2 months ago

Avery, Thanks. That's really helpful to me.
