veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

HyPhy-GUI desktop application - analyses with >2000 sequences #1604

Closed vanessacpmcorreia closed 1 year ago

vanessacpmcorreia commented 1 year ago

Hi,

I need to perform several analyses on an alignment with more than 2000 sequences, and therefore I need to submit a prebuilt tree. In this context, how do I combine both the alignment and the tree (Nexus) in the same file to submit them together?

Thanks, Vanessa

spond commented 1 year ago

Dear @vanessacpmcorreia,

The HyPhy GUI application has not been supported or maintained for a while. Generally, you would just include your tree as a part of the alignment file, see http://hyphy.org/tutorials/CL-prompt-tutorial/#preparing-input-data-for-hyphy
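
For illustration, here is a minimal Python sketch of building such a combined file. It assumes the Newick string is simply placed after the last sequence record of a FASTA alignment; the function name `combine_alignment_and_tree` and the toy data are hypothetical and not part of HyPhy.

```python
def combine_alignment_and_tree(fasta_text: str, newick_text: str) -> str:
    """Return FASTA text with a Newick tree appended after the last record."""
    alignment = fasta_text.rstrip()   # drop trailing blank lines
    tree = newick_text.strip()
    if not tree.endswith(";"):        # Newick strings end with a semicolon
        tree += ";"
    return f"{alignment}\n{tree}\n"


# Toy example (hypothetical data): two sequences and a two-leaf tree.
combined = combine_alignment_and_tree(">a\nACGT\n>b\nACGA\n", "(a,b)")
print(combined)
```

The leaf names in the tree need to match the sequence names in the alignment for this to be useful.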

Best, Sergei

vanessacpmcorreia commented 1 year ago

Hi Sergei,

Thanks! I moved to the Datamonkey web server. However, when I try to run the analysis (FEL), it says that the alignment should include a prebuilt tree. To prepare the single file needed, I included the Newick tree as part of the FASTA alignment file, indicating >Tree before the topology:

Tree(xxx);

Is there another tree-recognition string for Datamonkey?

Thanks Vanessa

spond commented 1 year ago

Dear @vanessacpmcorreia,

Might be easiest if you could include your alignment + tree file as an attachment for me to look at.

Best, Sergei

vanessacpmcorreia commented 1 year ago

Sergei,

Thanks! align_tree.zip

spond commented 1 year ago

Dear @vanessacpmcorreia,

Adding the tree to the file at the end should do the trick. However, you will run into errors on Datamonkey because of nonalphanumerics in your sequence names.

>B/Aberystwyth/7974/2018/1-843 | B / H0N0 | Yamagata | 2018-01-17 | Public Health Wales Microbiology Cardiff | 1297300

Spaces, |, and / can sometimes lead to issues with HyPhy matching sequences with tree leaves, especially since the names in your tree have been "massaged" somewhat, like so (replacing spaces with -, among other things).

B/Hong/1-846-Kong/557/2000-B-/-H0N0-Yamagata-2000-10-04-51203
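
One way to spot such mismatches before submitting is to compare the FASTA headers with the tree's leaf labels. The sketch below is a rough check, not HyPhy's own matching logic: it assumes unquoted Newick labels, and the helper names (`fasta_names`, `newick_leaf_names`, `compare_names`) are hypothetical.

```python
import re


def fasta_names(fasta_text: str) -> set:
    """Collect sequence names from FASTA header lines (text after '>')."""
    return {line[1:].strip() for line in fasta_text.splitlines()
            if line.startswith(">")}


def newick_leaf_names(newick: str) -> set:
    """Collect leaf labels from an unquoted Newick string.

    A leaf label directly follows '(' or ','; labels that follow ')'
    belong to internal nodes and are skipped.
    """
    labels = re.findall(r"[(,]\s*([^(),:;\s][^(),:;]*)", newick)
    return {label.strip() for label in labels}


def compare_names(fasta_text: str, newick: str):
    """Return (names only in the alignment, names only in the tree)."""
    seqs = fasta_names(fasta_text)
    leaves = newick_leaf_names(newick)
    return sorted(seqs - leaves), sorted(leaves - seqs)


# Toy example (hypothetical data): "A 1" in the alignment vs "A_1" in the tree.
only_aln, only_tree = compare_names(">A 1\nACGT\n>B\nACGT\n",
                                    "((A_1:0.1,B:0.2):0.3,C:0.1);")
print(only_aln, only_tree)
```

Both lists being empty means every sequence name has an identically spelled leaf, and vice versa.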

My suggestion would be to sanitize the sequence names. HyPhy has a cln command that will do that:

hyphy cln

Follow the prompts.

The resulting file will have sequence names that maximally conform to HyPhy expectations:

>B_Aberystwyth_7974_2018_1_843_B_H0N0_Yamagata_2018_01_17_Public_Health_Wales_Microbiology_Cardiff_1297300
....
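
To show the kind of renaming involved, here is a small Python approximation. It is not the cln implementation: it assumes the rule is "replace every run of non-alphanumeric characters with a single underscore", which happens to reproduce the example above. The function names are hypothetical, and name collisions after renaming are not handled.

```python
import re


def sanitize_name(name: str) -> str:
    """Replace each run of non-alphanumeric characters with one underscore."""
    return re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_")


def sanitize_fasta(fasta_text: str) -> str:
    """Sanitize only the header lines of a FASTA file; sequences are untouched."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(">" + sanitize_name(line[1:]))
        else:
            out.append(line)
    return "\n".join(out) + "\n"


header = ("B/Aberystwyth/7974/2018/1-843 | B / H0N0 | Yamagata | 2018-01-17 | "
          "Public Health Wales Microbiology Cardiff | 1297300")
print(sanitize_name(header))
```

The same function would need to be applied to the leaf labels of the tree so that both sides stay in sync.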

(@stevenweaver : it would be nice to update the DM validation script to handle stuff like this at some point).

Best, Sergei

vanessacpmcorreia commented 1 year ago

Hi Sergei,

I have run the cln command in HyPhy, and the sequence names appear to be altered to conform to HyPhy expectations. However, I don't understand how to obtain the file with the altered sequence names. It goes straight to the prompts for filtering duplicates and gaps. Thanks! Vanessa

spond commented 1 year ago

Dear @vanessacpmcorreia,

After you choose the options for filtering and gap handling, the modified file will be written out to a path that you specify.

Best, Sergei

vanessacpmcorreia commented 1 year ago

Hi @spond,

I specified the path, but the modified file is not appearing in the folder. Command used: hyphy cln Universal /users/vanessacorreia/desktop/teste/teste.fas "No/No" /users/vanessacorreia/desktop/teste/teste1.fas

What could be the error?

Thanks, Vanessa
