kgori / treeCl

Clustering phylogenetic trees with python
MIT License

Loading pre-computed trees #27

Closed SashaNikolaevaBerkeley closed 1 year ago

SashaNikolaevaBerkeley commented 1 year ago

I've been trying to calculate distances on trees that I precomputed with RAxML-NG, but I keep getting this error:

raise AttributeError('No tree')
AttributeError: No tree

However, it seems that my trees have been loaded since I am seeing this feedback:

Loading files: 1324 of 1324 |####| Elapsed Time: 0:00:00 Time: 0:00:00
Loading parameters: 100% |####| Elapsed Time: 0:00:12 Time: 0:00:12

Does that mean that my trees are in a different format than what treeCl expects? How can I troubleshoot this?

Thanks!

kgori commented 1 year ago

Hi Sasha,

I probably need to add a new parser for raxml-ng trees. Do you think you could share an example of the trees you are using?

Thanks, Kevin

SashaNikolaevaBerkeley commented 1 year ago

Hi Kevin,

Thank you for getting back to me so quickly. I am attaching the dataset with a few trees. I also tried to use the TreeDist package as it seems to have similar functionality, but it doesn't seem to be particularly good at determining the number of clusters. So I am hoping that your program will work.

Best, Sasha

Alexandra Sasha Nikolaeva Master of Forestry Ph.D. Candidate Department of Environmental Science, Policy and Management UC Berkeley

kgori commented 1 year ago

Hi Sasha,

I realised that raxml-ng outputs trees as plain Newick files, so no special parsing is needed. However, treeCl is a bit inflexible about how it wants the files to be named. If you name your tree files so that they match the alignment files, everything should load as expected. In general the pattern needs to be alignment = NAME.{phy,fas}, tree = NAME.nwk. For example, if your alignments are

cytb.phy
atp6.phy
rbcl.phy

Then your tree files need to be called

cytb.nwk
atp6.nwk
rbcl.nwk

Then you can load everything together using

treeCl.Collection(input_dir = "path/to/alignments", trees_dir = "path/to/trees")
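The renaming step can be scripted. Below is a minimal sketch, not part of treeCl: the helper name `copy_trees_with_matching_names` and its parameters are hypothetical. It assumes each RAxML-NG run used the alignment name as its prefix, so that the best tree sits in a file called `NAME.raxml.bestTree`. Adjust `raxml_suffix` if your files are named differently. The function copies each tree to `NAME.nwk` and reports any alignment that has no tree, which is a likely place to look when the "No tree" error appears.

```python
from pathlib import Path
import shutil

# Alignment extensions mentioned in this thread.
ALIGNMENT_SUFFIXES = {".phy", ".fas"}


def copy_trees_with_matching_names(alignment_dir, raxml_dir, trees_dir,
                                   raxml_suffix=".raxml.bestTree"):
    """Copy raxml_dir/NAME<raxml_suffix> to trees_dir/NAME.nwk for every
    alignment NAME.{phy,fas} found in alignment_dir.

    Returns a sorted list of alignment names for which no tree file was found.
    The default raxml_suffix is an assumption about how the trees were written.
    """
    alignment_dir = Path(alignment_dir)
    raxml_dir = Path(raxml_dir)
    trees_dir = Path(trees_dir)
    trees_dir.mkdir(parents=True, exist_ok=True)

    missing = []
    for alignment in sorted(alignment_dir.iterdir()):
        if alignment.suffix not in ALIGNMENT_SUFFIXES:
            continue  # skip anything that is not an alignment file
        name = alignment.stem  # "cytb.phy" -> "cytb"
        source = raxml_dir / f"{name}{raxml_suffix}"
        if source.is_file():
            shutil.copyfile(source, trees_dir / f"{name}.nwk")
        else:
            missing.append(name)
    return missing
```

If the returned list is empty, every alignment has a matching `NAME.nwk` file in `trees_dir`, and that directory can be passed as `trees_dir` to `treeCl.Collection` as described above.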

Hope this helps. Feel free to get back in touch if there are any problems. Best, Kevin

SashaNikolaevaBerkeley commented 1 year ago

Hi Kevin,

Great, thank you so much! I will try today and get back to you if I have any questions.

Best, Sasha
