theosanderson / taxonium

A tool for exploring very large trees in the browser
http://taxonium.org
GNU General Public License v3.0

Branch thickness / colour controlled by numerical property #496

Open theosanderson opened 1 year ago

theosanderson commented 1 year ago

Requested by @NicolaDM

theosanderson commented 1 year ago

A related issue is that we don't currently parse branch / node annotations from Nexus files (https://github.com/theosanderson/taxonium/issues/461). The Nexus parser is a basic implementation that just reads the tree. But if one wanted to implement something like this without dealing with that yet, branch metadata can be provided in TSV format, including for internal nodes, so you could have:

Node_id PlacementScore
abc 233

This should end up with that property attached to the node as "node.meta_PlacementScore" (probably as a string)

In theory it shouldn't be too hard to adapt the layers to use that metadata item, e.g. by changing getColor in https://github.com/theosanderson/taxonium/blob/f971e4742888189e74195aa41583989b07f0c3a7/taxonium_component/src/hooks/useLayers.jsx#L177 to be a function that depends on d.meta_PlacementScore. We'd then need to expose this in the config so that one could specify whether to use it or not.
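A rough sketch of what such a getColor function could look like, assuming the layer accepts an accessor that returns an [r, g, b] array (as deck.gl layers generally do). The helper name, the score range and the colours are all made up for illustration and are not part of Taxonium:

```javascript
// Hypothetical helper, not part of Taxonium. Maps a numeric metadata value
// to an [r, g, b] colour by linear interpolation between two end colours.
function makeScoreColorAccessor({ field, min, max, low, high, fallback }) {
  return (d) => {
    // Metadata read from a TSV probably arrives as a string, so parse it first.
    const value = parseFloat(d[field]);
    if (Number.isNaN(value) || max === min) return fallback;
    // Clamp to [0, 1] so out-of-range scores cannot produce invalid channels.
    const t = Math.min(1, Math.max(0, (value - min) / (max - min)));
    return low.map((c, i) => Math.round(c + t * (high[i] - c)));
  };
}

// Example configuration; the range and colours are assumptions.
const getColor = makeScoreColorAccessor({
  field: "meta_PlacementScore",
  min: 0,
  max: 500,
  low: [220, 220, 220],
  high: [200, 30, 30],
  fallback: [150, 150, 150],
});
```

Nodes without the field (or with a non-numeric value) get the fallback colour, so a partially annotated tree would still render.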

theosanderson commented 1 year ago

The initial Nexus thing is hopefully resolved

amkram commented 1 year ago

Hey @theosanderson @NicolaDM,

I've been working on a draft of some changes that I think are related to this discussion, specifically for analyzing nodes with placement ambiguity.

Wanted to share it to hear both of your feedback/suggestions. It's pretty specific to this use case and could maybe be generalized for other metadata field types before adding to taxonium?

I added some logic specifically when a field called "uncertainty" is present in a metadata file.

The uncertainty field is expected to be a list of other potential placements of nodes, with a probability attached to each. E.g., "A:0.4,B:0.1".
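For reference, parsing that format could look roughly like the sketch below. This is not the code from the PR; the function name and output shape are assumptions, and it assumes node names never contain commas:

```javascript
// Hypothetical parser for the "uncertainty" field described above.
// Input:  "A:0.4,B:0.1"
// Output: [{ node: "A", probability: 0.4 }, { node: "B", probability: 0.1 }]
function parseUncertainty(raw) {
  if (typeof raw !== "string" || raw.trim() === "") return [];
  return raw
    .split(",")
    .map((entry) => {
      // Split on the last colon so node names containing ":" still parse.
      const idx = entry.lastIndexOf(":");
      if (idx === -1) return null;
      const node = entry.slice(0, idx).trim();
      const probability = parseFloat(entry.slice(idx + 1));
      if (node === "" || Number.isNaN(probability)) return null;
      return { node, probability };
    })
    .filter((item) => item !== null);
}
```

Malformed entries are dropped rather than throwing, so one bad row in the metadata file would not break rendering for the rest of the tree.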

When color by uncertainty is selected, I highlight the alternate nodes for the currently selected node and the branches connecting them. Right now this adds a circle around each of the nodes with radius changing based on probability, but I think there may be a better way to display the probabilities.
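One possible way to size the circles, sketched under the assumption that probabilities lie in [0, 1]: let the radius grow with the square root of the probability, so that the circle's area rather than its radius tracks the probability. The function name and the pixel limits are hypothetical:

```javascript
// Hypothetical radius mapping, not taken from the PR.
// The radius grows with sqrt(p); minRadius keeps low-probability nodes visible.
function probabilityToRadius(probability, maxRadius = 10, minRadius = 2) {
  // Treat missing or non-numeric input as 0 and clamp to [0, 1].
  const p = Math.min(1, Math.max(0, Number(probability) || 0));
  return minRadius + (maxRadius - minRadius) * Math.sqrt(p);
}
```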

Example (with "USA/CA/CZB-3702/..." selected). The probabilities are all equal in this case.

[Image: Screenshot 2023-07-18 at 5 35 45 PM]

This uses these test Newick and metadata files (note the metadata is very incomplete, as adding the ambiguities increases the file size a lot).

The PR on my fork is here with a live demo here

Let me know what you think!

theosanderson commented 1 year ago

Hi @amkram, sorry have been on holiday and didn't spot this until reading through emails in detail today. Looking now.

theosanderson commented 1 year ago

Thanks for this. I think @nicolaDM will definitely be interested!

I think it looks really cool. When Nicola asked about similar stuff I demurred on the basis of trying to avoid the need to maintain a relatively niche feature into the future. This changes things in that (A) I don't have to do the initial implementation and (B) there are at least 2 users. I'm still not 100% sure about that.

At the moment, I got quite a few crashes in the demo with this output:

index-d65f2e44.js:40 TypeError: ct is not a function or its return value is not iterable
    at taxonium-component.umd-fb631ffa.js:1761:22555
    at Object.Vp [as useMemo] (index-d65f2e44.js:38:23880)
    at V.useMemo (index-d65f2e44.js:9:6179)
    at useLayers (taxonium-component.umd-fb631ffa.js:1761:22187)
    at Deck (taxonium-component.umd-fb631ffa.js:2581:3571)
    at us (index-d65f2e44.js:38:19503)
    at Qa (index-d65f2e44.js:40:3135)
    at wv (index-d65f2e44.js:40:44651)
    at mv (index-d65f2e44.js:40:39658)
    at cg (index-d65f2e44.js:40:39586)

If we implemented this, do you think it would work best stored in Taxonium JSONLs with a structure like [[node_id (numeric), score],[node_id,score]] or something similar, to reduce file size?
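To make that concrete, here is a sketch of converting the string form into numeric pairs. The function name, the "alternatives" key and the example ids are assumptions for illustration, not the actual Taxonium JSONL schema:

```javascript
// Hypothetical encoder: turns "A:0.4,B:0.1" into [[12, 0.4], [57, 0.1]]
// given a lookup from node name to numeric node_id.
function encodeAlternatives(raw, nameToId) {
  const pairs = [];
  for (const entry of raw.split(",")) {
    const idx = entry.lastIndexOf(":");
    if (idx === -1) continue;
    const id = nameToId.get(entry.slice(0, idx).trim());
    const score = parseFloat(entry.slice(idx + 1));
    // Skip names that are not in the lookup and scores that are not numbers.
    if (id === undefined || Number.isNaN(score)) continue;
    pairs.push([id, score]);
  }
  return pairs;
}

// One JSONL record per node; the "alternatives" key name is an assumption.
const nameToId = new Map([["A", 12], ["B", 57]]);
const line = JSON.stringify({
  node_id: 3,
  alternatives: encodeAlternatives("A:0.4,B:0.1", nameToId),
});
```

Replacing repeated node names with numeric ids is where the size saving would come from, since long sample names would otherwise be duplicated in every alternative placement.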

NicolaDM commented 1 year ago

@amkram thank you so much, this looks really cool! I just finished making MAPLE output tree and metadata conforming to the required format, and it seems to work without problems.

I think the idea of highlighting the path connecting the different possible placement nodes is really great! I was wondering if you think it would be possible/worthwhile to make the thickness and/or color shade of different parts of the highlighted path proportional to the support value.

Nicola
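A minimal sketch of the thickness / shade idea, assuming each highlighted path segment carries a support value in [0, 1] and that the drawing layer takes a width and an [r, g, b, a] colour per segment. The function name and defaults are hypothetical:

```javascript
// Hypothetical styling for one highlighted path segment.
// `support` is assumed to be a probability in [0, 1].
function segmentStyle(support, { minWidth = 1, maxWidth = 6, rgb = [200, 30, 30] } = {}) {
  // Treat missing or non-numeric input as 0 and clamp to [0, 1].
  const s = Math.min(1, Math.max(0, Number(support) || 0));
  return {
    // Width scales linearly between minWidth and maxWidth.
    width: minWidth + s * (maxWidth - minWidth),
    // Alpha runs from 64 (faint) to 255 (opaque) so weak segments stay visible.
    color: [...rgb, Math.round(64 + s * (255 - 64))],
  };
}
```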

kbseah commented 5 months ago

Hello, chiming in here to say that I would also be interested in using Taxonium to display phylogenetic placements, as branch thicknesses or symbols attached to branches.