uclahs-cds / package-CancerEvolutionVisualization

Publication Quality Phylogenetic Tree Plots
https://cran.r-project.org/web/packages/CancerEvolutionVisualization/
GNU General Public License v2.0

Dendrogram style trees #114

Open dan-knight opened 9 months ago

dan-knight commented 9 months ago

Allow users to create dendrograms rather than radial trees. This should allow plotting larger trees, with many more nodes.

dan-knight commented 9 months ago

In dendrogram mode, the length of each branch should correspond to the "vertical" length of the branch.
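One way to read this requirement: a node's vertical position is the sum of the branch lengths on the path from the root. The sketch below only illustrates that idea. It is not the package's implementation, and the `branches` structure, its values and the `vertical_positions` helper are all hypothetical.

```python
# Hypothetical input: node id -> (parent id, branch length).
# The lengths are illustrative values, not data from this issue.
branches = {
    1: (None, 0.0),
    2: (1, 3.0),
    3: (1, 1.5),
    4: (2, 2.0),
}


def vertical_positions(branches):
    """Return {node id: vertical distance from the root}.

    Each branch contributes exactly its length to the vertical axis,
    so horizontal offsets never change how long a branch appears.
    """
    y = {}

    def depth(node):
        if node not in y:
            parent, length = branches[node]
            y[node] = 0.0 if parent is None else depth(parent) + length
        return y[node]

    for node in branches:
        depth(node)
    return y


y_positions = vertical_positions(branches)
print(y_positions)
```

Horizontal placement is deliberately left out, since it does not affect the branch lengths under this interpretation.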

dan-knight commented 9 months ago

Which visual style is preferred? Is one or the other strictly correct? From some brief research with @whelena, option A seems to be more prevalent in the field. It would also be simpler to implement.

Option A

[Image: Screenshot 2024-01-05 at 2 56 13 PM]

Option B

[Image: Screenshot 2024-01-05 at 2 56 02 PM]

dan-knight commented 8 months ago

I don't think it makes sense to allow a node's children to be a mix of dendrogram and radial modes. Despite the similarities, there are fundamental differences that make them incompatible. I think that the best way to ensure that valid trees are drawn is to set the dendrogram/radial mode on each parent node, applying that setting to all children.

Consider the following example:

node.id  parent  mode
1        NA      R
2        1       D
3        1       R
4        2       R
5        2       R
6        2       D
7        2       R
8        4       R
9        4       R
10       6       R
11       6       R
12       6       R

In this case, only nodes 2 and 6 are set to dendrogram mode (D). This means that nodes 4 to 7 (the children of 2) and 10 to 12 (the children of 6) will be drawn in dendrogram mode. All other nodes are drawn normally in radial mode (R). Note that a node's own mode setting has no effect on how that node is drawn. Only its parent's setting matters.
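The rule in this example can be sketched in a few lines. This is only an illustration of the proposal, not code from the package. The `tree` dictionary mirrors the table above, while `resolve_draw_modes` and the choice to treat the root as radial are assumptions made for the sketch.

```python
# The example tree from the table: node id -> (parent id, mode set on the node).
tree = {
    1: (None, "R"),
    2: (1, "D"),
    3: (1, "R"),
    4: (2, "R"),
    5: (2, "R"),
    6: (2, "D"),
    7: (2, "R"),
    8: (4, "R"),
    9: (4, "R"),
    10: (6, "R"),
    11: (6, "R"),
    12: (6, "R"),
}


def resolve_draw_modes(tree, root_mode="R"):
    """Return {node id: mode used to draw that node}.

    A node is drawn using its parent's mode; its own mode only
    affects its children. The root has no parent, so it falls back
    to root_mode (an assumption made for this sketch).
    """
    drawn = {}
    for node_id, (parent, _own_mode) in tree.items():
        drawn[node_id] = root_mode if parent is None else tree[parent][1]
    return drawn


draw_modes = resolve_draw_modes(tree)
dendrogram_nodes = sorted(n for n, m in draw_modes.items() if m == "D")
print(dendrogram_nodes)  # [4, 5, 6, 7, 10, 11, 12]
```

Nodes 2 and 6 themselves come out as radial, because their parents (1 and 2's sibling group under 1, and 2's own parent 1) are resolved through the parent lookup: node 2 is drawn with node 1's mode, and node 6 is drawn with node 2's mode, which is why 6 appears in the dendrogram list.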

whelena commented 8 months ago

> Which visual style is preferred? Is one or the other strictly correct? From some brief research with @whelena, option A seems to be more prevalent in the field. It would also be simpler to implement.

I think option A follows the common interpretation of a dendrogram, where each intersection represents a common ancestor. Option B implies that the children share some common mutations after they supposedly diverged from the parent.

> In dendrogram mode, the length of each branch should correspond to the "vertical" length of the branch.

Another discussion we had was whether the lengths should account for the radius of the node labels. I prefer not, since it can result in confusion, especially in dendrogram mode. There, equidistant children end up at different vertical distances: the node pointing straight down has to account for the radius of the parent node label, while the nodes that come out from the side do not (this is easier to explain with a figure).
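In place of a figure, the discrepancy can be shown with a small calculation. This is one interpretation of the geometry, not the package's layout code: it assumes a branch leaving straight down starts at the bottom edge of the parent label, while a side branch starts its vertical drop at the height of the parent's centre. The function name and all values are hypothetical.

```python
def child_end_y(branch_length, leaves_from_bottom, parent_radius, account_for_radius):
    """Vertical distance from the parent's centre to the end of a branch.

    Assumption for this sketch: only a branch leaving straight down
    is pushed out by the parent label's radius; side branches are not.
    """
    offset = parent_radius if (account_for_radius and leaves_from_bottom) else 0.0
    return offset + branch_length


length, radius = 2.0, 0.5

# Accounting for the label radius: equal lengths, different end heights.
down_with = child_end_y(length, True, radius, account_for_radius=True)
side_with = child_end_y(length, False, radius, account_for_radius=True)

# Ignoring the label radius: equal lengths, equal end heights.
down_without = child_end_y(length, True, radius, account_for_radius=False)
side_without = child_end_y(length, False, radius, account_for_radius=False)

print(down_with, side_with)        # 2.5 2.0
print(down_without, side_without)  # 2.0 2.0
```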

On the other hand, radial mode currently accounts for node radius, and internal consistency is important. If there is no strong reason for that behaviour other than legacy code, I would change radial mode too.

I guess a potential reason to account for node radius is to avoid label clashing when distance < node diameter. This should be fixable through scaling/transforming the distances.
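As a rough sketch of the scaling idea: multiply every distance by one common factor so that the shortest non-zero branch is at least one node diameter long. Uniform scaling keeps the ratios between branch lengths intact. This is only one possible transformation, and the function name and sample values are hypothetical.

```python
def scale_to_avoid_clash(lengths, node_diameter):
    """Scale all branch lengths by one factor so that the shortest
    positive length is at least node_diameter.

    Lengths are never shrunk: the factor is at least 1.
    """
    positive = [length for length in lengths if length > 0]
    if not positive:
        return list(lengths)
    factor = max(1.0, node_diameter / min(positive))
    return [length * factor for length in lengths]


scaled = scale_to_avoid_clash([0.5, 2.0, 4.0], node_diameter=1.0)
print(scaled)  # [1.0, 4.0, 8.0]
```

A tree whose shortest branch already exceeds the node diameter is returned unchanged.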