uclahs-cds / package-CancerEvolutionVisualization

Publication Quality Phylogenetic Tree Plots
https://cran.r-project.org/web/packages/CancerEvolutionVisualization/
GNU General Public License v2.0

Specify edge text location #84

Open dan-knight opened 10 months ago

dan-knight commented 10 months ago

As suggested by @WuSelina, we need a way to specify the precise location of text along an edge. Broadly, this has two requirements:

[Image: screenshot, 2023-08-17 1:22:13 PM]

dan-knight commented 10 months ago

Edges and nodes can both be referenced by the same ID. Just like with the length columns in the tree input data.frame, the specified edge would correspond to the edge leading from the parent to the specified node.

This should probably be handled with extra, optional columns in the text input data.frame. Tentatively: node.id location x y
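A rough sketch of what such a text input could look like, assuming the tentative `node.id`, `x` and `y` columns from the list above. The `name` column and every value here are made up for illustration, and none of this is an existing CancerEvolutionVisualization interface:

```r
# Hypothetical text input with optional placement columns.
# Column names follow the tentative proposal; they are assumptions, not a released API.
text.df <- data.frame(
    name = c('TP53', 'KRAS', 'PTEN'),
    node.id = c(2, 2, 3),
    x = c(NA, 1.5, NA),
    y = c(0.25, 0.75, NA)
    );

# Because the columns are optional, a row with NA in both x and y
# would simply fall back to the default label placement.
has.custom.position <- !is.na(text.df$x) | !is.na(text.df$y);
```

Here the first two labels both sit on the edge leading into node 2, and only the third keeps the default position.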

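The x and y values are a bit tricky.

  • I'm thinking that y should be a percentage of the total branch length. This is due to the fact that a branch could have multiple edges, and therefore multiple scales. An objective value would not be practical. (Which scale applies to the value?)
  • For x, perhaps this should work the same as the tree padding parameters, corresponding to a percentage of the default padding.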

These seem intuitive for each argument separately, but I'm hesitant to design this in a way that x and y aren't conceptualized the same way. Maybe this can be solved with better, more informative column names.

WuSelina commented 10 months ago
> I'm thinking that y should be a percentage of the total branch length. This is due to the fact that a branch could have multiple edges, and therefore multiple scales. An objective value would not be practical. (Which scale applies to the value?)

I think y being a percentage of the total branch length makes sense. A clearer column name might be something like label.y, in case y is ambiguous.

I'm not sure what you mean by an "objective value". I believe you are right that it would not be practical if the scales are unclear, but I am having a hard time following.
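One way to picture the scale question, as a sketch only: assume a branch drawn with two edges whose lengths are on different scales (hypothetical `length1` and `length2` values below). A fractional y lands at a well-defined point on both edges, while an absolute y only makes sense on one of them:

```r
# Hypothetical branch with two edges on different scales.
branch <- data.frame(node.id = 2, length1 = 120, length2 = 8);
edge.lengths <- unlist(branch[, c('length1', 'length2')]);

# A fractional y maps to a valid position on every scale.
y.fraction <- 0.25;
position.on.each.scale <- y.fraction * edge.lengths;

# An absolute y is ambiguous: the same number is a different
# fraction of each edge, and can fall beyond the end of one of them.
y.absolute <- 30;
fraction.on.each.scale <- y.absolute / edge.lengths;
overshoots.edge <- fraction.on.each.scale > 1;
```

With these made-up numbers, a y of 30 sits a quarter of the way along the first edge but well past the end of the second.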

WuSelina commented 10 months ago

> For x, perhaps this should work the same as the tree padding parameters, corresponding to a percentage of the default padding.

I'm not sure whether others, such as @whelena, will use these labels differently when there are multiple samples. I think it makes sense for x (or label.x, as I'll call it for now) to correspond to padding, though I think I understand your concern.

> These seem intuitive for each argument separately, but I'm hesitant to design this in a way that x and y aren't conceptualized the same way. Maybe this can be solved with better, more informative column names.

For my single-sample cases so far, I think of label.y as "the point in time at which an event occurs," so I believe setting it as a percentage of branch length would work well. For label.x, though, I think of it as asking "Which clone does this event stem from or occur in?", so the question is which branch/node I want the event label to be closest to. We could probably talk more about this in person!

whelena commented 10 months ago

> I'm not sure whether others, such as @whelena, will use these labels differently when there are multiple samples. I think it makes sense for x (or label.x, as I'll call it for now) to correspond to padding, though I think I understand your concern.

For labelling, I don't envision doing it this granularly. For some of my samples, placing the labels is simply impossible with the current setup due to lack of space.

> I'm thinking that y should be a percentage of the total branch length. This is due to the fact that a branch could have multiple edges, and therefore multiple scales. An objective value would not be practical. (Which scale applies to the value?)
>
> For x, perhaps this should work the same as the tree padding parameters, corresponding to a percentage of the default padding.

I think a percentage would be equally confusing, since the actual branch lengths are determined within CEV. I think we should simplify the problem for now and default to the scale used in the first edge, length1, for y. In @WuSelina's case, both length1 and y should be the number of SNVs between events.
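To make that concrete, here is a minimal sketch of y given in the same units as length1 (SNVs in this made-up example). The data frames, the `name` and `parent` columns, and the range check are all assumptions for illustration, not how CEV handles it:

```r
# Hypothetical tree and text inputs; y is given in length1 units (e.g. SNVs).
tree.df <- data.frame(
    node.id = c(1, 2, 3),
    parent = c(NA, 1, 1),
    length1 = c(10, 40, 25)
    );
text.df <- data.frame(
    name = c('event A', 'event B'),
    node.id = c(2, 3),
    y = c(10, 30)
    );

# Look up the length1 of the edge leading into each labelled node.
merged <- merge(text.df, tree.df[, c('node.id', 'length1')], by = 'node.id');

# Position along the edge, plus a check that the label falls on the edge at all.
merged$y.fraction <- merged$y / merged$length1;
out.of.range <- merged$y < 0 | merged$y > merged$length1;
```

In this example, 'event B' is placed at 30 on an edge that is only 25 long, so a check like `out.of.range` could be used to warn the user.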

As for the x value, is there a reason why the default wouldn't work? Is it mostly to control the distance between the edges and the labels?