uclahs-cds / package-CancerEvolutionVisualization

Publication Quality Phylogenetic Tree Plots
https://cran.r-project.org/web/packages/CancerEvolutionVisualization/
GNU General Public License v2.0
2 stars 0 forks source link

Mismatched node colours #74

Closed whelena closed 9 months ago

whelena commented 1 year ago

It's unclear from the documentation that the node.col parameter is meant to specify a single colour for all nodes and for each node to have a different colour, it should be passed as a column in the tree dataframe. Additionally, there is acolour.scheme parameter that contains a default but might contribute to this issue.

Passing the colours as a column in the tree dataframe gives the default grey nodes. This might be due to an inconsistency in column names 'colour' and 'color'. The prep.tree function (line 73) generates a color column containing the SRCgrob colour.scheme parameter. Then in make.clone.tree.grobs line 64, it checks for a colour column, which defaults to the node.col parameter.

cluster.colours <- get.colours(
    value.list = CEV.dt$label,
    return.names = TRUE
    );
CEV.dt$colours <- cluster.colours[CEV.dt$label];

plt <- SRCGrob(
    tree = CEV.dt,
    main = paste(patient, 'Tree'),
    horizontal.padding = 3,
    scale1 = 2,
    yaxis1.label = y.lab,
    add.normal = TRUE
    );

Passing the colours using node.col results in a coloured tree, but with the wrong colours (shown below). Colours should match the heatmap.

plt <- SRCGrob(
    tree = CEV.dt,
    node.col = cluster.colours[CEV.dt$label],
    main = paste(patient, 'Tree'),
    horizontal.padding = 3,
    scale1 = 2,
    yaxis1.label = y.lab,
    add.normal = TRUE
    );

2023-05-17_MSK-AB-0007_clustered-hm 2023-05-17_MSK-AB-0007_CEV-tree-colour

whelena commented 1 year ago

See temporary fix in the associated development branch

dan-knight commented 1 year ago

It's unclear from the documentation that the node.col parameter is meant to specify a single colour for all nodes and for each node to have a different colour,

Should we leave node.col as another parameter (maybe renamed to default.node.col?), or should we just move this to the input dataframe for consistency? Users could still easily set the colour for all nodes like $tree$node.col <- 'red'; so I don't think we gain much by including the parameter as well (especially with the lost clarity).

whelena commented 1 year ago

I think the way you did it in the pull request is ok. We'll make sure documentation is clear that everything in the dataframe is prioritized and default.node.col only takes a character input (not list of colours).

dan-knight commented 1 year ago

Sounds good. We can document and also include a warning if a user passes in multiple values. Something like Multiple values passed to "default.node.col". To set multiple node colours, add a "node.col" column to "tree.df". Then, just use the first colour passed in, ignoring the others.

whelena commented 1 year ago

Yes, exactly what I was thinking