liamrevell / phytools


missing value color argument? #155

Open HedvigS opened 2 months ago

I think the default for missing data in phylo.heatmap is good: a white cell crossed by a black line. However, I've found that when the tree has a lot of tips, the cells become so thin that the black line dominates them, and missing data can end up looking like a color mapped to a non-missing value.

Would it be possible to introduce an argument to phylo.heatmap (and perhaps also dotTree and plotFanTree.wTraits) that lets users opt for another missing-value color?

Here are some plots illustrating the situation.

I realise that plotting trees with 2,000 tips is always gonna be tricky. I'm considering other options as well, or at least making very, very tall plots while we're working on the project.

Tree with a few tips: [image: tree_world_GB]

Tree with 2,000 tips (black looks like it might be a third value): [image: tree_world_GB]

I've tried a very hacky solution just for now. I happen to have binary data, 0s and 1s, so I introduced a third number, replaced all NAs with it, and assigned it the color white. I also swapped out the light yellow for green to make the difference between that value and missing data clearer.

Tree with 2,000 tips, missing data recoded as white: [image: tree_world_GB]

Code (I can rewrite with anolis or something else if you want)


source("01_requirements.R")

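# Tip-level data: one row per language
TenseTable <- read_tsv("output/processed_data/TenseTable.tsv", show_col_types = FALSE)
# Keep only the Grambank feature GB020 and rename its value column to DefArt
DefArtTable <- read_tsv("data/grambank_v1.0.3/ValueTable.tsv", show_col_types = FALSE) %>%
  filter(Parameter_ID == "GB020") %>%
  dplyr::select(Language_ID, DefArt = Value)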

EDGE_tree <- ape::read.tree("output/processed_data/EDGE_pruned_tree.tree")

df <- EDGE_tree$tip.label %>% 
  as.data.frame() %>% 
  rename(Language_ID = ".") %>% 
  left_join(TenseTable, by = "Language_ID") %>% 
  left_join(DefArtTable, by = "Language_ID") %>% 
  column_to_rownames("Language_ID") %>% 
  dplyr::select("Tense", "DefArt") %>% 
  as.matrix()

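# phylo.heatmap only plots continuous data. Missing data is displayed as a
# white cell crossed by a black line. Workaround: recode NA as a third value
# (0.5, between the binary 0 and 1) so it can be given its own color below.
df[is.na(df)] <- 0.5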

# One color per value, in increasing order of the values in df
colors <- c("#95D840FF",  # 0
            "white",      # 0.5 (recoded missing data)
            "#440154FF")  # 1

png(filename = "output/plots/tree_world_GB.png", width = 8.27, height = 10.69, units = "in", res = 600)

phytools::phylo.heatmap(tree = EDGE_tree,
                        X = df,
                        ftype = "off",
                        split = c(0.9, 0.2),
                        colors = colors,
                        legend = FALSE)

x <- dev.off()
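
In case it helps, the workaround above could be wrapped in a small helper so that it doesn't depend on my files. This is only a rough sketch: `recode_missing()` and its arguments are names I made up, not anything in phytools. It assumes a numeric matrix with a small number of discrete values, and it assumes phylo.heatmap spreads the `colors` vector evenly across the range of `X`, which is how the script above appears to behave. The toy matrix and the random tree are made up for illustration.

```r
# Hypothetical helper (not part of phytools): recode the observed values to
# 1..K and missing cells to 0, and put the missing-value color first so that
# it lines up with the 0 cells.
recode_missing <- function(X, value_colors, na_color = "white") {
  stopifnot(is.matrix(X), is.numeric(X))
  lev <- sort(unique(X[!is.na(X)]))               # observed values, e.g. 0 and 1
  stopifnot(length(value_colors) == length(lev))  # one color per observed value
  Z <- matrix(match(X, lev), nrow = nrow(X), dimnames = dimnames(X))
  if (anyNA(Z)) {
    Z[is.na(Z)] <- 0                              # sentinel for missing data
    colors <- c(na_color, value_colors)
  } else {
    colors <- value_colors                        # nothing missing, no extra color
  }
  list(X = Z, colors = colors)
}

# Toy binary data with some missing cells (made up for illustration)
X <- matrix(rep(c(0, 1, NA, 1, 0), 4), ncol = 2,
            dimnames = list(paste0("t", 1:10), c("Tense", "DefArt")))
rec <- recode_missing(X, value_colors = c("#95D840FF", "#440154FF"))

# Plot only if the packages are installed; the tree is random
if (requireNamespace("phytools", quietly = TRUE) &&
    requireNamespace("ape", quietly = TRUE)) {
  tree <- ape::rtree(10, tip.label = rownames(X))
  phytools::phylo.heatmap(tree, rec$X, colors = rec$colors, legend = FALSE)
}
```

Putting the sentinel at 0 rather than in the middle means the same recoding works for data with more than two states, but it is still a hack: a real missing-value color argument would not need the data to be recoded at all.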