r4fun / hierplane

🌳 Hierplane for R
https://r4fun.github.io/hierplane/

Add `data.tree` methods #42

Closed mathidachuk closed 4 years ago

mathidachuk commented 4 years ago

Here's a start:


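library(dplyr)      # mutate(), bind_rows(), %>%
library(data.tree)  # as.Node(), ToDataFrameNetwork()
library(hierplane)

# os_survey is assumed to be a data frame with `Operating System`,
# `OS Version`, and `users` columns
os_survey %>%
  # build the "root/OS/version" path that data.tree parses into a tree
  mutate(pathString = paste(
    "OS Students 2014-15", `Operating System`, `OS Version`, sep = "/"
  )) %>%
  as.Node() %>%
  # flatten back to a from/to edge list, carrying the users attribute
  ToDataFrameNetwork("users") %>%
  # append the root row by hand
  bind_rows(data.frame(
    from = "OS Students 2014-15",
    to = "OS Students 2014-15",
    link = "ROOT",
    node_type = "ROOT",
    stringsAsFactors = FALSE  # keep strings as character on R < 4.0
  )) %>%
  hp_dataframe(settings = hierplane_settings(
    parent_id = "from",
    child_id = "to",
    child = "to",
    attributes = "users"
  )) %>%
  hierplane()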

The challenge remains that there is no way to set the node types and links per layer.
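One possible workaround, sketched in base R only: derive a per-layer label from each node's depth before the edge list reaches hierplane. The paths and the `layer_types` labels below are made up for illustration, and nothing here is a hierplane or data.tree feature.

```r
# Toy paths in the same "root/OS/version" shape as pathString above
# (the values are invented for illustration).
paths <- c(
  "OS Students 2014-15/Windows/10",
  "OS Students 2014-15/Windows/8",
  "OS Students 2014-15/Mac/Yosemite"
)

# Hypothetical labels, one per layer: depth 1 is the root, 2 the OS, 3 the version.
layer_types <- c("survey", "os", "version")

# Split each path and emit one from/to row per step, recording the child's depth.
parts <- strsplit(paths, "/", fixed = TRUE)
edges <- unique(do.call(rbind, lapply(parts, function(p) {
  data.frame(
    from  = p[-length(p)],
    to    = p[-1],
    depth = seq_along(p)[-1],
    stringsAsFactors = FALSE
  )
})))

# Look up the label for each row from the child's depth.
edges$node_type <- layer_types[edges$depth]
edges
```

The same depth-based lookup could fill a link column; whether that fits the settings interface is an open question here.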

tylerlittlefield commented 4 years ago

Thanks, I’ll start working on this. I have some ideas for the attributes and stuff. They’ll be part of hp_datatree()

tylerlittlefield commented 4 years ago

Okay here is a working example of using hp_datatree() on a YAML file:

library(hierplane) # devtools::install_github("r4fun/hierplane", "datatree-compatible")
library(data.tree)
library(yaml)

"
name: r4fun
tyler:
  name: Tyler
  job: Data Scientist
  species: Human
  toulouse:
    name: Toulouse
    job: Systems Engineer
    species: Cat
    jojo:
      name: Jojo
      job: Python Programmer
      species: Dog
  ollie:
    name: Ollie
    job: Database Administrator
    species: Dog
  lucas:
    name: Lucas
    job: R Programmer
    species: Rabbit
" -> yaml

yaml %>% 
  yaml.load() %>% 
  as.Node() %>% 
  hp_datatree(
    title = "r4fun github group",
    link = "species",
    attributes = "job"
  ) %>% 
  hierplane(
    theme = "light", 
    width = "auto",
    height = "auto"
  )

Note, this requires the latest version as of defd52a. There was a bug where the root data.frame that's appended to everything was appending an integer. This happened because I didn't set stringsAsFactors = FALSE when creating the data.frame, so the resulting link didn't match or make sense. This also explains why it worked on my personal computer running R 4.0.2 but not on my work computer running R 3.6.2: R 4.0.0 changed the default to stringsAsFactors = FALSE!
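A small base R illustration of that difference. It sets stringsAsFactors explicitly so it behaves the same on any R version; it is not the actual code path inside hierplane.

```r
# What data.frame() did by default before R 4.0.0: strings become factors.
root_factor <- data.frame(link = "ROOT", stringsAsFactors = TRUE)

# What R >= 4.0.0 does by default: strings stay character.
root_char <- data.frame(link = "ROOT", stringsAsFactors = FALSE)

class(root_factor$link)       # "factor"
as.integer(root_factor$link)  # 1, the integer code behind the factor level
class(root_char$link)         # "character"
```

Anything that reads the factor's underlying codes instead of its labels ends up with 1 rather than "ROOT".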

tylerlittlefield commented 4 years ago

Here is another example using the traditional data.tree method for creating trees programmatically, as described here. These examples also try to highlight some of the design decisions:

library(hierplane)
library(data.tree)

# acme
acme <- Node$new("Acme Inc.")
accounting <- acme$AddChild("Accounting")
software <- accounting$AddChild("New Software")
standards <- accounting$AddChild("New Accounting Standards")
research <- acme$AddChild("Research")
newProductLine <- research$AddChild("New Product Line")
newLabs <- research$AddChild("New Labs")
it <- acme$AddChild("IT")
outsource <- it$AddChild("Outsource")
agile <- it$AddChild("Go agile")
goToR <- it$AddChild("Switch to R")
acme$Accounting$`New Software`$cost <- 1000000
acme$Accounting$`New Accounting Standards`$cost <- 500000
acme$Research$`New Product Line`$cost <- 2000000
acme$Research$`New Labs`$cost <- 750000
acme$IT$Outsource$cost <- 400000
acme$IT$`Go agile`$cost <- 250000
acme$IT$`Switch to R`$cost <- 50000
acme$Accounting$`New Software`$p <- 0.5
acme$Accounting$`New Accounting Standards`$p <- 0.75
acme$Research$`New Product Line`$p <- 0.25
acme$Research$`New Labs`$p <- 0.9
acme$IT$Outsource$p <- 0.2
acme$IT$`Go agile`$p <- 0.05
acme$IT$`Switch to R`$p <- 1
acme$IT$Outsource$AddChild("India")
acme$IT$Outsource$AddChild("Poland")
acme$Set(type = c('company', 'department', 'project', 'project', 'department', 'project', 'project', 'department', 'program', 'project', 'project', 'project', 'project'))

# a simple hierplane of the acme tree
# * hp_datatree has explicit params for the settings because it's more strict
# * e.g. we take a hard stance that the parent/child cols should always be from/to
# * In other words, I haven't found a use case for actually modifying any of the
# other columns (except maybe node_type?)
acme %>%
  hp_datatree(
    title = "Acme Inc.",
    link = "type",
    attributes = c("cost", "p")
  ) %>%
  hierplane(
    theme = "light",
    width = "auto",
    height = "auto"
  )

# if we assign the link to the parent_id, we warn the user and set it
# back to the child_id
acme %>%
  hp_datatree(
    title = "Acme Inc.",
    link = "from",
    attributes = c("cost", "p")
  ) %>%
  hierplane(
    theme = "light",
    width = "auto",
    height = "auto"
  )

# if the link is missing values (excluding the root row), we warn the user
# and set the link back to the child_id
acme %>%
  hp_datatree(
    title = "Acme Inc.",
    link = "cost",
    attributes = c("cost", "p")
  ) %>%
  hierplane(
    theme = "light",
    width = "auto",
    height = "auto"
  )
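
The two fallback rules in the comments above can be sketched as a standalone check. `resolve_link()` is a hypothetical helper written for this illustration, not a function in hierplane, and it assumes the edge list does not yet include the root row.

```r
# Hypothetical helper (not part of hierplane): pick the column used for link labels.
resolve_link <- function(df, link, parent_id = "from", child_id = "to") {
  # Rule 1: the parent column makes no sense as a link label.
  if (identical(link, parent_id)) {
    warning("`link` points at the parent column; using `", child_id, "` instead.")
    return(child_id)
  }
  # Rule 2: a link column that is absent or has missing values cannot label every edge.
  if (!link %in% names(df) || anyNA(df[[link]])) {
    warning("`link` is absent or has missing values; using `", child_id, "` instead.")
    return(child_id)
  }
  link
}

# Toy edge list; `cost` is missing for one row, `type` is complete.
edges <- data.frame(
  from = c("Acme Inc.", "Acme Inc."),
  to   = c("Accounting", "IT"),
  type = c("department", "department"),
  cost = c(1000000, NA),
  stringsAsFactors = FALSE
)

resolve_link(edges, "type")                    # kept as "type"
suppressWarnings(resolve_link(edges, "from"))  # falls back to "to"
suppressWarnings(resolve_link(edges, "cost"))  # falls back to "to"
```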

mathidachuk commented 4 years ago

I think you should allow users the freedom to control the node type, in case they want to style their nodes.

Also what if users don't want a link label at all?

Also lol@stringsasfactors causing trouble.

mathidachuk commented 4 years ago

ps who's jojo???

tylerlittlefield commented 4 years ago

I agree, I'll explore more on the node type stuff. I am still waiting for the weekend to dive into hp_datatree some more. And regarding the link, I have been thinking of making the default link " ", so it's just empty.

Corgi JoJo: https://www.supercorgijojo.com/about

😅

mathidachuk commented 4 years ago

I have an idea about resolving multiple children in yaml (or any other tree) inputs. I'll submit a PR to your branch if I can get it to work.

tylerlittlefield commented 4 years ago

Sounds good to me. I think you mentioned adding an integer to duplicates. So child1, child2, etc. I think that would be a good idea.
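A minimal base R sketch of that numbering idea, using made-up labels. Only names that occur more than once get a suffix.

```r
# Toy labels with one repeated name (values are invented for illustration).
labels <- c("New Labs", "Outsource", "New Labs")

# Flag every label that occurs more than once, including its first occurrence.
dup <- duplicated(labels) | duplicated(labels, fromLast = TRUE)

# Running count within each group of identical labels: 1, 2, ...
idx <- ave(seq_along(labels), labels, FUN = seq_along)

# Append the counter to the duplicated labels only.
labels[dup] <- paste0(labels[dup], idx[dup])
labels
```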

mathidachuk commented 4 years ago

Decided to take a slightly different approach. We should not modify the from and to columns. Instead, we just add a separate column to use as child. Children do not have to be unique, and we do not care about the paths when plotting.

[image] This hierplane is generated using the code in PR #44. Notice how the repeated name "New Labs" does not impact tree generation. Furthermore, node names containing / are handled, so we don't accidentally remove part of the node name. Let me know what you think.
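A rough sketch of the separate-column idea in base R. It assumes from and to hold unique path-like ids, which is a guess about the edge list and not necessarily how PR #44 does it; the rows are made up.

```r
# Toy edge list where ids are full paths, so they stay unique
# even though two leaves share the display name "New Labs".
edges <- data.frame(
  from = c("Acme", "Acme", "Acme/Research", "Acme/IT"),
  to   = c("Acme/Research", "Acme/IT", "Acme/Research/New Labs", "Acme/IT/New Labs"),
  stringsAsFactors = FALSE
)

# Display label = everything after the last "/"; from/to are left untouched.
# Note: this naive split would truncate a node name that itself contains "/",
# which is the case the PR is said to handle separately.
edges$child <- sub(".*/", "", edges$to)
edges
```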

tylerlittlefield commented 4 years ago

Awesome, this works really well! As another example, we can have 2 Toulouse's now 😼

devtools::load_all() # load hierplane from this particular branch
library(data.tree)
library(yaml)

"
name: r4fun
tyler:
  name: Tyler
  job: Data Scientist
  species: Human
  toulouse:
    name: Toulouse
    job: Systems Engineer
    species: Cat
    toulouse:
      name: Toulouse
      job: Systems Engineer
      species: Cat
  ollie:
    name: Ollie
    job: Database Administrator
    species: Dog
  lucas:
    name: Lucas
    job: R Programmer
    species: Rabbit
" -> yaml

yaml %>% 
  yaml.load() %>% 
  as.Node() %>% 
  hp_datatree(
    title = "r4fun github group",
    link = "species",
    attributes = "job"
  ) %>% 
  hierplane(
    theme = "light", 
    width = "auto",
    height = "auto"
  )