add df builder functions

mathidachuk commented 4 years ago

First draft of functions that help construct hp-compatible dataframes from a flat dataset. The draft is an attempt to provide a step-by-step guide for users to construct a hierplane from a dataset of their own.

notes for discussion:

instead of adding layer by layer, is it possible to wrap all the layers in a dynamic builder?
- if so, what does user input look like?
what kind of examples do users need to see to fully understand the use of the builder functions?
is the function dynamic enough? is it too dynamic?
can we make this pipable if an overall layer is not possible? consult ggplot or echarts?

Demo:

library(dplyr)
library(hierplane)
library(shiny)
library(reactable)

source("r/builder.R")
source("r/utils.R")

df_orig <- read.csv("../app_structure.csv", na.strings = "")

df <- bind_rows(
  # define root
  add_root("Applications"),
  # layer 1 - link to root
  add_layer(df_orig,
            parent_vals = "Applications",
            child_col = "app",
            link_vals = " ", # allow manually defined static values
            node_type_vals = "app", # allow manually defined static values
            attribute_cols = "internal"),
  # layers 2 to n
  # (parent for this layer is same as child in previous layer)
  add_layer(df_orig,
            parent_col = "app",
            child_col = "app_page",
            link_vals = " ",
            node_type_vals = "page"
            ),
  add_layer(df_orig,
            parent_col = "app_page",
            child_col = "page_component",
            link_vals = " ",
            node_type_vals = "comps"),
  add_layer(df_orig,
            parent_col = "page_component",
            child_col = "subunits",
            link_vals = "units", # static value for a layer can be used for node placement
            node_type_col = "type",
            attribute_cols = c("interactive",
                               "type"))
)

ui <- fluidPage(
  hierplaneOutput("hplane", height = "100%"),
  h2("original data"),
  reactableOutput("original_data"),
  h2("transformed data"),
  reactableOutput("hplane_tbl")
)

server <- function(input, output, session) {
  output$hplane <- renderHierplane({
    df %>%
      hp_dataframe(title = "Available Applications",
                   styles = hierplane_styles(
                     link_to_positions = list(units = "right"),
                     link_name_to_label = list(units = " ") # don't actually show the label
                   )) %>%
      hierplane()
  })

  output$original_data <- renderReactable({
    reactable(df_orig)
  })

  output$hplane_tbl <- renderReactable({
    reactable(df)
  })
}

shinyApp(ui, server)

mathidachuk commented 4 years ago

welp, can't attach a csv so here's the data...

df_orig <- structure(list(app = c("product search", "product search", "product search", 
"product search", "product search", "product search", "product search", 
"product search", "product search", "feature request", "feature request", 
"feature request", "feature request", "feature request", "feature request", 
"feature request", "feature request"), app_page = c("welcome", 
"welcome", "search", "search", "search", "search", "search", 
"about", "about", "request list", "new form", "new form", "new form", 
"new form", "edit form", "edit form", "edit form"), page_component = c("introduction", 
"brand selection", "filters", "filters", "preview", "download", 
"download", "contacts", "useful links", NA, "input form", "input form", 
"advanced controls", "advanced controls", "admin form", "admin form", 
"advanced controls2"), subunits = c(NA, NA, "filter dropdowns", 
"filter summary", "data preview table", "download settings", 
"download button", NA, NA, NA, "category", "description", "upload attachment", 
"delete request", "assign owner", "update status", "set priority"
), interactive = c("static", "interactive", "interactive", "static", 
"static", "interactive", "interactive", "static", "static", "static", 
"interactive", "interactive", "interactive", "interactive", "interactive", 
"interactive", "interactive"), internal = c(TRUE, TRUE, TRUE, 
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE), type = c("required", "required", 
"required", "required", "required", "optional", "required", "required", 
"required", "required", "required", "required", "optional", "optional", 
"required", "required", "optional")), class = "data.frame", row.names = c(NA, 
-17L))

tylerlittlefield commented 4 years ago

I like the example! However, creating an hp_dataframe still doesn't "click" for me. I am starting to think that hp_dataframe shouldn't be exported. IMO there is too much work involved to render a hierplane, and I fear it will be frustrating for users. However, I still think we can use it to take data frames from existing packages and create the required hierplane JSON.

I have just discovered data.tree and it looks like a very mature and useful package for hierplane. It has many utilities for creating hierarchical data and, more importantly, ToDataFrameNetwork(), whose output we can pass to hp_dataframe(). This way, a user who is comfortable with data.tree can easily generate hierplanes.

Based on the 100,000+ downloads per month alone, data.tree seems to be the tool to use when working with hierarchical data in R so I think we should take advantage. Below is a very rough example of what I am thinking:

library(data.tree)
library(hierplane)
library(yaml)
library(dplyr)

# creating a tree with {data.tree}
acme <- Node$new("Acme Inc.")
  accounting <- acme$AddChild("Accounting")
    software <- accounting$AddChild("New Software")
    standards <- accounting$AddChild("New Accounting Standards")
  research <- acme$AddChild("Research")
    newProductLine <- research$AddChild("New Product Line")
    newLabs <- research$AddChild("New Labs")
  it <- acme$AddChild("IT")
    outsource <- it$AddChild("Outsource")
    agile <- it$AddChild("Go agile")
    goToR <- it$AddChild("Switch to R")

acme

# function for creating a minimal root dataframe compatible with data.tree
get_root <- function(x) {
  x <- names(x$Get("level")[which(x$Get("level") %in% 1)])
  data.frame(
    from = x,
    to = x,
    child = x,
    link = x
  )
}

hp_datatree <- function(x) {
  root <- get_root(x)

  ToDataFrameNetwork(x) %>% 
    mutate(
      child = to,
      link = " ",
      node_type = NA,
      attribute1 = NA,
      attribute2 = NA
    ) %>% 
    bind_rows(root) %>% 
    hp_dataframe(
      settings = hierplane_settings(
        root_tag = root$from,
        parent_id = "from",
        child_id = "to"
      )
    )
}

# create a hierplane
acme %>% 
  hp_datatree() %>% 
  hierplane()

If we rely heavily on data.tree, we can take advantage of all its cool features, for example, allowing users to render hierplanes based on a yaml file:

yaml <- "
name: OS Students 2014/15
OS X:
  Yosemite:
    users: 16
  Leopard:
    users: 43
Linux:
  Debian:
    users: 27
  Ubuntu:
    users: 36
Windows:
  W7:
    users: 31
  W8:
    users: 32
  W10:
    users: 4
"

osList <- yaml.load(yaml)
osNode <- as.Node(osList)

osNode %>% 
  hp_datatree() %>% 
  hierplane()

mathidachuk commented 4 years ago

Thanks for providing the example for data.tree! I think I remember why I steered away from it now. I was not able to get attributes, link, or node_type to correctly integrate into the data. If you find a way, let me know lol

I do think the function I wrote offers a lot more versatility tho. If you were trying to construct a hierplane like the one I made, you would have to type/do a lot more with data.tree (even if you can get it to work).

Also, I think letting advanced users work with the data flexibly is critical for the success of the package. Maybe we can do what echarts4r does and put really detailed documentation under the advanced portion of the pkgdown????

tylerlittlefield commented 4 years ago

Getting the attributes, link, etc is possible. In data.tree you can pull them into a dataframe by providing the column names:

library(data.tree)

# creating a tree with {data.tree}
acme <- Node$new("Acme Inc.")
accounting <- acme$AddChild("Accounting")
software <- accounting$AddChild("New Software")
standards <- accounting$AddChild("New Accounting Standards")
research <- acme$AddChild("Research")
newProductLine <- research$AddChild("New Product Line")
newLabs <- research$AddChild("New Labs")
it <- acme$AddChild("IT")
outsource <- it$AddChild("Outsource")
agile <- it$AddChild("Go agile")
goToR <- it$AddChild("Switch to R")

acme$Accounting$`New Software`$cost <- 1000000
acme$Accounting$`New Accounting Standards`$cost <- 500000
acme$Research$`New Product Line`$cost <- 2000000
acme$Research$`New Labs`$cost <- 750000
acme$IT$Outsource$cost <- 400000
acme$IT$`Go agile`$cost <- 250000
acme$IT$`Switch to R`$cost <- 50000

acme$Accounting$`New Software`$p <- 0.5
acme$Accounting$`New Accounting Standards`$p <- 0.75
acme$Research$`New Product Line`$p <- 0.25
acme$Research$`New Labs`$p <- 0.9
acme$IT$Outsource$p <- 0.2
acme$IT$`Go agile`$p <- 0.05
acme$IT$`Switch to R`$p <- 1

ToDataFrameNetwork(acme, "cost", "p")

And I get that the amount of text you need to write is kind of scary, but I think this is why data.tree makes sense to me. It's stricter, so constructing a tree is a lot clearer to me. For example, I have been trying to render a hierplane of the following data with add_root/add_layer:

structure(list(from = c("OS Students 2014/15", "OS Students 2014/15", 
"OS Students 2014/15", "OS X", "OS X", "Linux", "Linux", "Windows", 
"Windows", "Windows"), to = c("OS X", "Linux", "Windows", "Yosemite", 
"Leopard", "Debian", "Ubuntu", "W7", "W8", "W10")), row.names = c(NA, 
10L), class = "data.frame")

But I cannot figure out how to get this to work.

mathidachuk commented 4 years ago

The OS data you provided is already in a hierarchical structure, so you don't have to do add_root and add_layer.

Now consider this dataset (much more likely data structure):

df_orig <- tribble(
  ~"Survey",              ~"Operating System", ~"OS Version", ~"users",
  "OS Students 2014/15",  "OS X"             , "Yosemite",    16,
  "OS Students 2014/15",  "OS X"             , "Leopard",     43,
  "OS Students 2014/15",  "Linux"            , "Debian",      27,
  "OS Students 2014/15",  "Linux"            , "Ubuntu",      36,
  "OS Students 2014/15",  "Windows"          , "Win7",        31,
  "OS Students 2014/15",  "Windows"          , "Win8",        32,
  "OS Students 2014/15",  "Windows"          , "Win10",       4
)

How would you translate this dataset to the one you provided using data.tree without manually assigning everything?

Here's what it looks like with add_ functions:

df <- bind_rows(
  # define root
  add_root("OS Students 2014/15"),
  # layer 1 - link to root
  add_layer(df_orig,
            parent_vals = "OS Students 2014/15",
            child_col = "Operating System",
            link_vals = "OS", # allow manually defined static values
            node_type_vals = "OS"),
  # layers 2 to n
  # (parent for this layer is same as child in previous layer)
  add_layer(df_orig,
            parent_col = "Operating System",
            child_col = "OS Version",
            link_vals = "Ver",
            node_type_vals = "Sub",
            attribute_cols = "users"
  ))

df %>%
  hp_dataframe(title = "Survey Results of Most Popular OS in 2014/15",
               settings = hierplane_settings(attributes = "attribute1"),
               styles = hierplane_styles(
                 link_to_positions = list(Ver = "right")
               )) %>%
  hierplane()

tylerlittlefield commented 4 years ago

I see, okay, this makes sense. We will add data.tree support for rendering a hierplane from json/yaml/csv etc. that is already hierarchical. Then we keep hp_dataframe for parsing data frames which aren't hierarchical.

tylerlittlefield commented 4 years ago

However, I agree with the point you made about piping, I think we can get away with a syntax like:

add_root("a") %>% 
  add_layer("b") %>% 
  add_layer("c")

tylerlittlefield commented 4 years ago

The add_layer could basically have .x be the data that is passed and .y for the original dataframe?

mathidachuk commented 4 years ago

Can you pass two things in a single pipe?? I need to take a look at echarts and see how John handled it...

tylerlittlefield commented 4 years ago

Maybe we can require the original data in add_root, then bind rows in add_layer and kind of carry the data along by storing it as an attribute?

mathidachuk commented 4 years ago

Ohhh can you use a dataset as an attribute?

Looks like ggplot sets up an environment to handle layering and echarts creates a widget (the data is part of it!) and just handles it like a list.

maybe we can do

data %>% add_root("OS 2014/15") %>% add_layer() %>% ...

add_root outputs a list of data and df, where data is the original dataset and df is the root dataframe. Subsequently, add_layer takes the output from add_root (list(data, df)), operates on the df element (i.e. adds to it), and returns list(data, df).

what do you think??

p.s. if attribute can take a dataframe input, i prefer the add_ functions to return a dataframe with the original data as attribute.
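
For reference, the list(data, df) idea described above could look roughly like the base-R sketch below. `add_root_l()` and `add_layer_l()` are hypothetical stand-ins, not the builder functions in this branch, and the `parent`/`child` columns and the toy `survey` data are assumptions made only for illustration.

```r
# Hypothetical helper: start the list with the original data and a root row.
add_root_l <- function(data, root) {
  list(
    data = data,
    df = data.frame(parent = root, child = root, stringsAsFactors = FALSE)
  )
}

# Hypothetical helper: build one layer from the carried data and append it.
# When no parent column is given, every child is linked to the root.
add_layer_l <- function(x, child_col, parent_col = NULL) {
  root <- x$df$child[1]
  parent <- if (is.null(parent_col)) {
    rep(root, nrow(x$data))
  } else {
    x$data[[parent_col]]
  }
  layer <- unique(data.frame(
    parent = parent,
    child = x$data[[child_col]],
    stringsAsFactors = FALSE
  ))
  x$df <- rbind(x$df, layer)
  x
}

# Toy flat dataset (illustrative only).
survey <- data.frame(
  os = c("OS X", "OS X", "Linux"),
  version = c("Yosemite", "Leopard", "Debian"),
  stringsAsFactors = FALSE
)

out <- add_root_l(survey, "OS 2014/15")
out <- add_layer_l(out, child_col = "os")
out <- add_layer_l(out, child_col = "version", parent_col = "os")
out$df
```

The trade-off is that every downstream function then has to know to pull `df` out of the list, which is one reason a single data frame with the source attached may be nicer.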

tylerlittlefield commented 4 years ago

I also prefer the output to be a single data frame. The new environment thing sounds fancy/cool. I like that idea. Attributes can contain data but idk if it's best practice hehe.
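
A quick base-R check that an attribute can hold a whole data frame (toy data, names chosen only for this sketch). Whether the attribute survives later operations depends on the function being called, so the sketch re-attaches it explicitly after combining rows rather than relying on it being kept.

```r
# Toy root row and toy source data (illustrative only).
root <- data.frame(parent = "OS 2014/15", child = "OS 2014/15",
                   stringsAsFactors = FALSE)
source_data <- data.frame(os = c("OS X", "Linux"), users = c(59, 63),
                          stringsAsFactors = FALSE)

# attr() accepts any R object, including a data frame.
attr(root, "source") <- source_data
carried <- attr(root, "source")

# Build a layer from the carried data, then re-attach the source explicitly
# instead of assuming rbind() keeps custom attributes.
layer <- data.frame(parent = "OS 2014/15", child = carried$os,
                    stringsAsFactors = FALSE)
combined <- rbind(root, layer)
attr(combined, "source") <- carried

identical(attr(combined, "source"), source_data)
```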

mathidachuk commented 4 years ago

I will work on making the add_ functions pipe compatible. Will explore env (SCARY) and attribute options.

mathidachuk commented 4 years ago

Will you consider the echarts implementation? He creates a widget and then just keeps adding to the x.

In other words...

data %>% add_root

returns a hierplane object and is ready for plotting. When you add a layer, it just updates the data to be passed to the widget. The challenge here is that the settings need to be passed with the widget. Maybe that can just be an optional param in add root and add layer.

After looking into it a bit more, the hierplane widget option is just a list.

This makes things really easy. Can also add pipe operation for adding styles now!
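
A rough sketch of the "keep adding to the x" idea, using a plain list as a stand-in for the widget so it runs without any packages. `new_widget()` and `add_styles_w()` are hypothetical names for illustration, not hierplane or echarts4r functions; the only thing assumed about a real htmlwidget is that its payload lives in an element named `x`.

```r
# Stand-in for a widget: a classed list whose payload lives in `x`.
new_widget <- function(data) {
  structure(
    list(x = list(data = data, styles = list())),
    class = "fake_widget"
  )
}

# Hypothetical pipe-friendly helper: update `x` and return the widget.
add_styles_w <- function(widget, ...) {
  widget$x$styles <- utils::modifyList(widget$x$styles, list(...))
  widget
}

w <- new_widget(data.frame(parent = "root", child = "root"))
w <- add_styles_w(w, link_to_positions = list(Ver = "right"))
w$x$styles$link_to_positions$Ver
```

Because each helper takes the widget first and returns it, the same functions would chain with %>% without any extra plumbing.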

tylerlittlefield commented 4 years ago

Gotcha, but if you are just modifying the widget how does this work? Wouldn't you need to have added the root and layers prior to creating the widget? Btw, here is what I had in mind with the attribute method:

add_root2 <- function(.data, ...) {
  root <- add_root(...)
  attr(root, "source") <- .data
  root
}

add_layer2 <- function(.data, ...) {
  source <- attr(.data, "source")
  layer <- add_layer(source, ...)
  attr(layer, "source") <- source
  dplyr::bind_rows(.data, layer)
}

df_orig %>% 
  add_root2("OS Students 2014/15") %>% 
  add_layer2(
    parent_vals = "OS Students 2014/15",
    child_col = "Operating System",
    link_vals = "OS", 
    node_type_vals = "OS"
  ) %>% 
  add_layer2(
    parent_col = "Operating System",
    child_col = "OS Version",
    link_vals = "Ver",
    node_type_vals = "Sub",
    attribute_cols = "users"
  ) %>% 
  hp_dataframe(
    title = "Survey Results of Most Popular OS in 2014/15",
    settings = hierplane_settings(attributes = "attribute1"),
    styles = hierplane_styles(
      link_to_positions = list(Ver = "right")
    )
  ) %>%
  hierplane()

mathidachuk commented 4 years ago

You would just modify the x portion of the widget list.

I actually really like your implementation. Nice and simple!!

tylerlittlefield commented 4 years ago

Okay, I can come up with something really quick, I'll branch off of this branch.

mathidachuk commented 4 years ago

Ok thanks!!!! I'll clean up the documentation. I was thinking about adding the OS dataset and maybe the org dataset to the package. Thoughts?

tylerlittlefield commented 4 years ago

I think it's a good idea! More datasets the better 😸

mathidachuk commented 4 years ago

Ok awesome!

Also, I think we need to pare down the starships dataset to just the original, and maybe use it to demo add_root and add_layer.

What do you think? Should it stay a ready-to-use df? Or maybe we add the original df and also give users access to the hierplane-ready version? (starships = original, starships_hp = hierplane-ready)

tylerlittlefield commented 4 years ago

Maybe we add a completely different dataset, like the OS one to demonstrate add_ functions? I like how starships can just quickly be used without any work.

tylerlittlefield commented 4 years ago

Assuming the checks pass, we just need to document/export these functions and we are good to go!

mathidachuk commented 4 years ago

Working on it! I will also work on rewriting the starships dataset with our spanking new workflow 💯

tylerlittlefield commented 4 years ago

Awesome! Thank you 😁

mathidachuk commented 4 years ago

OH MAN ended up doing a lot of updates. Added more flexibility and automation in the add_ functions by leveraging inherited attributes/sources. Now the flow is much clearer, with no need to indicate a parent.

os_survey %>% 
  add_root("OS Students 2014/15") %>% 
  add_layer(
    child_col = "Operating System",
    link_vals = "OS", 
    node_type_vals = "OS"
  ) %>% 
  add_layer(
    child_col = "OS Version",
    link_vals = "Ver",
    node_type_vals = "Sub",
    attribute_cols = "users"
  ) %>% 
  hp_dataframe(
    title = "Survey Results of Most Popular OS in 2014/15",
    styles = hierplane_styles(
      link_to_positions = list(Ver = "right")
    )
  ) %>%
  hierplane()

Also, link and node_type are now optional values (it's not pretty, but if you need something quick and dirty, it works; it would also be great in some sort of interactive application cuz it's like generating a pivot table):

os_survey %>% 
  add_root("OS Students 2014/15") %>% 
  add_layer(
    child_col = "Operating System"
  ) %>% 
  add_layer(
    child_col = "OS Version"
  ) %>% 
  hp_dataframe(
    title = "Survey Results of Most Popular OS in 2014/15"
  ) %>%
  hierplane()

r4fun / hierplane

add df builder functions #39