mathidachuk closed 4 years ago
welp cant attach csv so here's the data...
df_orig <- structure(list(app = c("product search", "product search", "product search",
"product search", "product search", "product search", "product search",
"product search", "product search", "feature request", "feature request",
"feature request", "feature request", "feature request", "feature request",
"feature request", "feature request"), app_page = c("welcome",
"welcome", "search", "search", "search", "search", "search",
"about", "about", "request list", "new form", "new form", "new form",
"new form", "edit form", "edit form", "edit form"), page_component = c("introduction",
"brand selection", "filters", "filters", "preview", "download",
"download", "contacts", "useful links", NA, "input form", "input form",
"advanced controls", "advanced controls", "admin form", "admin form",
"advanced controls2"), subunits = c(NA, NA, "filter dropdowns",
"filter summary", "data preview table", "download settings",
"download button", NA, NA, NA, "category", "description", "upload attachment",
"delete request", "assign owner", "update status", "set priority"
), interactive = c("static", "interactive", "interactive", "static",
"static", "interactive", "interactive", "static", "static", "static",
"interactive", "interactive", "interactive", "interactive", "interactive",
"interactive", "interactive"), internal = c(TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE), type = c("required", "required",
"required", "required", "required", "optional", "required", "required",
"required", "required", "required", "required", "optional", "optional",
"required", "required", "optional")), class = "data.frame", row.names = c(NA,
-17L))
I like the example! However, creating an hp_dataframe still doesn't "click" for me. I am starting to think that hp_dataframe shouldn't be exported. IMO there is too much work involved to render a hierplane and I fear it will be frustrating for users. However, I still think we can use it to take data frames from existing packages and create the required hierplane JSON.
I have just discovered data.tree and it looks like a very mature and useful package for hierplane. It has many utilities for creating hierarchical data and, more importantly, ToDataFrameNetwork(), which we can pass to hp_dataframe(). This way, a user who is comfortable with data.tree can easily generate hierplanes.
Based on the 100,000+ downloads per month alone, data.tree seems to be the tool to use when working with hierarchical data in R, so I think we should take advantage. Below is a very rough example of what I am thinking:
library(data.tree)
library(hierplane)
library(yaml)
library(dplyr)
# creating a tree with {data.tree}
acme <- Node$new("Acme Inc.")
accounting <- acme$AddChild("Accounting")
software <- accounting$AddChild("New Software")
standards <- accounting$AddChild("New Accounting Standards")
research <- acme$AddChild("Research")
newProductLine <- research$AddChild("New Product Line")
newLabs <- research$AddChild("New Labs")
it <- acme$AddChild("IT")
outsource <- it$AddChild("Outsource")
agile <- it$AddChild("Go agile")
goToR <- it$AddChild("Switch to R")
acme
# function for creating a minimal root dataframe compatible with data.tree
get_root <- function(x) {
  x <- names(x$Get("level")[which(x$Get("level") %in% 1)])
  data.frame(
    from = x,
    to = x,
    child = x,
    link = x
  )
}
hp_datatree <- function(x) {
  root <- get_root(x)
  ToDataFrameNetwork(x) %>%
    mutate(
      child = to,
      link = " ",
      node_type = NA,
      attribute1 = NA,
      attribute2 = NA
    ) %>%
    bind_rows(root) %>%
    hp_dataframe(
      settings = hierplane_settings(
        root_tag = root$from,
        parent_id = "from",
        child_id = "to"
      )
    )
}
# create a hierplane
acme %>%
  hp_datatree() %>%
  hierplane()
If we heavily rely on data.tree we can take advantage of all its cool features, for example, allowing users to render hierplanes based on a yaml file:
yaml <- "
name: OS Students 2014/15
OS X:
  Yosemite:
    users: 16
  Leopard:
    users: 43
Linux:
  Debian:
    users: 27
  Ubuntu:
    users: 36
Windows:
  W7:
    users: 31
  W8:
    users: 32
  W10:
    users: 4
"
osList <- yaml.load(yaml)
osNode <- as.Node(osList)
osNode %>%
  hp_datatree() %>%
  hierplane()
Thanks for providing the example for data.tree! I think I remember why I steered away from it now. I was not able to get attributes, link, or node_type to correctly integrate into the data. If you find out a way, let me know lol
I do think the function I wrote offers a lot more versatility tho. If you were trying to construct a hierplane like the one I made, you would have to type/do a lot more with data.tree (even if you can get it to work).
Also I think letting advanced users flexibly use the data is critical for the success of the package. Maybe we can do what echarts4r does and put really detailed documentation under the advanced portion of the pkgdown????
Getting the attributes, link, etc. is possible. In data.tree you can pull them into a dataframe by providing the column names:
library(data.tree)
# creating a tree with {data.tree}
acme <- Node$new("Acme Inc.")
accounting <- acme$AddChild("Accounting")
software <- accounting$AddChild("New Software")
standards <- accounting$AddChild("New Accounting Standards")
research <- acme$AddChild("Research")
newProductLine <- research$AddChild("New Product Line")
newLabs <- research$AddChild("New Labs")
it <- acme$AddChild("IT")
outsource <- it$AddChild("Outsource")
agile <- it$AddChild("Go agile")
goToR <- it$AddChild("Switch to R")
acme$Accounting$`New Software`$cost <- 1000000
acme$Accounting$`New Accounting Standards`$cost <- 500000
acme$Research$`New Product Line`$cost <- 2000000
acme$Research$`New Labs`$cost <- 750000
acme$IT$Outsource$cost <- 400000
acme$IT$`Go agile`$cost <- 250000
acme$IT$`Switch to R`$cost <- 50000
acme$Accounting$`New Software`$p <- 0.5
acme$Accounting$`New Accounting Standards`$p <- 0.75
acme$Research$`New Product Line`$p <- 0.25
acme$Research$`New Labs`$p <- 0.9
acme$IT$Outsource$p <- 0.2
acme$IT$`Go agile`$p <- 0.05
acme$IT$`Switch to R`$p <- 1
ToDataFrameNetwork(acme, "cost", "p")
And I get that it's kind of scary the amount of text you need to write but I think this is why data.tree makes sense to me. It's more strict and so constructing a tree is a lot clearer to me. For example, I have been trying to render a hierplane of the following data with add_root/add_layer:
structure(list(from = c("OS Students 2014/15", "OS Students 2014/15",
"OS Students 2014/15", "OS X", "OS X", "Linux", "Linux", "Windows",
"Windows", "Windows"), to = c("OS X", "Linux", "Windows", "Yosemite",
"Leopard", "Debian", "Ubuntu", "W7", "W8", "W10")), row.names = c(NA,
10L), class = "data.frame")
But I cannot figure out how to get this to work.
The OS data you provided is already in hierarchical structure, so you dont have to do add_root and add_layer.
Now consider this dataset (much more likely data structure):
df_orig <- tribble(
~"Survey", ~"Operating System", ~"OS Version", ~"users",
"OS Students 2014/15", "OS X" , "Yosemite", 16,
"OS Students 2014/15", "OS X" , "Leopard", 43,
"OS Students 2014/15", "Linux" , "Debian", 27,
"OS Students 2014/15", "Linux" , "Ubuntu", 36,
"OS Students 2014/15", "Windows" , "Win7", 31,
"OS Students 2014/15", "Windows" , "Win8", 32,
"OS Students 2014/15", "Windows" , "Win10", 4
)
How would you translate this dataset to the one you provided using data.tree without manually assigning everything?
Here's what it looks like with add_ functions:
df <- bind_rows(
  # define root
  add_root("OS Students 2014/15"),
  # layer 1 - link to root
  add_layer(df_orig,
    parent_vals = "OS Students 2014/15",
    child_col = "Operating System",
    link_vals = "OS", # allow manually defined static values
    node_type_vals = "OS"
  ),
  # layers 2 to n
  # (parent for this layer is same as child in previous layer)
  add_layer(df_orig,
    parent_col = "Operating System",
    child_col = "OS Version",
    link_vals = "Ver",
    node_type_vals = "Sub",
    attribute_cols = "users"
  )
)
df %>%
  hp_dataframe(
    title = "Survey Results of Most Popular OS in 2014/15",
    settings = hierplane_settings(attributes = "attribute1"),
    styles = hierplane_styles(
      link_to_positions = list(Ver = "right")
    )
  ) %>%
  hierplane()
I see, okay this makes sense. We will add data.tree support for rendering hierplanes from json/yaml/csv etc. that is already hierarchical. Then keep hp_dataframe for parsing data frames which aren't hierarchical.
However, I agree with the point you made about piping, I think we can get away with a syntax like:
add_root("a") %>%
  add_layer("b") %>%
  add_layer("c")
The add_layer could basically have .x be the data that is passed and .y for the original dataframe?
Can you pass two things in a single pipe?? I need to take a look at echarts and see how John handled it...
Maybe we can require the original data in add_root, then bind rows in add_layer and kind of carry the data along by storing it as an attribute?
Ohhh can you use a dataset as an attribute?
Looks like ggplot sets up an environment to handle layering and echarts creates a widget (the data is part of it!) and just handles it like a list.
maybe we can do
data %>% add_root("OS 2014/15") %>% add_layer() %>% ...
add_root outputs a list of data and df, where data is the original dataset and df is the root dataframe. Subsequently, add_layer takes the output from add_root (list(data, df)), operates on the df element (i.e. adds to it), and returns list(data, df).
what do you think??
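A rough base R sketch of the list(data, df) idea described above, in case it helps the discussion. The function names (add_root_list, add_layer_list) and the parent/child column names are made up for illustration and are not the package's actual API.

```r
# Hypothetical helpers: carry the original data alongside the growing df.
add_root_list <- function(data, root) {
  df <- data.frame(parent = root, child = root, stringsAsFactors = FALSE)
  list(data = data, df = df)
}

add_layer_list <- function(x, parent_col, child_col) {
  # Unique parent/child pairs taken from the original (flat) data
  pairs <- unique(x$data[, c(parent_col, child_col)])
  layer <- data.frame(
    parent = pairs[[parent_col]],
    child = pairs[[child_col]],
    stringsAsFactors = FALSE
  )
  # Only the df element grows; the original data is passed along untouched
  x$df <- rbind(x$df, layer)
  x
}

# Small made-up slice of the OS survey data
survey <- data.frame(
  survey = "OS Students 2014/15",
  os = c("OS X", "OS X", "Linux"),
  version = c("Yosemite", "Leopard", "Debian"),
  stringsAsFactors = FALSE
)

out <- add_layer_list(
  add_layer_list(add_root_list(survey, "OS Students 2014/15"), "survey", "os"),
  "os", "version"
)
out$df
```

The downside is that the final step has to pull out$df before anything downstream can treat the result as a plain data frame.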
p.s. if attribute can take a dataframe input, i prefer the add_ functions to return a dataframe with the original data as attribute.
I also prefer the output to be a single data frame. The new environment thing sounds fancy/cool. I like that idea. Attributes can contain data but idk if it's best practice hehe.
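On the "can an attribute hold a dataset" question: a minimal base R check suggests it can. The object and attribute names here are only for illustration.

```r
# A data frame can be stored as an attribute of another data frame.
source_df <- data.frame(os = c("OS X", "Linux"), users = c(59, 63))
root_df <- data.frame(parent = "Survey", child = "Survey")

attr(root_df, "source") <- source_df

# The stored copy comes back unchanged
stored <- attr(root_df, "source")
identical(stored, source_df)
```

Whether a custom attribute survives later operations depends on the function being called, so any helper relying on it would probably want to re-attach it explicitly before returning.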
I will work on making the add_ functions pipe compatible. Will explore env (SCARY) and attribute options.
Will you consider the echarts implementation? He creates a widget and then just keeps adding to the x.
In other words...
data %>% add_root returns a hierplane object and is ready for plotting. When you add a layer, it just updates the data to be passed to the widget. The challenge here is that the settings need to be passed with the widget. Maybe that can just be an optional param in add root and add layer.
After looking into it a bit more, the hierplane widget option is just a list.
This makes things really easy. Can also add pipe operation for adding styles now!
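A minimal sketch of what "just keep adding to the x" could look like. A plain list stands in for the widget here (htmlwidgets keep their payload in an element named x); add_styles_sketch and the styles field are assumptions for illustration, not existing hierplane functions.

```r
# Plain list standing in for a widget object; the payload lives in `x`.
widget <- list(
  x = list(
    nodes = data.frame(parent = "root", child = "root"),
    styles = list()
  )
)

# Hypothetical pipe-friendly helper: merge new style options into x$styles
add_styles_sketch <- function(w, ...) {
  w$x$styles <- utils::modifyList(w$x$styles, list(...))
  w
}

widget2 <- add_styles_sketch(widget, link_to_positions = list(Ver = "right"))
widget2$x$styles
```

Because the helper takes the widget as its first argument and returns it, it could sit anywhere in a pipe chain after the widget is created.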
Gotcha, but if you are just modifying the widget how does this work? Wouldn't you need to have added the root and layers prior to creating the widget? Btw, here is what I had in mind with the attribute method:
add_root2 <- function(.data, ...) {
  root <- add_root(...)
  attr(root, "source") <- .data
  root
}
add_layer2 <- function(.data, ...) {
  source <- attr(.data, "source")
  layer <- add_layer(source, ...)
  attr(layer, "source") <- source
  dplyr::bind_rows(.data, layer)
}
df_orig %>%
  add_root2("OS Students 2014/15") %>%
  add_layer2(
    parent_vals = "OS Students 2014/15",
    child_col = "Operating System",
    link_vals = "OS",
    node_type_vals = "OS"
  ) %>%
  add_layer2(
    parent_col = "Operating System",
    child_col = "OS Version",
    link_vals = "Ver",
    node_type_vals = "Sub",
    attribute_cols = "users"
  ) %>%
  hp_dataframe(
    title = "Survey Results of Most Popular OS in 2014/15",
    settings = hierplane_settings(attributes = "attribute1"),
    styles = hierplane_styles(
      link_to_positions = list(Ver = "right")
    )
  ) %>%
  hierplane()
You would just modify the x portion of the widget list.
I actually really like your implementation. Nice and simple!!
Okay, I can come up with something really quick, I'll branch off of this branch.
Ok thanks!!!! I'll clean up the documentation. I was thinking about adding the OS dataset and maybe the org dataset to the package. Thoughts?
I think it's a good idea! More datasets the better 😸
Ok awesome!
Also I think we need to pare down the starships dataset to just the original, and use it to demo add_root and add_layers maybe.
What do you think? Should it stay a ready-to-use df? Or maybe we add the original df and also give users access to the hierplane-ready version? (starships = original, starships_hp = hierplane-ready)
Maybe we add a completely different dataset, like the OS one to demonstrate add_ functions? I like how starships can just quickly be used without any work.
Assuming the checks pass, we just need to document/export these functions and we are good to go!
Working on it! I will also work on rewriting the starships dataset with our spanking new workflow 💯
Awesome! Thank you 😁
OH MAN ended up doing a lot of updates. Added more flexibility and automation in the add_ functions by leveraging inherited attributes/sources. Now the flow is much more clear with no need for indicating a parent.
os_survey %>%
  add_root("OS Students 2014/15") %>%
  add_layer(
    child_col = "Operating System",
    link_vals = "OS",
    node_type_vals = "OS"
  ) %>%
  add_layer(
    child_col = "OS Version",
    link_vals = "Ver",
    node_type_vals = "Sub",
    attribute_cols = "users"
  ) %>%
  hp_dataframe(
    title = "Survey Results of Most Popular OS in 2014/15",
    styles = hierplane_styles(
      link_to_positions = list(Ver = "right")
    )
  ) %>%
  hierplane()
Also, link and node_type are now optional values (it's not pretty but if you need something quick and dirty, it works; also it would be great if you want to use this in some sort of interactive application cuz it's like generating a pivot table):
os_survey %>%
  add_root("OS Students 2014/15") %>%
  add_layer(
    child_col = "Operating System"
  ) %>%
  add_layer(
    child_col = "OS Version"
  ) %>%
  hp_dataframe(
    title = "Survey Results of Most Popular OS in 2014/15"
  ) %>%
  hierplane()
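For anyone wondering how a layer can work out its parent without being told, here is one possible mechanism sketched in base R. It is an assumption about the approach, not the code in this branch: each call records the child column it used as an attribute, and the next call reads that as its parent column. All function and attribute names are hypothetical.

```r
add_root_sketch <- function(data, root) {
  out <- data.frame(parent = root, child = root, stringsAsFactors = FALSE)
  attr(out, "source") <- data   # original flat data
  attr(out, "root") <- root     # root label, used as parent of the first layer
  out
}

add_layer_sketch <- function(.data, child_col) {
  src <- attr(.data, "source")
  root <- attr(.data, "root")
  prev_col <- attr(.data, "last_child_col")  # NULL right after the root

  # First layer hangs off the root; later layers hang off the previous child column
  parent <- if (is.null(prev_col)) rep(root, nrow(src)) else src[[prev_col]]
  layer <- unique(data.frame(
    parent = parent,
    child = src[[child_col]],
    stringsAsFactors = FALSE
  ))

  # Rebuild from plain columns, then re-attach the bookkeeping attributes
  out <- rbind(
    data.frame(parent = .data$parent, child = .data$child, stringsAsFactors = FALSE),
    layer
  )
  attr(out, "source") <- src
  attr(out, "root") <- root
  attr(out, "last_child_col") <- child_col
  out
}

# Small made-up slice of the OS survey data
survey <- data.frame(
  os = c("OS X", "OS X", "Linux"),
  version = c("Yosemite", "Leopard", "Debian"),
  stringsAsFactors = FALSE
)

res <- add_layer_sketch(
  add_layer_sketch(add_root_sketch(survey, "OS Students 2014/15"), "os"),
  "version"
)
res[, c("parent", "child")]
```

The attributes are re-attached on every call rather than relying on rbind or bind_rows to keep them.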
First draft of functions that help construct hp-compatible dataframes from a flat dataset. The draft is an attempt to provide a step-by-step guide for users to construct a hierplane from a dataset of their own.
notes for discussion:
Demo: