r4fun / hierplane

🌳 Hierplane for R
https://r4fun.github.io/hierplane/

Add function to go from json --> hp compatible df? #37

Open mathidachuk opened 4 years ago

mathidachuk commented 4 years ago

Based on the org data, I built the following functions to convert the json data into a hierplane-compatible data frame. I think this may be useful if, say, someone wants to modify json tree data as a data frame and then convert it back to json + hierplane.

Let me know what you think about including it in the package.

p.s. tree.json = the org data

library(jsonlite)
library(dplyr)
library(hierplane)

source_json <- "tree.json" %>%
  read_json()

parse_json_node <- function(x, head_word) {

  children <- data.frame()

  n_children <- length(x$children)

  if (n_children > 0) {
    for (child in 1:n_children) {
      children <- bind_rows(children,
                            parse_json_node(x$children[[child]],
                                       head_word = x$word))
    }
  }

  out <- data.frame(
    parent_id = head_word,
    child_id = x$word,
    child = x$word,
    link = x$link,
    node_type = x$nodeType
  )

  if (!is.null(x$attributes)) {
    out$attribute1 <- x$attributes
  }

  out <- bind_rows(children, out)

  out

}

parse_json_tree <- function(x) {

  root_word <- x$root$word

  root_df <- data.frame(
    parent_id = root_word,
    child_id = root_word,
    child = root_word,
    node_type = "ROOT",
    link = "ROOT",
    attribute2 = NA # maybe attributes should be defaulted to two columns....
  )

  if (!is.null(x$root$attributes)) {
    root_df$attribute1 <- x$root$attributes
  }

  children_df <- lapply(x$root$children, parse_json_node, head_word = root_word) %>%
    bind_rows()

  styles <- list(
    node_type_to_style = x$nodeTypeToStyle,
    link_to_positions = x$linkToPosition,
    link_name_to_label = x$linkNameToLabel
  )

  names(styles$node_type_to_style)[
    grep("root", names(styles$node_type_to_style), ignore.case = TRUE)
    ] <- "ROOT"

  list(df = bind_rows(root_df, children_df),
       styles = styles,
       title = x$text)

}

parsed_data <- parse_json_tree(source_json)
hierplane(hp_dataframe(.data = parsed_data$df,
                       title = parsed_data$title,
                       styles = parsed_data$styles))
tylerlittlefield commented 4 years ago

I do think a function for translating multiple types of files to hierplane ready data is important. Please take a look at data.tree and let me know what you think. It's a very popular package for this type of data and we can take advantage. For example, being able to parse multiple file formats:

csv -> data.frame in table format (?read.csv) -> data.tree (?as.Node.data.frame)
Newick -> ape phylo (?ape::read.tree) -> data.tree (?as.Node.phylo)
csv -> data.frame in network format (?read.csv) -> data.tree (c.f. ?FromDataFrameNetwork)
yaml -> list of lists (?yaml::yaml.load) -> data.tree (?as.Node.list)
json -> list of lists (e.g. ?jsonlite::fromJSON) -> data.tree (?as.Node.list)

If we have an hp_datatree function, we can automatically take advantage of csv/newick/yaml/json all at once!
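For reference, the json -> list of lists -> data.tree step could look roughly like the sketch below. It assumes the data.tree package is installed and that the JSON has already been read into a nested list (e.g. with jsonlite::read_json, as in the first comment). The small `org` list is made-up sample data standing in for the root of tree.json, and hp_datatree itself does not exist yet, so only the data.tree half is shown.

```r
library(data.tree)

# Made-up stand-in for the root of the parsed JSON: each node has a `word`,
# optional `link` / `nodeType` fields, and a `children` list.
org <- list(
  word = "CEO", nodeType = "person",
  children = list(
    list(word = "CTO", link = "reports_to", nodeType = "person",
         children = list(
           list(word = "Engineer", link = "reports_to", nodeType = "person")
         )),
    list(word = "CFO", link = "reports_to", nodeType = "person")
  )
)

# mode = "explicit" tells data.tree that children live in a named element;
# nameName = "word" uses the hierplane `word` field as the node name.
tree <- as.Node(org, mode = "explicit", nameName = "word",
                childrenName = "children")

# Remaining fields (`link`, `nodeType`) become attributes on each node.
print(tree, "link", "nodeType")
```

One caveat: a JSON field called `attributes` may collide with a name data.tree reserves on its Node class, so it would probably need renaming before conversion.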

tylerlittlefield commented 4 years ago

Regarding this comment:

attribute2 = NA # maybe attributes should be defaulted to two columns....

I was also thinking about this... attributes feels like it should just be a single parameter and the user passes n column names to it.
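A rough sketch of what that could look like, in base R only. `collect_attributes` is a hypothetical helper (not part of hierplane) and the data frame is made up; the idea is just that the user passes any number of column names and gets one set of attributes back per row, with NA values dropped.

```r
# Hypothetical helper, not part of hierplane: gather any number of
# attribute columns into one list of character vectors, one per row.
collect_attributes <- function(df, attributes = character()) {
  missing_cols <- setdiff(attributes, names(df))
  if (length(missing_cols) > 0) {
    stop("Unknown attribute column(s): ", paste(missing_cols, collapse = ", "))
  }
  lapply(seq_len(nrow(df)), function(i) {
    values <- unlist(df[i, attributes, drop = FALSE], use.names = FALSE)
    # Drop NA so rows with fewer attributes do not carry placeholders.
    as.character(values[!is.na(values)])
  })
}

# Made-up example data with two attribute columns.
df <- data.frame(
  child_id   = c("CEO", "CTO"),
  attribute1 = c("founder", NA),
  attribute2 = c("board", "tech"),
  stringsAsFactors = FALSE
)

attrs <- collect_attributes(df, attributes = c("attribute1", "attribute2"))
str(attrs)
```

With this shape there is no need to default to a fixed number of attribute columns: zero columns simply yields empty vectors.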

mathidachuk commented 4 years ago

I played with data.tree a little bit when I was building the build_tree and build_node functions, but it was not doing what I needed it to do at all unfortunately. Maybe you will have better luck getting it to work for our purposes.

I thought as.Node.data.frame would work for converting spacyr to a list that we need for example, but it didn't work for me. I was probably doing something wrong.

tylerlittlefield commented 4 years ago

Well, we will still need build_tree/build_node since hp_dataframe calls those, and that's fine IMO. And spacyr just works, so I don't think we need to touch that. I am saying that we might need to consider removing hp_dataframe as an export and making it an internal function. The data.tree package can just be something we make sure hierplane is compatible with, namely that a user can pass a data.tree object to hierplane and it's able to render.

mathidachuk commented 4 years ago

Ok gotcha. I think it's a great idea to make sure we have a data.tree compatible function. I have some reservations about removing hp_dataframe tho. See #39

mathidachuk commented 4 years ago

Also, a reminder that hp_spacyr relies on the build_ functions too. The build_ functions allow us to go from df --> hierarchical list structure.
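To illustrate the df --> hierarchical list idea (this is only a sketch, not the actual build_tree/build_node code, which may work differently): `nest_node` below is a hypothetical base R function that assumes the parent_id/child_id/child/link/node_type columns from the first comment, unique child_id values, and a root row that points to itself.

```r
# Illustrative only: the real build_tree/build_node may work differently.
nest_node <- function(df, id) {
  row <- df[df$child_id == id, ][1, ]
  # Children are rows whose parent is `id`, excluding the self-referencing root row.
  child_ids <- df$child_id[df$parent_id == id & df$child_id != id]
  node <- list(word = row$child, nodeType = row$node_type, link = row$link)
  if (length(child_ids) > 0) {
    node$children <- lapply(child_ids, function(cid) nest_node(df, cid))
  }
  node
}

# Made-up data frame in the same shape as parsed_data$df.
df <- data.frame(
  parent_id = c("CEO", "CEO", "CEO", "CTO"),
  child_id  = c("CEO", "CTO", "CFO", "Engineer"),
  child     = c("CEO", "CTO", "CFO", "Engineer"),
  link      = c("ROOT", "reports_to", "reports_to", "reports_to"),
  node_type = c("ROOT", "person", "person", "person"),
  stringsAsFactors = FALSE
)

tree <- nest_node(df, id = "CEO")
str(tree, max.level = 2)
```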

tylerlittlefield commented 4 years ago

I 100% agree, which is why I don't think the build functions should be touched; they just work. It's hp_dataframe that I am on the fence about because of the work required.

mathidachuk commented 4 years ago

Maybe we show users how to construct a dataframe from data.tree and how to add link and node and attribute columns?? That can be an option. And then they can just use hp_dataframe????
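Something along these lines, as a sketch. It assumes data.tree is installed; the tree is made up, and `field_or` is a small hypothetical helper for filling in defaults. The column names follow the data frame from the first comment, so the result should be in the shape hp_dataframe was given there, though that has not been verified here.

```r
library(data.tree)

# Made-up tree built the same way as in the data.tree vignette.
ceo <- Node$new("CEO")
cto <- ceo$AddChild("CTO")
eng <- cto$AddChild("Engineer")
cfo <- ceo$AddChild("CFO")

# Nodes are reference objects, so custom fields can be set in a loop.
for (n in list(cto, eng, cfo)) {
  n$link <- "reports_to"
  n$nodeType <- "person"
}

# Hypothetical helper: read a custom field, falling back to a default.
field_or <- function(node, field, default) {
  value <- node[[field]]
  if (is.null(value)) default else value
}

nodes <- Traverse(ceo)  # pre-order list of Node objects

df <- data.frame(
  # The root points to itself, mirroring root_df in the first comment.
  parent_id = vapply(nodes, function(n) if (n$isRoot) n$name else n$parent$name,
                     character(1)),
  child_id  = vapply(nodes, function(n) n$name, character(1)),
  link      = vapply(nodes, field_or, character(1),
                     field = "link", default = "ROOT"),
  node_type = vapply(nodes, field_or, character(1),
                     field = "nodeType", default = "ROOT"),
  row.names = NULL,
  stringsAsFactors = FALSE
)
df$child <- df$child_id

print(df)
```

Attribute columns could be added the same way, one vapply per field.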