Plotting a psem with categorical variables

calummaney commented 3 months ago

Hello,

Many thanks for your continued work on this project - it's amazingly useful for the project I'm working on.

I'm hoping you can help me out with an issue I'm seeing when trying to plot a fitted psem model. I include an exogenous categorical variable but plotting the resulting model doesn't seem to work. The following simple reprex throws an error:

df <- data.frame(y = rnorm(n = 100, 10,2), x = rnorm(n=100,5,1),cat = factor(c(1,2)))
psem(
  lm(y ~ x + cat, data = df),
  data=df) |> plot()

"Error in round(ctab$Std.Estimate, digits) : non-numeric argument to mathematical function"

Please let me know if I am making a mistake with this sort of thing, or what may be causing the errors.

Thanks very much if you are able to help!

Cal

jebyrnes commented 3 months ago

Huh. This looks like it's due to plot.psem() getting the info for the nodes from coefs(). But, coefs() here yields

  Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate    
1        y         x   0.0304    0.1786 97     0.1702  0.8652            -    
2        y       cat        -         -  1     1.4748  0.2275            -    
3        y   cat = 2    9.714    0.2681 97    36.2392  0.0000            - ***
4        y   cat = 1  10.1758    0.2681 97    37.9618  0.0000            - ***

I also noticed that even after filtering out that row, a bunch of other issues come up. This was super useful to mention, and now I have implemented several fixes in the function. One of which is the default for plot.psem() is standardized coefs (show = "std"), but, you don't have them here! So I made a new warning message.

OK - let's fix it! @jslefche can you update this new plot.psem() code in whatever branch you have going?

plot.psem <- function (x, return = FALSE, 
                       node_attrs = data.frame(shape = "rectangle",
                                               color = "black", 
                                               fillcolor = "white"), 
                       edge_attrs = data.frame(style = "solid", 
                                               color = "black"), 
                       ns_dashed = T, alpha = 0.05, show = "std", 
                       digits = 3, add_edge_label_spaces = TRUE, ...) 
{

  ctab <- coefs(x)
  # remove categorical predictors with ANOVA results with no estimate
  ctab <- ctab[-which(ctab$Estimate=="-"),]

  # std estimate error
  if(show == "std" & "-" %in% ctab$Std.Estimate) stop("Std. Estimates requested, but some paths cannot have Std. Estimates calculated.\nSee coefs() of your model and consider setting show to 'unstd'.\n")

  # in case categorical variabes were there but are now gone
  ctab$Estimate <- as.numeric(ctab$Estimate)

  ctab$Response <- as.character(ctab$Response)
  ctab$Predictor <- as.character(ctab$Predictor)

  unique_nodes <- unique(c(ctab$Response, ctab$Predictor))
  nodes <- DiagrammeR::create_node_df(n = length(unique_nodes), nodes = unique_nodes, 
                          type = "lower", label = unique_nodes)
  nodes <- cbind(nodes, node_attrs)
  nodes[] <- lapply(nodes, as.character)
  nodes$id <- as.numeric(nodes$id)
  edges <- DiagrammeR::create_edge_df(from = match(ctab$Predictor, unique_nodes), 
                          to = match(ctab$Response, unique_nodes))
  edges <- data.frame(edges, edge_attrs)
  edges[] <- lapply(edges, as.character)
  edges$id <- as.numeric(edges$id)
  edges$from <- as.numeric(edges$from)
  edges$to <- as.numeric(edges$to)
  if (ns_dashed) 
    edges$style[which(ctab$P.Value > alpha)] <- "dashed"
  if (show == "std") 
    edges$label = round(ctab$Std.Estimate, digits)
  if (show == "unstd") 
    edges$label = round(ctab$Estimate, digits)
  if (add_edge_label_spaces) 
    edges$label = paste0(" ", edges$label, " ")
  sem_graph <- DiagrammeR::create_graph(nodes, edges, directed = TRUE)
  if (return) 
    return(sem_graph)
  DiagrammeR::render_graph(sem_graph, ...)
}

Now, if you ran you code above, you get this error:

Error in plot.psem(psem(lm(y ~ x + cat, data = df), data = df)) : 
  Std. Estimates requested, but some paths cannot have Std. Estimates calculated.
 See coefs() of your model and consider setting show to 'unstd'

But if we run

df <- data.frame(y = rnorm(n = 100, 10,2), x = rnorm(n=100,5,1),cat = factor(c(1,2)))

psem(
  lm(y ~ x + cat, data = df),
  data=df) |> plot(show = "unstd")

Victory!

As a further thought for @jslefche

Why is x not standardized here? We should be able to in coefs() - I'm a little puzzled by that.
Should we also implement an option in plot.psem() to remove categorical variables? I ask this as, for myself anyway, if I'm using Econometric Fixed Effects to handle causality problems, sometimes I have a TON of levels and really don't want to see any of it. I suppose this would mean filtering out anything with " = " from the ctab? Or did you implement some other schema for categorical variables? Hrm. Actually. This could get tricky if there were some categorical variables you DID want to see. So, perhaps a better option - and generalizable to more cases, would be a "remove_node" option.

Actually - ok, here's a version that does #2. Although it's a bit greedy (i.e., if you removed a node and another node had a part of the variable name in it, that would not work - but this is to get rid of all of the levels..... think about this more, maybe. I just woke up and haven't had my coffee)

plot.psem <- function (x, return = FALSE, 
                       node_attrs = data.frame(shape = "rectangle",
                                               color = "black", 
                                               fillcolor = "white"), 
                       edge_attrs = data.frame(style = "solid", 
                                               color = "black"), 
                       ns_dashed = T, alpha = 0.05, show = "std", 
                       remove_nodes = NA,
                       digits = 3, add_edge_label_spaces = TRUE, ...) 
{

  ctab <- coefs(x)
  # remove categorical predictors with ANOVA results with no estimate
  ctab <- ctab[-which(ctab$Estimate=="-"),]

  # Are there other nodes we don't want?
  # add a layer of error checking
  if(!is.na(remove_nodes) & !sum(grepl(remove_nodes, 
                                       unique(c(ctab$Response, ctab$Predictor))))>0){
    warning(paste0(remove_nodes, "not in ", 
                   unique(c(ctab$Response, ctab$Predictor)), 
                   ".\nTypo?"))
  }

  if(!is.na(remove_nodes) & sum(grepl(remove_nodes, ctab$Predictor))>0){
    ctab <- ctab[!grepl(remove_nodes, ctab$Predictor),]
  }

  if(!is.na(remove_nodes) & sum(grepl(remove_nodes, ctab$Response))>0){
    ctab <- ctab[-grepl(remove_nodes, ctab$Response),]
  }

  # std estimate error
  if(show == "std" & "-" %in% ctab$Std.Estimate) stop("Std. Estimates requested, but some paths cannot have Std. Estimates calculated.\nSee coefs() of your model and consider setting show to 'unstd'.\n")

  # in case categorical variabes were there but are now gone
  ctab$Estimate <- as.numeric(ctab$Estimate)

  ctab$Response <- as.character(ctab$Response)
  ctab$Predictor <- as.character(ctab$Predictor)

  # make nodes
  unique_nodes <- unique(c(ctab$Response, ctab$Predictor))

  nodes <- DiagrammeR::create_node_df(n = length(unique_nodes), nodes = unique_nodes, 
                          type = "lower", label = unique_nodes)
  nodes <- cbind(nodes, node_attrs)
  nodes[] <- lapply(nodes, as.character)
  nodes$id <- as.numeric(nodes$id)
  edges <- DiagrammeR::create_edge_df(from = match(ctab$Predictor, unique_nodes), 
                          to = match(ctab$Response, unique_nodes))
  edges <- data.frame(edges, edge_attrs)
  edges[] <- lapply(edges, as.character)
  edges$id <- as.numeric(edges$id)
  edges$from <- as.numeric(edges$from)
  edges$to <- as.numeric(edges$to)
  if (ns_dashed) 
    edges$style[which(ctab$P.Value > alpha)] <- "dashed"
  if (show == "std") 
    edges$label = round(ctab$Std.Estimate, digits)
  if (show == "unstd") 
    edges$label = round(ctab$Estimate, digits)
  if (add_edge_label_spaces) 
    edges$label = paste0(" ", edges$label, " ")
  sem_graph <- DiagrammeR::create_graph(nodes, edges, directed = TRUE)
  if (return) 
    return(sem_graph)
  DiagrammeR::render_graph(sem_graph, ...)
}

jebyrnes commented 3 months ago

FYI, with this last bit

psem(
  lm(y ~ x + cat, data = df),
  data=df) |> plot(show = "unstd", remove_nodes = "cat")

produces

calummaney commented 2 months ago

Works pretty well, cheers! Hope this can be incorporated.

jslefche / piecewiseSEM

Plotting a psem with categorical variables #299