Use of text files to create and manage DAGs

r-causal / ggdag

An R package for working with causal directed acyclic graphs (DAGs)

https://r-causal.github.io/ggdag/

Use of text files to create and manage DAGs #118

Closed lorenzoFabbri closed 1 year ago

lorenzoFabbri commented 1 year ago

I find the current pipeline for building and managing DAGs not that streamlined (especially with dagitty). When I start working on a new paper, I usually have to make a data request, which means choosing from the available variables and creating a CSV file. The idea would be the following:

1. Create one text file of edges with columns from and to (e.g., node A and node B of the graph).
2. Create one text file of metadata with columns node, variable_name (the name of the node might not match the variable name in the dataset), description (a short text description of each variable), labels (for a factor, its levels and labels), observed (whether the variable is observed or not), and status (e.g., exposure).

These files would then be used to create a ggdag object. Since the files contain information on, e.g., levels and labels, it would then be possible to generate a publication-ready table of confounders (a sort of Table 1).
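As a rough sketch of the idea in base R: the two files are simulated here with inline text, and every value in them (the node names, var_x, the descriptions, the "confounder" status) is made up for illustration. The edge table is checked against the metadata, turned into a dagitty-style DAG string, and the metadata is filtered into a simple covariate table. The file names in the comments are hypothetical.

```r
# Stand-in for the edge file; in practice: read.csv("edges.csv") (hypothetical name)
edges <- read.csv(
  text = "from,to
c,x
c,y
x,y",
  stringsAsFactors = FALSE
)

# Stand-in for the metadata file; in practice: read.csv("nodes.csv") (hypothetical name)
# All values below are illustrative placeholders.
nodes <- read.csv(
  text = "node,variable_name,description,observed,status
x,var_x,Exposure of interest,TRUE,exposure
y,var_y,Outcome of interest,TRUE,outcome
c,var_c,Measured confounder,TRUE,confounder",
  stringsAsFactors = FALSE
)

# Every node used in an edge should have a metadata row
missing_nodes <- setdiff(unique(c(edges$from, edges$to)), nodes$node)
stopifnot(length(missing_nodes) == 0)

# Assemble a DAG description in dagitty syntax, one edge per line
dag_string <- paste0(
  "dag {\n",
  paste(edges$from, "->", edges$to, collapse = "\n"),
  "\n}"
)

# The string could then be passed on, e.g.:
# dag <- dagitty::dagitty(dag_string)
# tidy_dag <- ggdag::tidy_dagitty(dag)

# A bare-bones covariate table drawn from the same metadata
covariate_table <- nodes[
  nodes$status == "confounder" & nodes$observed,
  c("variable_name", "description")
]
```

Keeping the edge list and the metadata in separate tables means the same metadata file can serve both the DAG and the covariate table.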

Would this be useful?

malcolmbarrett commented 1 year ago

So you basically want to be able to create them from files rather than dagitty or dagify(), then have some additional information joined to them, some of which would determine how the DAG is assembled? Is that right?

It's not pushed to the PR yet, but #117 is adding functionality for updating the data and dagitty components of the ggdag object. In theory, that could be extended to create a DAG from a data frame, e.g., you would read and join those files yourself, then supply the data to ggdag to set it up. Would that meet your need?

I should also note that you can always just join extra information like variable_name to a tidy_dagitty object and work with it as normal.
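A minimal sketch of such a join, assuming the installed ggdag version provides a left_join() method for tidy_dagitty objects; the metadata table and its variable_name values are hypothetical.

```r
library(ggdag, warn.conflicts = FALSE)

# A small three-node DAG built the usual way
dag <- dagify(
  y ~ x + c,
  x ~ c,
  exposure = "x",
  outcome = "y"
)
tidy_dag <- tidy_dagitty(dag)

# Hypothetical lookup table mapping node names to dataset variable names
metadata <- data.frame(
  name = c("x", "y", "c"),
  variable_name = c("var_x", "var_y", "var_c")
)

# Join on the node name; the result is assumed to remain a tidy_dagitty object
tidy_dag <- dplyr::left_join(tidy_dag, metadata, by = "name")

# The extra column travels with the DAG data
head(as.data.frame(tidy_dag))
```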

lorenzoFabbri commented 1 year ago

I think that would do. Essentially, I'd like to work with data frames rather than other objects. Starting from a tribble of nodes and one of metadata (which I find tremendously useful), build a ggdag object. Exporting a ggdag object to a tibble is already possible, yes.

To me, it's more intuitive, and it more closely reflects what we usually do in applied epi research.

malcolmbarrett commented 1 year ago

Proof of concept from #117

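library(ggdag, warn.conflicts = FALSE)

# Edge list: one row per edge, pointing from `name` to `to`
graph <- data.frame(
  name = c("c", "c", "x"),
  to = c("x", "y", "y")
)

# Node-level metadata, keyed by `name`
metadata <- data.frame(
  name = c("x", "y", "c"),
  status = c("exposure", "outcome", "latent"),
  adjusted = c("unadjusted", "unadjusted", "adjusted"),
  variable_descriptions = c("a", "b", "d")
)

# Attach the metadata to every edge row; the full join also adds a row
# for y, which has no outgoing edge
dag_data <- dplyr::full_join(
  graph,
  metadata,
  by = "name"
)

# Build the tidy DAG directly from the joined data frame
tidy_dag_data <- as_tidy_dagitty(dag_data)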
tidy_dag_data
#> # A DAG with 3 nodes and 4 edges
#> #
#> # Exposure: x
#> # Outcome: y
#> # Latent Variable: c
#> #
#> # A tibble: 4 × 11
#>   name       x       y direction to      xend   yend circular status   adjusted 
#>   <chr>  <dbl>   <dbl> <fct>     <chr>  <dbl>  <dbl> <lgl>    <chr>    <chr>    
#> 1 c     -1.23  -0.0453 ->        x     -0.367  0.468 FALSE    latent   adjusted 
#> 2 c     -1.23  -0.0453 ->        y     -0.352 -0.532 FALSE    latent   adjusted 
#> 3 x     -0.367  0.468  ->        y     -0.352 -0.532 FALSE    exposure unadjust…
#> 4 y     -0.352 -0.532  ->        <NA>  NA     NA     FALSE    outcome  unadjust…
#> # ℹ 1 more variable: variable_descriptions <chr>
ggdag(tidy_dag_data)

Created on 2023-08-09 with reprex v2.0.2

lorenzoFabbri commented 1 year ago

It looks really good and handy!