integrated-inferences / CausalQueries

Bayesian inference from binary causal models
Other
24 stars 7 forks source link

Is it possible not to require dagitty? #340

Closed barracuda156 closed 1 week ago

barracuda156 commented 3 months ago

dagitty depends on V8, which is a very heavy dependency and also broken on a number of less-mainstream platforms. Is it possible to make it a suggested, but not required dependency?

barracuda156 commented 3 months ago

P. S. That also needs a soft dependency on ggdag, since that pulls in dagitty again.

till-tietz commented 2 months ago

Hi @barracuda156 apologies for the late reply here. Making dependencies + install more lightweight is a very valid suggestion. We're in the middle of a dev sprint for an updated version of CausalQueries at the moment and will take this issue up as part of that. Will keep you updated on this thread.

barracuda156 commented 2 months ago

@till-tietz Sounds good, thank you.

till-tietz commented 2 months ago

@macartan @lilymedina @gerasy1987 --> not sure if this is possible under the license of dagitty but since we only use very limited functionality from the package we could pull in only those functions to make install more user friendly

gerasy1987 commented 2 months ago

@till-tietz seems that the dagitty R package is under GPL-2 license, which, as I understand, means that if we stay open source (which I assume we are), we can use their code.

That said, I believe we are using dagitty::dagitty, which is the main function in the package, and also directly uses some of the V8 functionality, so I am not sure we can get around requiring dagitty.

till-tietz commented 2 months ago

@gerasy1987 fair point. Just writing up a basic fully R based version of the DAG statement parsing that could perhaps do the trick here. Will branch from this issue and push there

till-tietz commented 2 months ago

@gerasy1987 something like this should basically replicate the functionality of dagitty::dagitty() |> dagitty::edges() |> data.frame() which we use to construct other object in make_model() and require in slightly modified form for plotting.

make_dag <- function(model_string) {
  # split by ; first to separate discrete parts of the DAG statement
  model_string <- strsplit(model_string, ";")[[1]]

  lapply(model_string, function(model_substring) {

    nodes <- trimws(strsplit(model_substring, "(<->|->|<-)")[[1]])

    edges <- trimws(strsplit(model_substring, "(\\w+)")[[1]])[-1]

    dag <- vector(mode = "list", length = length(nodes) - 1)

    for(i in 1:(length(nodes) - 1)) {

      if(edges[i] == "<-") {
        dag[[i]] <- data.frame(
          v = nodes[i+1],
          w = nodes[i],
          e = "->"
        )
      } else {
        dag[[i]] <- data.frame(
          v = nodes[i],
          w = nodes[i+1],
          e = edges[i]
        )
      }
    }

    return(dag)
  }) |>
    dplyr::bind_rows() |>
    dplyr::arrange(v)
} 
till-tietz commented 2 months ago

@macartan for your visibility

macartan commented 2 months ago

I tried it out on a bunch of statements and seems to work well though see below

some funny stuff if people make mistakes:

Could be more information:

make_dag("a c") [image: image.png] Show Traceback [image: image.png] Rerun with Debug Error in if (edges[i] == "<-") { : argument is of length zero

Odder behavior here as <> is picked up as the edge and no error:

make_dag("a <> b -> c") v w e 1 a <> b c <>

make_dag("a <- a ->") v w e 1 a a ->

duplication not spotted: need to use "distinct()" ?

make_dag("a -> b; a -> b") v w e 1 a b -> 2 a b ->

we have a bunch of functions we use to get to the plotting data frame, including from dagitty; wonder if these can be reduced also:

dagitty_statement <- paste("dag{", statement, "}") |> dagitty::dagitty()

Add coordinates if provided (otherwise generated)

if (!is.null(x_coord) & !is.null(y_coord)) { names(x_coord) <- names(y_coord) <- nodes

dagitty::coordinates(dagitty_statement) <-
  list(x = x_coord , y = y_coord) |>
  coords2df() |>
  coords2list()

}

On Tue, Aug 6, 2024 at 1:49 PM Till Tietz @.***> wrote:

@macartan https://github.com/macartan for your visibility

— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/340#issuecomment-2271100917, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57LIHIK5EU6QLEZCISLZQCZ6DAVCNFSM6AAAAABJMSZGKOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENZRGEYDAOJRG4 . You are receiving this because you were mentioned.Message ID: @.***>

till-tietz commented 2 months ago

@macartan thanks for testing. Will add some logic to guard against malformed statements. I think we should be able to reduce reliance on dagitty for plotting too

till-tietz commented 1 month ago

@macartan this helper should deal with a lot of the edge cases arising from malformed statements. In make_dag we'll split the statement by ';' then apply consolidate_statement to each sub-statement. We could even just run consolidate_statement as a first step in make_model. Will put everything together and push to this issue's branch for review.

consolidate_statement <- function(statement) {
  ## consolidate edges
  statement <- gsub("\\s+", "", statement)
  statement <- strsplit(statement, "")[[1]]

  i <- 1
  st_edge <- c()

  while (i <= length(statement)) {
    # Check for pattern "<", "-", ">"
    if (i <= length(statement) - 2 &&
        statement[i] == "<" &&
        statement[i + 1] == "-" && statement[i + 2] == ">") {
      st_edge <- c(st_edge, "<->")
      i <- i + 3 # Skip the next two elements
    }
    # Check for pattern "<", "-"
    else if (i <= length(statement) - 1 &&
             statement[i] == "<" && statement[i + 1] == "-") {
      st_edge <- c(st_edge, "<-")
      i <- i + 2 # Skip the next element
    }
    # Check for pattern "-", ">"
    else if (i <= length(statement) - 1 &&
             statement[i] == "-" && statement[i + 1] == ">") {
      st_edge <- c(st_edge, "->")
      i <- i + 2 # Skip the next element
    }
    # Otherwise, just append the current element
    else {
      st_edge <- c(st_edge, statement[i])
      i <- i + 1
    }
  }

  # detect edges
  is_edge <- st_edge %in% c("->", "<-", "<->")

  # check for bare edges (i.e. edges without origin or destination nodes)
  # either dangling edge at beginning / end of statement  
  has_dangling_edge <- any(c(is_edge[1], is_edge[length(is_edge)]))
  # or consecutive edges within statement 
  consecutive_edge <- rle(is_edge)
  has_consecutive_edge <-any(consecutive_edge$values & consecutive_edge$lengths >= 2)

  if(has_dangling_edge || has_consecutive_edge) {
    stop("Statement contains bare edges without a source or destination node or both. Edges should connect nodes.")
  }

  ## consolidate nodes
  # check for unsupported characters in varnames
  if(any(c("<",">") %in% st_edge[!is_edge])) {
    stop("Unsupported characters in varnames. No '<' or '>' in varnames please.")
  }

  if("-" %in% st_edge[!is_edge]) {
    stop("Unsupported characters in varnames. No hyphens '-' in varnames please; try dots?")
  }

  if("_" %in% st_edge[!is_edge]) {
    stop("Unsupported characters in varnames. No underscores '_' in varnames please; try dots?")
  }

  # counter that increments every time we hit an edge --> each character 
  # belonging to a node thus has the same number (node_id)
  node_id <- cumsum(is_edge)
  # split by node_id and paste all characters with same node_id together
  nodes <- split(st_edge[!is_edge], node_id[!is_edge]) |>
    vapply(paste, collapse = "", "") |>
    unname()

  return(list(nodes = nodes, edges = st_edge[is_edge]))
}
till-tietz commented 1 month ago

@macartan @gerasy1987 finally managed to replace ggdag + igraph functionality with some lightweight customer helpers. changes are in the PR. Let me know if you are happy with how plotting output.

barracuda156 commented 1 month ago

@till-tietz Thank you for working on this!

barracuda156 commented 2 weeks ago

Just for the record, there are some remnants of dagitty in the code:

Invalid package aliases in Rd file 'CausalQueries-package.Rd':
  ‘-package’
* checking Rd cross-references ... WARNING
Missing link or links in Rd file 'clean_statement.Rd':
  ‘dagitty’

Missing link or links in Rd file 'make_dag.Rd':
  ‘dagitty’

Missing link or links in Rd file 'make_model.Rd':
  ‘dagitty’

Missing link or links in Rd file 'print.causal_model.Rd':
  ‘dagitty’

Missing link or links in Rd file 'summary.causal_model.Rd':
  ‘details’

See section 'Cross-references' in the 'Writing R Extensions' manual.

I am building from 9d6da894a89630130b5ddada936c1cc8b196a538 now.

till-tietz commented 2 weeks ago

Hi @barracuda156 ... will check + clean the roxygen snytax and rebuild with devtools to purge these