stan-dev / rstan

RStan, the R interface to Stan
https://mc-stan.org
1.04k stars 267 forks source link

Conflict with conflicted library, remove uses of `apropos()`? #1141

Open emstruong opened 1 month ago

emstruong commented 1 month ago

Hello,

It appears that Rstan's use of the apropos() function is causing conflicts with the conflicted library, so I was advised to open an issue here to see if all uses of apropos() could be removed?

It doesn't seem like changing apropos() is feasible and it's not clear if it would be a good idea or even possible to change the conflicted library to make things work. Also, conflicted seems like a really important package for computational reproducibility and considering R's nature.

Please see the attached discussion for further context. Here

Michael

jgabry commented 1 month ago

Hi, thanks for reporting this. I didn't even know RStan called apropos but I guess many years ago it was used here on line 82:

https://github.com/stan-dev/rstan/blob/9b8d8fe715f1d51063621e6ca01d6265b02356fe/rstan/rstan/R/rstan.R#L77-L97

But I'm not sure what it should be changed to. Is there a recommended alternative to apropos that wouldn't cause this problem? Looking at the RStan code here, it seems like it's trying to find existing S4 objects that are stanmodel objects identical to the one being created, in which case it returns the existing one instead of creating a new one and compiling again (@bgoodri is that right?).

emstruong commented 1 month ago

Hi, thanks for reporting this. I didn't even know RStan called apropos but I guess many years ago it was used here on line 82:

https://github.com/stan-dev/rstan/blob/9b8d8fe715f1d51063621e6ca01d6265b02356fe/rstan/rstan/R/rstan.R#L77-L97

But I'm not sure what it should be changed to. Is there a recommended alternative to apropos that wouldn't cause this problem? Looking at the RStan code here, it seems like it's trying to find existing S4 objects that are stanmodel objects identical to the one being created, in which case it returns the existing one instead of creating a new one and compiling again (@bgoodri is that right?).

Hi, perhaps something like ls() or objects() might be made to be more appropriate? apropos() calls grep() on ls() internally, anyways. I'm not deep enough into S4 objects and the RStan codebase to know if this would be satisfactory.

The 'real issue' is that because apropos() internally invokes search(), this ends up eventually triggering conflicted when vapply() is applied onto the .conflicted (mis-spelling?) object, which brings us to this issue and the advice to not use apropos in libraries.

ellisp commented 2 weeks ago

Here is a reprex of the problem as it appeared to me.

library(conflicted)
library(rstan)
library(dplyr)

scode <- "
parameters {
  array[2] real y;
}
model {
  y[1] ~ normal(0, 1);
  y[2] ~ double_exponential(0, 2);
}
"
fit1 <- stan(model_code = scode, iter = 10, verbose = FALSE)
hadley commented 2 weeks ago

The problem is that because inherits = TRUE and mode = "S4" stan is effectively evaluating every symbol in every environment that is a parent of the function environment (i.e. all the packages that it imports before eventually getting to the global environment). I think you are probably more likely to find the model that you're looking for by doing e <- parent.frame() and then setting inherits = FALSE, but you might also avoid the conflict with conflicted by dropping the mode check (or doing it yourself later).

...

After a little more investigation part of the problem, I noticed that model_re is pretty broad, so I don't think dropping mode is likely to have much impact. I also forgot that apropos() doesn't take an environment, but I think you could do something like this:

is_stan_model <- function(x) {
  isS4(obj) && (is(obj, "stanfit") || is(obj, "stanmodel"))
}

find_matching_model <- function(model_code, env) {
  model_re <- "(^[[:alnum:]]{2,}.*$)|(^[A-E,G-S,U-Z,a-z].*$)|(^[F,T].+)" 

  names <- ls(envir = env)
  candidate_names <- names[grepl(model_re, names)]

  for (name in candidate_names) {
    obj <- get(name, envir = env, inherits = FALSE)
    if (is_stan_model(obj) && identical(obj@model_code[1], model_code)) {
      return(obj)
    }
  }

  NULL
}

If you wanted to be extra safe to preserve the existing behaviour, you could call find_matching_model with both parent.frame() and globalenv() (when they are different).

jgabry commented 1 week ago

@hadley Thanks for the clarifying comments and the code suggestions, much appreciated. @bgoodri what do you think about using @hadley's suggestion? I can put together a PR for it if you're on board with it.