paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0

Expose `brms:::validate_formula`? #1676

Open athowes opened 3 months ago

athowes commented 3 months ago

Hi Paul,

I'm looking to use the function brms:::validate_formula as a part of an R package. See issue https://github.com/epinowcast/epidist/issues/195.

I wonder if you'd consider exposing this function as it might be useful for package developers relying on brms?

Additionally, would you be able to advise on which checks validate_formula provides? I had previously implemented some checks (when I was using just a list of formulas like list(mu ~ 1, sigma ~ 1)) and unit tests like:

  1. check the distributional parameters specified in formula are the same as those in the family,
  2. check that the terms included in the formula are indeed in the data.

Are checks like these housed in validate_formula? They don't seem to be on inspection/initial testing I've done. Might something like this be elsewhere in the brms package? Want to avoid reinventing wheel etc.!
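As a rough illustration of the two checks listed above, here is a minimal base R sketch for the list-of-formulas case. The helpers check_dpars and check_vars_in_data are hypothetical names, not brms functions, and family_dpars is assumed to be a plain character vector of parameter names (for a brms family it could plausibly be taken from the family's dpars element). Only right-hand-side variables are compared against the data, since the left-hand sides name distributional parameters rather than data columns.

```r
# Hypothetical helpers (not part of brms): base R sketches of the two checks.

# Check 1: every left-hand side names a distributional parameter of the family.
# Assumes two-sided formulas such as mu ~ 1 and a character vector family_dpars.
check_dpars <- function(formulas, family_dpars) {
  lhs <- vapply(formulas, function(f) all.vars(f[[2]]), character(1))
  unknown <- setdiff(lhs, family_dpars)
  if (length(unknown) > 0) {
    stop("Not distributional parameters of the family: ",
         paste(unknown, collapse = ", "))
  }
  invisible(lhs)
}

# Check 2: every variable on a right-hand side is a column of the data.
check_vars_in_data <- function(formulas, data) {
  rhs_vars <- unique(unlist(lapply(formulas, function(f) {
    all.vars(f[[length(f)]])  # last element of a formula is its right-hand side
  })))
  missing_vars <- setdiff(rhs_vars, names(data))
  if (length(missing_vars) > 0) {
    stop("Variables not found in data: ",
         paste(missing_vars, collapse = ", "))
  }
  invisible(rhs_vars)
}

# Example inputs (made up for illustration)
formulas <- list(mu ~ 1 + sex, sigma ~ 1)
data <- data.frame(delay = c(1.5, 2.0), sex = c("F", "M"))

check_dpars(formulas, family_dpars = c("mu", "sigma"))
check_vars_in_data(formulas, data)
```

Both helpers return their result invisibly and raise an error on failure, so they can be called purely for their side effect at the top of a constructor.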

Thanks very much for any help! Adam

Edit: actually, similar question about exposing brms:::validate_family (also use that one!)

paul-buerkner commented 3 months ago

I think it is indeed worth exposing these functions. And yes, the initial validation checks are done there. The data validation, however, is done differently, in brmsterms, which is already exposed to the user.

athowes commented 3 months ago

Thank you! I've also since used brms:::validate_newdata as another potential one.

Caveat that this is just asking for help rather than improving the package (though I suppose this ask might prompt documentation if others might use these features) but I'm having trouble validating that "the variables written in the formula are in the data". Do you know which function I'd use to get this?

Right now I have:

formula <- brms:::validate_formula(formula, family = family, data = data)
# Using this here only for checking purposes: expect it catches some errors
brmsterms(formula)

paul-buerkner commented 3 months ago

It should be validate_data.

(Sent by email in reply to Adam Howes's comment of Wed, 31 July 2024, 17:56.)

athowes commented 3 months ago

Thanks! I've been trying to use:

bterms <- brms::brmsterms(formula)
brms:::validate_data(data, bterms)

I'm having trouble because validate_data flags that mu (and, more broadly, the other dpars) is not in my data. I suppose the data object is quite a special brms construct? If we are talking about the data that a user passes into a model, I wouldn't expect it to contain columns for internal distributional parameters such as mu, sigma, shape, etc.

paul-buerkner commented 3 months ago

I think these questions would be better addressed on the Stan Discourse. Also, please provide more context on what you are trying to achieve. Based on the provided info, I am not sure what is happening.

athowes commented 3 months ago

Yep that's fair, I think a thread could be a better venue, so apologies for putting things here. I'll try to figure something out about my problem and if not make a thread on the discourse.

Regarding the context, and feel free to ignore:

What I was trying to achieve was to create a function called epidist_formula which creates a formula for use with a custom brms family. I would imagine that it's a bit unnecessary to include checks in it, since it will be checked anyway when passed to brms::brm, but I had been looking to check that the formula the user provided was reasonable within the function epidist_formula itself. (And likewise within epidist_family I had been trying to check things about the family / use as much brms functionality for conversion of e.g. stats:: families and strings to brmsfamily objects as possible.) One reason I might like to do this (include checks early) is to stop a user who wants to go step-by-step creating their objects before they get to putting them all into brms::brm.
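To make the "check early" idea concrete, here is a minimal base R sketch, not the epidist or brms implementation. The names coerce_family and early_formula are hypothetical, and the family conversion covers only stats-style families, following the idiom used at the top of stats::glm; converting to brmsfamily objects would need brms itself.

```r
# Hypothetical helper: turn a string, a family function, or a family object
# into a family object, mirroring the idiom at the top of stats::glm().
coerce_family <- function(family) {
  if (is.character(family)) {
    family <- get(family, mode = "function", envir = parent.frame())
  }
  if (is.function(family)) {
    family <- family()
  }
  if (!is.list(family) || is.null(family$family)) {
    stop("`family` could not be converted to a family object")
  }
  family
}

# Hypothetical constructor: fail before any model fitting if the family is
# unusable or the formula mentions variables that are not columns of the data.
early_formula <- function(formula, family, data) {
  family <- coerce_family(family)
  missing_vars <- setdiff(all.vars(formula), names(data))
  if (length(missing_vars) > 0) {
    stop("Variables not found in data: ",
         paste(missing_vars, collapse = ", "))
  }
  list(formula = formula, family = family)
}

# Example inputs (made up for illustration)
data <- data.frame(delay = c(1.5, 2.0), sex = c("F", "M"))
spec <- early_formula(delay ~ sex, family = "gaussian", data = data)
```

A user building objects step by step would then see a misspelled column name or an unknown family at this point, rather than only when everything is passed to the fitting function.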

paul-buerkner commented 2 months ago

I think I would like to expose these functions only with brms 3.0 once some more changes to the validate_* functions have been made.