opetchey / multifarious_response_diversity

MIT License

Gaussian GAMs only? #6

Open opetchey opened 1 year ago

opetchey commented 1 year ago

Getting the partial derivatives is more straightforward for Gaussian GAMs because of the identity link function. It gets harder with non-Gaussian families.

Owen proposes that we constrain this project to Gaussian GAMs. That would also mean we need real data that are compatible with them.
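For reference, the reason the identity link simplifies things follows from the chain rule. The notation here is assumed for illustration and is not taken from the report: $g$ is the link function, $\eta$ the linear predictor (the sum of the smooths), and $\mu = g^{-1}(\eta)$ the expected response.

$$
\frac{\partial \mu}{\partial x_j} = \frac{d\, g^{-1}(\eta)}{d\eta} \, \frac{\partial \eta}{\partial x_j}
$$

With the identity link, $g^{-1}(\eta) = \eta$, so the first factor is 1 and the derivative on the response scale equals the derivative of the linear predictor. With a log link, for example, $g^{-1}(\eta) = e^{\eta}$ and the derivative becomes $\mu \, \partial \eta / \partial x_j$, which depends on the values of all the other covariates through $\mu$.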

gavinsimpson commented 1 year ago

It's only harder in the sense that gratia would need a response_derivatives() function, which a number of people (myself included, for my own research) have been asking for anyway. It is also a wishlist item for a workshop I'm presenting at next week, so given a good weekend of coding it could be available quickly. The way I plan to implement it would reuse a lot of the infrastructure already written for derivatives() and partial_derivatives(), so I don't expect it to be a difficult addition.

Unless restricting yourselves to Gaussian GAMs covers all the intended use-cases, I wouldn't let this constrain the scope of the data you want to work with. I only mentioned it because it occurred to me while reading the report, and I wanted to raise it in case it was an issue or otherwise relevant.

opetchey commented 1 year ago

Thanks for explaining Gavin. Gaussian GAMs do not cover all intended use-cases. Let us see what data we have and go from there.

gavinsimpson commented 1 year ago

To update this, version 0.8.1 (0.8.0 didn't pass CRAN's checks) implemented partial_derivatives(), and is on CRAN.

As of a few days ago, gratia now contains response_derivatives() for computing (what are essentially) partial derivatives of the response; these can be on the link or the response scale. These response derivatives are an extension of what partial_derivatives() does: instead of working at the level of a single tensor product smooth, response_derivatives() works on the additive sum of all model terms. We hold all but the focal variable constant at representative values and compute the derivative of y with respect to the focal variable. If representative values are not supplied by the user, I use the value at the observation closest to the median for continuous covariates, or the modal level for factor covariates, in the data used to fit the GAM.

The uncertainty on the computed derivative is determined using posterior sampling. For derivatives on the link scale we could compute the uncertainty directly, but I haven't implemented this (yet?) and instead I do it less efficiently via posterior sampling. This makes the code simpler, but it is using a sledgehammer to crack a nut if all you want are derivatives +/- SE on the link scale. One advantage of posterior sampling, however, is that because gratia can now use the Metropolis-Hastings sampler from mgcv::gam.mh(), we don't need to rely on Gaussian (Laplace) approximations to the posterior when computing the uncertainties (credible intervals) on these response derivatives.

response_derivatives() is available from 0.8.1.11 (the development version) which I plan to have made available on CRAN sometime before March 20th. For now you can install the development version using the instructions in the README. The pkgdown site for gratia includes an example: https://gavinsimpson.github.io/gratia/reference/response_derivatives.html
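Separately from the packaged example linked above, the general idea (hold the other covariates fixed, differentiate the fitted response with respect to the focal variable, and get intervals from posterior draws) can be sketched with mgcv alone. This is a minimal illustration under stated assumptions, not gratia's implementation. The simulated Poisson data, the variable names `x1` and `x2`, the finite-difference step `eps`, and the use of the Gaussian approximation to the posterior (rather than `gam.mh()`) are all choices made here for the example.

```r
library(mgcv)

set.seed(1)
# Simulated Poisson data (hypothetical; not the project's data)
n  <- 400
df <- data.frame(x1 = runif(n), x2 = runif(n))
df$y <- rpois(n, exp(0.5 + sin(2 * pi * df$x1) + df$x2))

m <- gam(y ~ s(x1) + s(x2), data = df, family = poisson(), method = "REML")

# Focal variable x1 on a grid; x2 held constant at its median (an assumption)
eps  <- 1e-4
grid <- data.frame(x1 = seq(0.05, 0.95, length.out = 50), x2 = median(df$x2))
lo   <- transform(grid, x1 = x1 - eps / 2)
hi   <- transform(grid, x1 = x1 + eps / 2)

# Matrices that map coefficients to the linear predictor at each grid point
X_lo <- predict(m, newdata = lo, type = "lpmatrix")
X_hi <- predict(m, newdata = hi, type = "lpmatrix")

# Inverse link: moves from the link scale to the response scale
ilink <- family(m)$linkinv

# Central finite difference of the fitted response, for any coefficient set
resp_deriv <- function(beta) (ilink(X_hi %*% beta) - ilink(X_lo %*% beta)) / eps

# Posterior draws of the coefficients under the Gaussian approximation
betas   <- rmvn(1000, coef(m), vcov(m))   # one draw per row
d_draws <- resp_deriv(t(betas))           # grid points x draws

deriv <- data.frame(
  x1         = grid$x1,
  derivative = as.numeric(resp_deriv(coef(m))),
  lower      = apply(d_draws, 1, quantile, probs = 0.025),
  upper      = apply(d_draws, 1, quantile, probs = 0.975)
)
head(deriv)
```

Switching `ilink` to the identity function would give derivatives on the link scale instead. Because the data are simulated with a random seed, no particular output values are implied here.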