Expanding abundance when fitting Bernoulli-cloglog model

James-Thorson-NOAA commented 7 months ago

One use-case for SDMs is to fit a logistic-regression using a cloglog link function, but then (instead of using the inverse-link to calculate the predictor) exponentiating the linear predictor to calculate abundance that could then be reported from predict or when calculating an area-weighted abundance index. This is, e.g., used by Gruss-Thorson-2019, and it matches the interpretation of a cloglog link as a thinned Poisson-point-process from the presence-only literature. It is implemented in VAST using ObsModel[13,1], as demonstrated in the combined-data demo.

I envision that we need some interface that separates (1) the inverse-link function that's applied when calculating the data likelihood from (2) the inverse-link function that's applied when calculating the predicted response for predict calls. I'd then copy the same interface for use in tinyVAST. Any ideas, or do you want to discuss @seananderson?

James-Thorson-NOAA commented 6 months ago

To clarify the problem a bit more: If you have a count $C$ from a Poisson point process with intensity \lambda, then:

Pr(C>0) = \pi = 1 - e^{-\lambda)

We then might want to calculate the total intensity int_s{\lambda ds} as predicted abundance. A Bernoulli GLM with cloglog-link arises fits that likelihood using linear predictor log(\lambda), but we then want to treat it as log-linked when calculating the abundance-index.

I think the simplest option is to pass TMB two family arguments (perhaps family and family_pred, where one is used during the data-likelihood, and the other during expansion. By default, the family_pred argument is missing, and then fixed at family_pred = family. But advanced users could pass family_pred (perhaps via sdmTMBcontrol or via predict) that over-rides the default.

As I said, I need to implement this in tinyVAST e.g., to allow this demo to give the same index results when using the presence-absence or other variables as default factor level, and I'd love to keep in lock-step with sdmTMB regarding family interface.

seananderson commented 6 months ago

Hmm. A few thoughts:

Do we need the full second family architecture or would a second link (or vector of links) be enough? Perhaps that depends whether this might foreseeably be needed in delta families. Might one ever be using a delta model (say for biomass) and then only using the first linear predictor as part of an abundance index? Maybe? That's already baked into the Poisson-link families, I guess, with the log link.
I was going to say I'm less fussed about the predict function, since the default is in link space anyways and the user can always inverse-link the linear predictor however they'd want, but perhaps a user would want to make conditional predictions with standard errors, say using only fixed effects.
In a fit->predict->expansion workflow it's nice to have flexibility at the last stage for integration-specific options (like an area vector). So, at the very least, perhaps integrate_output()/get_index() should have this extra argument. Since integrate_output() skips over the prediction object, I presume the argument should be there.
I'd like to keep 'control' about optimizer settings and similar if possible.
I presume defining a bernoulli() family with an extra argument wouldn't work? That gets around binomial() being a built-in function. I guess this could also apply to binomial with trials > 1.

Assuming it might be used in a standard delta model and that a new Bernoulli wouldn't solve it, yes, something like family_pred in integrate_output(), and optionally also predict(), would work.

pbs-assess / sdmTMB

Expanding abundance when fitting Bernoulli-cloglog model #339