inlabru-org / inlabru
https://inlabru-org.github.io/inlabru/

first draft of zipzap vignette #207

Closed · dmi3kno closed this 8 months ago

dmi3kno commented 9 months ago

I would like to initiate the vignette on ZIP and ZAP models. These are very useful in my field of work and I would appreciate learning more about them through the contribution of others.

finnlindgren commented 9 months ago

Thanks!

A practical consideration is that any vignette/article that uses INLA (directly or indirectly, e.g. via bru()) must be placed in the vignettes/articles folder. That way, it is built for the website by an automated GitHub action that has access to INLA, but the package doesn't include it directly; only via an automated link in the website_examples vignette, and a manually added menu link in _pkgdown.yml.

dmi3kno commented 9 months ago

Sure, I will relocate it. I was just following your bru_mapper vignette; I thought you moved them after they mature.

finnlindgren commented 9 months ago

The bru_mapper vignette can be an ordinary vignette because it doesn't use the INLA package.

dmi3kno commented 9 months ago

I fixed some of the comments. The units on the prediction plots are all over the place (Poisson is in 1/1000s, ZIP in 1000s, and ZAP in 100s). Clearly something is wrong with the models.

I feel bad you are spending time on this. Perhaps one of your students could have a look. Thank you very much!

finnlindgren commented 9 months ago

The setup code sets eval=FALSE, preventing the code from actually running; it seems to work fine with eval=TRUE though.
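For reference, a minimal sketch of flipping that option in a setup chunk (where exactly the option is set in the actual vignette may differ):

```r
# Enable evaluation of the code chunks (the vignette had eval = FALSE here).
knitr::opts_chunk$set(eval = TRUE)
```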

Only the final model seems to produce sensible results here; even the second one (that's said to be much better than the first) gets an odd hotspot. This is likely due to the vegetation factor covariate; it's too sensitive to local variation in these models, I think. It would probably be better to exclude that covariate, and perhaps include one of the other continuous covariates instead, e.g. waterdist, heat, or slopeangle.

dmi3kno commented 9 months ago

I like that we have a factor and a continuous covariate here. This model does not need to be perfect, it needs to be illustrative.

dmi3kno commented 9 months ago

Does the centering of the continuous raster matter? Is there a bru_mapper that ensures that the continuous raster covariates fit nicely?

finnlindgren commented 9 months ago

The problem with vegetation is that it has too many and too localised values, making the model sensitive to the behaviour at just a small set of locations. Setting the precision parameter to a larger value might help; hyper=list(prec=list(initial=..., fixed=TRUE)).

For linear models, centering mostly changes the intercept interpretation (and removes collinearity with the intercept) but shouldn't be important. In particular, inlabru cannot and should not do its own centering (since it doesn't know if the component will be involved in a non-linear expression). Even for linear models, data should be intentionally centred instead of auto-centred, with the possible exception of systems that can then undo data-dependent centring, and/or keep track of the centring and apply it to newdata in predictions. I sometimes see people calling scale() directly in model formulas, which basically makes the parameter estimates non-interpretable, and automated prediction impossible.

dmi3kno commented 9 months ago

#> Unknown keyword in `hyper' ` prec '. Must be one of  theta theta1 prob logit probability
finnlindgren commented 9 months ago

hyper for the vegetation() specification, not the observation model.
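As a concrete sketch (component and covariate names assumed, the fixed value an arbitrary placeholder, and assuming the vegetation component is specified as an iid effect over the factor levels), the option goes into the component definition:

```r
# Fix the log-precision of the vegetation effect inside the component itself,
# not in the observation-model (control.family) options, where `prec` is not
# a recognised hyperparameter name.
cmp <- ~ Intercept(1) +
  vegetation(veg, model = "iid",
             hyper = list(prec = list(initial = log(100), fixed = TRUE)))
```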

finnlindgren commented 9 months ago

Further on covariate centring & scaling: it is good if the scale of the covariates is around "1", though, as the internal algorithms are sometimes adjusted to be accurate in those cases. But that's best done as a preprocessing step so one can have full control and knowledge (e.g. converting elevation in metres into kilometres when dealing with temperature data across the globe).
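A minimal sketch of that kind of preprocessing (object and column names, and the example values, are hypothetical):

```r
# Rescale the covariate once, before fitting, with a fixed, known
# transformation; apply the same transformation to any prediction grid,
# so nothing is data-dependent.
covariates <- data.frame(elevation = c(120, 850, 2300))  # metres
covariates$elev_km <- covariates$elevation / 1000        # metres -> km
```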

dmi3kno commented 9 months ago

> hyper for the vegetation() specification, not the observation model.

This makes a huge difference, I feel. Not sure the value I picked is good. Feels rather arbitrary.

finnlindgren commented 9 months ago

I'm doing some changes/corrections to the vignette, to compute the correct/comparable posterior predictors, and to fix the ZAP model; it was using the same intercept for both the "present" and "count" parts, for example. I'm also adding diagnostic plots. I also found why the vegetation precision had such a big effect; you had fixed the precision to exp(20), i.e. variance $\approx 0$, which was a bit more extreme than I had in mind.

finnlindgren commented 9 months ago

In the ZAP model, the same components, with the same scaling, were used in both predictors, but since they are on different scales, and are intended to capture different aspects, that's a likely problem. I'm testing a bunch of local changes now that should make things a bit clearer, I hope.

dmi3kno commented 9 months ago

In the ZAP example I showed you, they also use the same components, but explain that the SPDE could be different. Looking forward to your edits!

finnlindgren commented 9 months ago

Even if some covariate might be ok with the same coefficient, the intercept definitely needs to be different.

finnlindgren commented 9 months ago

It turns out one of the issues with the vegetation covariate is that the first level is also the rarest (just 48 boxes have that value), which messes up "factor_contrast", as the differences to the reference level are uncertain. Using "factor_full" with constr = TRUE may be a safer option. Unfortunately level reordering seems complicated; I couldn't adapt the options given at https://stackoverflow.com/questions/72723613/how-can-i-change-the-ordering-of-levels-for-a-categorical-spatraster-in-terra
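As a sketch of that suggestion (component and covariate names assumed; the exact argument handling for factor components may differ in detail):

```r
# Full factor coding with a sum-to-zero constraint, instead of contrasts
# against a rare (and therefore poorly identified) reference level.
cmp <- ~ Intercept(1) +
  vegetation(veg, model = "factor_full", constr = TRUE)
```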

finnlindgren commented 9 months ago

In the current version, the ZIP model actually does worse than plain Poisson, but the ZAP model does do better. It's currently plotting and assessing the posterior median instead of the posterior mean, to avoid the artificial hotspot in the posterior mean that I think comes from the rare vegetation type and $E(\exp(X))\approx\exp(E(X) + V(X)/2)$; when the uncertainty is high, the expectation also becomes high. Quantiles are more stable.

dmi3kno commented 9 months ago

Made some small corrections to the text and organized the plots with patchwork. A couple of questions:

finnlindgren commented 9 months ago

The predictors work on different scales, so they need to be scaled differently. Since the vegetation was only used in the "present" model, that only leaves the "field" component as a component with more than one parameter, so that one was made into a copy model, so inla estimates a scaling parameter. (I actually tried that for elevation as well, but I think "linear" models can't be "copy" models; the error message didn't say that directly though, so there may have been another issue.)
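A rough sketch of that structure (component names and the SPDE object are assumptions, and the copy handling may differ in detail):

```r
# Separate intercepts for the presence and count predictors; the count-part
# field is a scaled copy of the presence-part field, so a scaling parameter
# between the two predictors is estimated.
cmp <- ~ Intercept_present(1) + Intercept_count(1) +
  field_present(coordinates, model = spde) +
  field_count(coordinates, copy = "field_present", fixed = FALSE)
```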

I also found a bug in the summary.component() code, making it fail when presenting info about components with "weights" (I experimented with using the component weights argument to scale the elevation covariate effect in km instead of metres. The estimation worked fine but the summary() call wouldn't work.) I know how to fix it.

finnlindgren commented 9 months ago

The pred_var calculations aren't quite right; they don't fully take into account the zero-inflation part. Similarly with the PIT values; I just removed the zero counts, but that's not quite right either.

dmi3kno commented 9 months ago
finnlindgren commented 9 months ago

I was referring to the pit assessment plotting code afterwards.

Check inla.doc() for the model's parameterisation.

finnlindgren commented 9 months ago

Ah, did I just use pnorm() in the predict call? Verify with inla.doc() which quantity should multiply the truncated Poisson part.

finnlindgren commented 9 months ago

Also, if I used pnorm I should have used plogis! And if it's in the wrong direction, I should add lower.tail=FALSE.
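For reference, the two variants look like this (plain R, just illustrating the arguments mentioned above):

```r
# plogis() is the inverse-logit (standard logistic CDF);
# lower.tail = FALSE gives the complementary probability 1 - plogis(x).
plogis(0.5)                       # probability in the "lower" direction
plogis(0.5, lower.tail = FALSE)   # the complementary tail
```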

dmi3kno commented 9 months ago

Yeah, it is written plogis, but it should be the complementary tail.

$$ \text{Prob}(y\vert\dots)=p\times 1_{y=0}+(1-p)\times \text{Poisson}(y\vert y>0) $$

where $p=\text{logit}^{-1}(\theta)$

I have these equations in the vignette for exactly this purpose.
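As a small sketch, the probability mass function above can be written directly in R (here p is the probability of a structural zero, as in the formula; the function name is just illustrative):

```r
# ZAP/hurdle pmf: a zero with probability p, otherwise a zero-truncated Poisson.
dzap <- function(y, p, lambda) {
  trunc_pois <- dpois(y, lambda) / (1 - exp(-lambda))  # Poisson(y | y > 0)
  ifelse(y == 0, p, (1 - p) * trunc_pois)
}
```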

Looks surprisingly good

```r
# Mean Dawid-Sebastiani scores
c(mean(lambda_poi$DS), mean(lambda_zip$DS), mean(lambda_zap$DS)) %>%
  set_names(c("Poisson", "ZIP", "ZAP"))
#>  Poisson      ZIP      ZAP
#> 2053.304 2055.336 1204.949
```

Some very bright spots appear. Need to tweak the prior for the vegetation, as you suggested earlier. [image]

finnlindgren commented 9 months ago

I've been going through the code and updating the score code a bit, and when I added a plot of the probability used in the lambda prediction, it was larger where there are no points. The original code was actually right. The binomial part estimates the "probability of presence", let's call it p. The ZAP part estimates q = "probability of absence". So 1 - p = q, and the original lower.tail=TRUE code was actually correct! I'll change it back and do some more diagnostics.

dmi3kno commented 9 months ago

Hmm, I still don't quite understand. It can very well be that we need a probability-of-absence binomial. This is how I read the math: $p\times 1_{y=0}$.

finnlindgren commented 9 months ago

The binomial part of the third model is a presence/absence model, and estimates the presence probability. In the ZAP model parameterisation, the parameter is the "probability of zero", i.e. the opposite, so when the formula says (1-p) × Poisson, 1-p is actually the presence probability. Take a look at the new version where I've added a plot of the presence probability.

finnlindgren commented 9 months ago

The new version also has a sensible score interpretation; the ZAP MAE score is slightly lower than the others, the RMSE is slightly higher, but the DS score is clearly lower, indicating that it captures uncertainty/variability better.

finnlindgren commented 9 months ago

Note: I haven't fully corrected the prediction variance calculations, so pred_var might not be fully accurate.

finnlindgren commented 9 months ago

Sorry, I was a bit unclear, as the ZAP probability parameter is fixed to zero in our case. What I should have said was that the binomial part estimates p = P(presence). In the ZAP part we estimate lambda for Poisson(y|y>0). That led me to $p\lambda$. But it's a truncated Poisson, so that's not quite right either. It should be $E(\text{count}) = E[E(\text{count}\mid\text{present})\, I(\text{present})] = p\, E(\text{count}\mid\text{count}>0)$, which isn't quite $p\lambda$. I'll work out what that conditional expectation is and update the code.

finnlindgren commented 9 months ago

The correct prediction expectation is $\frac{p \lambda E}{1- \exp(-\lambda E)}$, where E is the effort/exposure/area. [Edited: corrected]

dmi3kno commented 9 months ago

For ZIP

$$
\begin{gathered}
E(\text{count}) = (1-p)\lambda \\
\text{Var}(\text{count}) = (1-p)(\lambda + p\lambda^2)
\end{gathered}
$$

For ZAP

$$
\begin{gathered}
E(\text{count}) = \frac{1}{1-\exp(-\lambda)}\, p\lambda \\
\text{Var}(\text{count}) = \frac{1}{1-\exp(-\lambda)}\, p(\lambda + \lambda^2) - \left(\frac{1}{1-\exp(-\lambda)}\, p\lambda\right)^2
\end{gathered}
$$

And of course $\text{logit}(p)=\alpha_0+\alpha_iX_B$ and $\log(\lambda)=\beta_0+\beta_iX_Z$.

Edit: You are right, it should be $\lambda E$ everywhere.
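For reference, a small helper sketch implementing these moments (function and argument names are illustrative; in the vignette the expressions would typically be evaluated inside a predict() formula). Note that p is the zero-inflation probability in the ZIP case but the presence probability in the ZAP case:

```r
# ZIP prediction moments: p = probability of a structural zero,
# lambda = Poisson rate, E = exposure, so the effective rate is lambda * E.
zip_moments <- function(p, lambda, E = 1) {
  rate <- lambda * E
  data.frame(mean = (1 - p) * rate,
             var  = (1 - p) * (rate + p * rate^2))
}

# ZAP/hurdle prediction moments: p = presence probability; the count part is a
# zero-truncated Poisson, hence the 1 / (1 - exp(-rate)) correction.
zap_moments <- function(p, lambda, E = 1) {
  rate  <- lambda * E
  trunc <- 1 - exp(-rate)
  m <- p * rate / trunc
  data.frame(mean = m,
             var  = p * (rate + rate^2) / trunc - m^2)
}
```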

finnlindgren commented 9 months ago

OK, thanks! I've modified most of the code already to focus on E(count) instead of lambda. Will modify the ZIP part as well and then commit.

dmi3kno commented 9 months ago

This is one of those things you wish you did not have to remember. Wouldn't it be cool if predict was able to correctly identify the expectations and variance for each type of model?

Should we put the expectation/variance math into the vignette? I committed.

finnlindgren commented 9 months ago

Yes, I've done work on this before, and the plan was to have helper functions of some sort. Similar issues arise in all posterior data-level predictions.

dmi3kno commented 9 months ago

I just went along with this ZAP hackery (fixing the parameter), but there's now zeroinflatedcenpoisson0 in INLA. There are no examples or documentation anywhere, but I wonder if that's the model we actually need.

I have my concerns regarding the appropriateness of RMSE and MAE as metrics for counts.

finnlindgren commented 9 months ago

No, the "cen" models look like they are just versions that allow censored observations, as in survival analysis; I don't think that helps us here, as we want truncation instead.

finnlindgren commented 9 months ago

I've pushed my changes but I'm sure I made some mistakes. Will be busy until Wednesday.

dmi3kno commented 9 months ago

Thank you! This is a lot for me to chew on!

finnlindgren commented 9 months ago

I'm taking another quick look to see if I can spot the bug that made the zip prediction variance invalid.

finnlindgren commented 9 months ago

Found it and am fixing it.

finnlindgren commented 9 months ago

The rare vegetation types are an issue...

finnlindgren commented 9 months ago

The new version has the bugs fixed, and also uses n.samples=2500 to hopefully avoid the Monte Carlo error of the prediction confusing things. Not sure how to interpret the results yet.
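For context, a sketch of how the sample count is raised in a predict() call (the formula and object names here are placeholders, not the vignette's actual ones):

```r
# More posterior samples make the Monte Carlo error in the prediction
# summaries small relative to the differences being compared.
pred <- predict(fit, newdata = pred_grid,
                formula = ~ exp(Intercept + field),
                n.samples = 2500)
```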

finnlindgren commented 9 months ago

I think one should assess predictions of "count>0" and "count|count>0" separately, for all three models. The mixing of zero-effects and other effects in the score calculations confuses things. The ZAP model is clearly doing something better than the others, but the scores aren't clear about it at all.

dmi3kno commented 9 months ago

I picked out two classes of vegetation that make the most sense. Mean DS is now showing improvement of ZAP over the other models.

I think our spatial field is a bit too focused, so remote nests get picked out by the DS metric (showing as bright pixels). Perhaps we could relax the SPDE? Changing sigma triggers overfitting, but changing the range does not produce any effect. Hmm...

finnlindgren commented 9 months ago

I thought about those individual pixels, but those appear simply because a zero gets one score and a one gets another score, and they occur in a region with generally low probability, so the ones get a high score. One should really only interpret the pairwise score differences (and their spatial averages/sums), and those were more mixed even in the previous version (I haven't checked your new version yet). Also, we're still doing in-sample assessment; out-of-sample would be preferable. Unfortunately CPO doesn't quite work for comparison here because the ZAP/hurdle model has a mix of single and paired observations that would need to be left out. Simpler would be to do a training/test data split, using some data only for estimation and the rest of the data only for predictions. Also, computing separate scores for prediction of $I(y_i > 0)$ and for $y_i \mid y_i>0$ would help; I think it will then be clear that the ZAP/hurdle model predicts $I(y_i>0)$ better.
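A minimal sketch of such a split (data frame and column names are hypothetical stand-ins), with the separate presence / positive-count scoring done on the held-out part:

```r
# Hold out a random subset of the observation boxes for out-of-sample scoring.
set.seed(1)
counts_df <- data.frame(count = rpois(200, 1))  # stand-in for the real data
test_idx  <- sample(nrow(counts_df), size = round(0.25 * nrow(counts_df)))
train_df  <- counts_df[-test_idx, ]
test_df   <- counts_df[test_idx, ]

# Fit each model with bru() on train_df only, predict on test_df, and then
# score the presence indicator and the positive counts separately, e.g.
# a Brier score for I(y > 0) and MAE/DS for y | y > 0.
```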

finnlindgren commented 9 months ago

When I run the new model, I still get worse scores for ZAP than for the other two (across all four scores; they're all negatively oriented, and the ZAP scores are the largest, i.e. worst):

Model | MAE | MSE | MDS | MLG
-- | -- | -- | -- | --
Poisson | 0.2711175 | 0.6257786 | -2.191408 | 0.4124088
ZIP | 0.2918497 | 0.6871863 | -2.162391 | 0.4286283
ZAP | 0.3046396 | 0.7194702 | -2.157850 | 0.4477888

finnlindgren commented 9 months ago

But like I said, these are in-sample scores. Should check out-of-sample scores.