rbchan / unmarked

R package for hierarchical models in ecological research
https://rbchan.github.io/unmarked/

Update prediction methods to allow functions in the formula #157

Closed aosmith16 closed 4 years ago

aosmith16 commented 4 years ago

I updated the data.frame prediction methods for unmarkedFit, PCount, OccuFP, ColExt, PCO, and GMM to use the "terms" information from the original data. This allows functions such as poly() with raw = FALSE, scale(), and bs() in the formula, with predictions from new data based on the transformation of the original data rather than a fresh transformation of the new data.
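For anyone reviewing, here is a minimal base-R sketch of the general idea. It is not the unmarked code itself; the data values and object names are made up for illustration. The terms object attached to a model frame stores the original centering and scaling, so rebuilding the model frame from those terms reuses them for new data.

```r
# Hypothetical "original" data (not from unmarked)
orig <- data.frame(x = c(1, 2, 3, 4, 5))
mf <- model.frame(~ scale(x), data = orig)
tt <- attr(mf, "terms")  # "predvars" attribute records the original center/scale

newdata <- data.frame(x = c(1, 2))

# Naive: re-evaluating the formula rescales with newdata's own mean and sd
X_naive <- model.matrix(~ scale(x), data = newdata)

# Terms-based: reuses the mean (3) and sd of the original x
X_terms <- model.matrix(tt, model.frame(tt, data = newdata))

X_naive[, 2]
X_terms[, 2]
```

The two columns differ because only the second one is on the scale of the original data.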

I also ended up adding information on factor levels from the original data to the new model matrix, so the new dataset does not need to include every factor level. I ran into this problem when trying to make predictions based on the first example of pcount() in the documentation.
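A rough base-R illustration of the factor-level issue, again with made-up data rather than the actual unmarked internals: passing the original levels through the `xlev` argument of `model.frame()` lets new data contain only a subset of the levels.

```r
# Hypothetical original data with a three-level factor
orig <- data.frame(f = factor(c("a", "b", "c", "a")))
mf <- model.frame(~ f, data = orig)
tt <- attr(mf, "terms")
xlev <- .getXlevels(tt, mf)  # list(f = c("a", "b", "c"))

# New data containing only one of the original levels
newdata <- data.frame(f = factor("b"))

# Without xlev, model.matrix(~ f, data = newdata) fails because the
# factor has a single level and contrasts cannot be built.
mf_new <- model.frame(tt, data = newdata, xlev = xlev)
X_new <- model.matrix(tt, mf_new)
X_new
```

The resulting matrix has the same columns as the one built from the original data.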

Predict functions for OccuMulti, OccuMS, and OccuTTD use a different approach and I have currently left these untouched.

Different model parameters can be based on different combinations of covariate datasets. For example, the detection probability could have site-level, year-level, and observation-level covariates for colext() models. I believe I matched the correct datasets to the correct parameters but I could use some other eyes on this for ColExt, PCO, and GMM in particular.

Current tests still say they all pass (0 errors, 0 failures, 99 tests). I have not added new tests. However, I think an approach could be to make sure that for a model with an in-formula transformation, the original (expanded) dataset matches the new dataset for prediction (where the new data are the first few rows of the original dataset).
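A sketch of what such a check might look like, written in plain base R with hypothetical data. A real test would fit an unmarked model and call its predict method, which this does not do.

```r
# Hypothetical covariate values
orig <- data.frame(x = c(0.5, 1.2, 2.9, 3.3, 4.8, 6.1))
mf <- model.frame(~ poly(x, 2), data = orig)  # orthogonal polynomial (raw = FALSE)
tt <- attr(mf, "terms")
X_orig <- model.matrix(tt, mf)

# New data are the first few rows of the original data
newdata <- head(orig, 3)
X_new <- model.matrix(tt, model.frame(tt, data = newdata))

# The design matrix for newdata should match the first rows of the original
stopifnot(isTRUE(all.equal(X_new, X_orig[1:3, , drop = FALSE],
                           check.attributes = FALSE)))
```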

Finally, I realized as I was finishing this up that an in-formula transformation of a site-level covariate gives different results depending on the parameter it is used for. Applied to a site-level parameter, it is computed on the site-level data; applied to a year-level or observation-level parameter, it is computed on the expanded site-level dataset, where each site's value is repeated. See, e.g., the difference between scale(0:5) and scale(c(0:5, 0:5)). I haven't figured out if this is a problem or not (maybe it's a good thing!), but it is certainly something to consider.
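To make the difference concrete, here is the scale() comparison spelled out (the variable names are just for illustration):

```r
site_x <- 0:5

s_site <- scale(site_x)                     # site-level data
s_expanded <- scale(rep(site_x, times = 2)) # same values, each site repeated

# The centers agree, but the standard deviations used for scaling do not,
# because sd() divides by n - 1 and n has changed.
attr(s_site, "scaled:center")
attr(s_expanded, "scaled:center")
attr(s_site, "scaled:scale")
attr(s_expanded, "scaled:scale")
```

So the same covariate value maps to slightly different scaled values in the two cases.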

rbchan commented 4 years ago

Thanks again, this looks great. If you ever get a chance to add a few unit tests, that would be helpful just to make sure that bs() and poly() and others continue to perform as expected.