rbchan / unmarked

R package for hierarchical models in ecological research
https://rbchan.github.io/unmarked/

Add cross validation method #123

Closed kenkellner closed 5 years ago

kenkellner commented 5 years ago

Adds a method crossVal for cross-validation of unmarked fit objects, addressing #52. It includes k-fold, holdout, and leave-one-out options, and calculates root mean square error (RMSE) and mean absolute error (MAE). The method can also be applied to fitLists to allow comparison of different model fits.
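For readers unfamiliar with the procedure, here is a minimal, language-agnostic sketch of k-fold cross-validation that reports RMSE and MAE. It is written in Python purely for illustration and is not the crossVal implementation from this PR. The function names (kfold_indices, cross_validate), the toy "predict the training mean" model, and the choice to pool errors across folds are all assumptions made for the example.

```python
import math
import random


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(y, k, fit, predict, seed=0):
    """Return (RMSE, MAE) pooled over all held-out observations."""
    errors = []
    for test in kfold_indices(len(y), k, seed):
        held_out = set(test)
        train = [y[i] for i in range(len(y)) if i not in held_out]
        model = fit(train)  # fit on the training folds only
        errors.extend(y[i] - predict(model) for i in test)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


# Toy "model" (hypothetical): predict the training mean for every held-out point.
def fit_mean(train):
    return sum(train) / len(train)


def predict_mean(model):
    return model


y = [2.0, 4.0, 6.0, 8.0]
# Setting k equal to the number of observations gives leave-one-out.
rmse, mae = cross_validate(y, k=len(y), fit=fit_mean, predict=predict_mean)
```

In this sketch, holdout corresponds to evaluating a single train/test split, and leave-one-out is the special case where k equals the number of observations.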

If you decide to implement this or something like it, I'm open to suggestions for improvement. Two things that come to mind:

  1. When applied to fitLists, the full dataset is divided into folds independently for each fitted object. This could be misleading with the holdout method, since a difference in RMSE/MAE between two models might reflect which data points happened to be assigned to the holdout test set rather than which model actually fits better. However, partitioning the dataset once and reusing the same partitions for every model might cause issues if the models, having different covariates, start from slightly different datasets (?)

  2. Leave-one-out (and possibly k-fold, depending on the number of folds) could be very slow for some fitting functions. One solution would be to parallelize it in the same way that parboot does.
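The first point can be illustrated with a small sketch in which the fold assignment is drawn once and reused for every candidate model, so that differences in score cannot come from the partitioning. This is Python for illustration only, not code from the PR. The names assign_folds and cv_rmse, the toy constant-prediction models, and the data are hypothetical. It also assumes every model is fitted to exactly the same observations, which is the condition the first point questions.

```python
import math
import random
import statistics


def assign_folds(n, k, seed=0):
    """Return a list giving each observation a fold id in 0..k-1."""
    fold_of = [i % k for i in range(n)]
    random.Random(seed).shuffle(fold_of)
    return fold_of


def cv_rmse(y, fold_of, fit):
    """RMSE over held-out points, using a fold assignment supplied by the caller."""
    sq_err = []
    for f in sorted(set(fold_of)):
        train = [v for v, g in zip(y, fold_of) if g != f]
        pred = fit(train)  # toy model: one constant prediction per fold
        sq_err.extend((v - pred) ** 2 for v, g in zip(y, fold_of) if g == f)
    return math.sqrt(sum(sq_err) / len(sq_err))


y = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 9.0, 2.0, 3.0, 2.0]

# Partition once, then reuse the same assignment for every model.
fold_of = assign_folds(len(y), k=5, seed=42)

# Hypothetical stand-ins for two competing fitted models.
models = {"mean": statistics.mean, "median": statistics.median}
scores = {name: cv_rmse(y, fold_of, fit) for name, fit in models.items()}
```

Because each fold is fitted independently of the others, the loop over folds is also the natural unit of work to distribute if the procedure were parallelized.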

rbchan commented 5 years ago

@kenkellner This looks great. I will go ahead and merge in the changes. In addition to the extensions you mentioned, it might be nice if users could provide custom fit statistics.
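One way custom fit statistics could work is to accept a set of named functions that each take the held-out observations and the corresponding predictions. The sketch below is Python for illustration only and does not describe the interface unmarked uses. The score function, its stats argument, and the bias statistic are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def score(obs, pred, stats=None):
    """Apply each named statistic to held-out observations and predictions."""
    if stats is None:
        # Defaults mirror the two measures mentioned in the PR description.
        stats = {"rmse": rmse, "mae": mae}
    return {name: fn(obs, pred) for name, fn in stats.items()}


# A user-supplied statistic (hypothetical): mean signed error, i.e. bias.
def bias(obs, pred):
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


obs = [1.0, 2.0, 3.0]
pred = [2.0, 2.0, 5.0]
result = score(obs, pred, {"rmse": rmse, "bias": bias})
```

Keeping the statistics as plain functions of observed and predicted values means the cross-validation loop itself does not need to change when a new statistic is added.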