wpeterman / ResistanceGA

Optimize resistance surfaces using Genetic Algorithms

AIC model selection for categorical surfaces - k #41

Closed hannehaug closed 1 year ago

hannehaug commented 1 year ago

Hi, I'm working with several categorical resistance surfaces that differ greatly in their number of categories. When performing model selection using AIC, the categorical surfaces with more categories are penalized more because k = the number of categories. This affects the result so heavily that the simplest surface always gets ranked first, even though it has the lowest R2m. However, the MLPE regression model does not include the categorical surface directly; rather, it includes the estimated effective distance (which is a continuous variable). I'm wondering, then, whether using k = the number of categories is the only appropriate way to go about this. Or could one argue that the effective distance is a continuous variable, so that when using AIC to rank the models, k would equal 2? Thanks!

wpeterman commented 1 year ago

This is an issue that doesn't have a right/wrong answer. Technically, as you point out, the MLPE model is using the effective distance only, so k=2. However, this obscures the optimization of the parameters being estimated: scale and max in continuous transformations, and category resistance values for categorical surfaces. The more levels and/or layers being optimized, the more 'dials' we have for adjusting pairwise resistance values. What we are trying to do with AIC is find the balance between model complexity and information gained. If you are getting only marginal improvements in fit from much more model complexity, we have to question whether the complexity is really necessary.

You can adjust how k is calculated using the k.value parameter in the GA.prep function.
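
To illustrate how the choice of k can change the ranking, here is a minimal Python sketch (not ResistanceGA code). It uses the standard definitions AIC = -2 logLik + 2k and the small-sample correction AICc. The log-likelihoods, category counts, and sample size are made-up numbers for illustration only, and the two k rules (k fixed at 2, and k = number of categories) are simply the two options discussed in this thread.

```python
def aic(log_lik, k):
    """Akaike information criterion: -2 * logLik + 2 * k."""
    return -2.0 * log_lik + 2.0 * k


def aicc(log_lik, k, n):
    """AIC with the standard small-sample correction term."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n - k - 1 <= 0")
    return aic(log_lik, k) + (2.0 * k * (k + 1)) / (n - k - 1)


def rank_by_aic(models, k_rule):
    """Return model names ordered from lowest (best) to highest AIC."""
    scored = [(aic(m["log_lik"], k_rule(m)), m["name"]) for m in models]
    return [name for _, name in sorted(scored)]


# Hypothetical fits: the 12-category surface fits slightly better
# (higher log-likelihood) than the 3-category surface.
models = [
    {"name": "simple", "log_lik": -120.0, "n_categories": 3},
    {"name": "complex", "log_lik": -117.0, "n_categories": 12},
]

# Rule A: k = 2 for every surface (only the MLPE terms are counted).
ranking_fixed_k = rank_by_aic(models, lambda m: 2)

# Rule B: k = number of categories (the optimized 'dials' are counted).
ranking_category_k = rank_by_aic(models, lambda m: m["n_categories"])

print("k = 2:            ", ranking_fixed_k)
print("k = n categories: ", ranking_category_k)
```

With these made-up numbers, the complex surface ranks first when k = 2 and the simple surface ranks first when k = the number of categories: a gain of 3 log-likelihood units does not offset the cost of the extra optimized parameters.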

Finally, I would caution against over-interpreting the marginal R^2 of the MLPE model. See Beninde et al. (2023) for an example where R^2 was not a reliable metric.

Beninde, J., Wittische, J., & Frantz, A. C. (2023). Quantifying uncertainty in inferences of landscape genetic resistance due to choice of individual-based genetic distance metric. Molecular Ecology Resources, 00, 1–18. https://doi.org/10.1111/1755-0998.13831

hannehaug commented 1 year ago

Thank you so much! This was helpful.