sylvainschmitt / SSDM

Stacked Species Distribution Modelling R package

computation of the variable importance #33

Closed BoiChaza closed 6 years ago

BoiChaza commented 6 years ago

Dear Sylvain, I was wondering if you could please explain how the SSDM package calculates variable importance for the present, and how variable importance is calculated when projecting the models to the future.

Does the package follow the principle of shuffling one variable in the given data, making model predictions with this shuffled data set, and computing a correlation between the reference predictions and the shuffled ones? Reviewers have asked me to explain fully how the package computes variable importance, so I want to be sure exactly how it is calculated, especially when projecting to the future.

Thank you in advance, Boi

sylvainschmitt commented 6 years ago

Dear @BoiChaza ,

Have a look at section 2.3.2, "Importance analysis of environmental variables", in our 2017 article (https://besjournals.onlinelibrary.wiley.com/doi/epdf/10.1111/2041-210X.12841):

The “ssdm” package provides two measures of the relative contribution of environmental variables on a species-by-species basis, which quantifies the relevance of an environmental variable to determine species distribution. The first measure is based on a jackknife approach that evaluates the change in accuracy between a full model and a model in which each environmental variable is omitted in turn (Phillips, Anderson, & Schapire, 2006). All metrics available in the package can be used to assess the change in accuracy. The second measure is based on Pearson’s correlation coefficient between a full model and a model with each environmental variable omitted in turn (Thuiller, Lafourcade, Engler, & Araújo, 2009). These measures, which are calculated on a species-by-species basis, are averaged in SSDM.
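For illustration only, the second measure in the quoted passage (correlation between a full model and a model refitted with one variable omitted) can be sketched in a few lines of Python. This is not the SSDM code: `fit_toy_model`, `omission_importance`, the synthetic data, and the choice to report `1 - r` as the score are all assumptions made for the sketch, and the toy model is a simple stand-in for a real SDM algorithm.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def fit_toy_model(rows, y, variables):
    """Hypothetical stand-in for an SDM algorithm: the response mean plus
    one univariate least-squares slope per variable."""
    n = len(y)
    my = sum(y) / n
    terms = {}
    for v in variables:
        xs = [r[v] for r in rows]
        mx = sum(xs) / n
        sxx = sum((a - mx) ** 2 for a in xs)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, y))
        terms[v] = (mx, sxy / sxx if sxx > 0 else 0.0)

    def predict(new_rows):
        return [my + sum(s * (r[v] - m) for v, (m, s) in terms.items())
                for r in new_rows]

    return predict


def omission_importance(rows, y, variables):
    """Refit the model without each variable in turn and compare its
    predictions on the training rows with those of the full model."""
    full = fit_toy_model(rows, y, variables)(rows)
    scores = {}
    for v in variables:
        kept = [u for u in variables if u != v]
        reduced = fit_toy_model(rows, y, kept)(rows)
        # Assumption of this sketch: report 1 - r, so larger means more important.
        scores[v] = 1.0 - pearson(full, reduced)
    return scores


# Synthetic training data: "temp" matters most, "rain" less, "noise" not at all.
rng = random.Random(42)
variables = ["temp", "rain", "noise"]
rows = [{v: rng.gauss(0, 1) for v in variables} for _ in range(500)]
y = [3.0 * r["temp"] + 1.5 * r["rain"] + rng.gauss(0, 0.1) for r in rows]

importance = omission_importance(rows, y, variables)
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

The first (jackknife) measure would follow the same loop, replacing the correlation with the change in an accuracy metric between the full and the reduced model.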

If you plan to use SSDM in a publication please consider citing the corresponding article:

Schmitt, S., Pouteau, R., Justeau, D., de Boissieu, F., & Birnbaum, P. (2017). ssdm: An r package to predict distribution of species richness and composition based on stacked species distribution models. Methods in Ecology and Evolution, 8(12), 1795-1803.

I hope this helps. Best,

Sylvain

sylvainschmitt commented 6 years ago

Sorry, I just saw that you were also asking about forecasting into the future. Projection does not change the variable importance analysis, because variable importance is computed on the training dataset. The training dataset consists of the present observations and environmental variables, whether you are building present or future species distribution models.

BoiChaza commented 6 years ago

Thank you very much for your response.

Boi

asierrl commented 5 years ago

Hi Sylvain. I was trying to figure out the exact method by which variable importance is calculated, and I ran into an apparent contradiction. In your paper you say that "The second measure is based on Pearson’s correlation coefficient between a full model and a model with each environmental variable omitted in turn (Thuiller, Lafourcade, Engler, & Araújo, 2009)", which is in fact what your code seems to me to be doing. However, Thuiller et al. 2009 say that "This procedure uses Pearson correlation between the standard predictions (i.e. fitted values) and predictions where the variable under investigation has been randomly permutated." Is there something I am missing, or is it just an error in the citation? Best regards, and thanks again for your excellent piece of software. Asier
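To make the distinction concrete, here is a minimal Python sketch of the permutation variant described in the Thuiller et al. quote: the model is fitted once, one variable is shuffled, and the new predictions are correlated with the reference predictions. Nothing is refitted, unlike the omission approach. The names `toy_predict` and `permutation_importance`, the synthetic data, and the `1 - r` score are assumptions for illustration, not code from either package.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def toy_predict(row):
    """Hypothetical already-fitted model; it ignores the 'noise' variable."""
    return 2.0 * row["temp"] + 0.5 * row["rain"]


def permutation_importance(rows, variables, predict, rng):
    """Shuffle each variable in turn (no refitting) and compare the new
    predictions with the reference predictions."""
    reference = [predict(r) for r in rows]
    scores = {}
    for v in variables:
        shuffled = [r[v] for r in rows]
        rng.shuffle(shuffled)
        permuted = [predict({**r, v: s}) for r, s in zip(rows, shuffled)]
        # Assumption of this sketch: report 1 - r, so larger means more important.
        scores[v] = 1.0 - pearson(reference, permuted)
    return scores


rng = random.Random(7)
variables = ["temp", "rain", "noise"]
rows = [{v: rng.gauss(0, 1) for v in variables} for _ in range(300)]

perm_importance = permutation_importance(rows, variables, toy_predict, rng)
for name, score in sorted(perm_importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Because the toy model never reads `noise`, shuffling it leaves the predictions unchanged and its score is zero up to floating-point error.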