Closed Zeroo11 closed 3 months ago
Hi Zeroo, we have an equivalent function in Machuruku, machu.top.env, that works for multiple species. Try that one out instead!
Thank you, I have tried it. When I run
machu.top.env(occ.rarefied, ClimCur, method = "contrib", contrib.greater = 10)
I get an error message:
**Error in var(threshold.stats, use = "complete.obs") :
no complete element pairs**
In addition: There were 32 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: glm.fit: fitted probabilities numerically 0 or 1 occurred
2: glm.fit: algorithm did not converge
3: glm.fit: fitted probabilities numerically 0 or 1 occurred
4: glm.fit: fitted probabilities numerically 0 or 1 occurred
5: glm.fit: fitted probabilities numerically 0 or 1 occurred
6: glm.fit: fitted probabilities numerically 0 or 1 occurred
7: glm.fit: fitted probabilities numerically 0 or 1 occurred
8: glm.fit: fitted probabilities numerically 0 or 1 occurred
9: glm.fit: fitted probabilities numerically 0 or 1 occurred
10: glm.fit: fitted probabilities numerically 0 or 1 occurred
11: glm.fit: fitted probabilities numerically 0 or 1 occurred
12: glm.fit: algorithm did not converge
13: glm.fit: fitted probabilities numerically 0 or 1 occurred
14: glm.fit: fitted probabilities numerically 0 or 1 occurred
15: glm.fit: fitted probabilities numerically 0 or 1 occurred
16: glm.fit: fitted probabilities numerically 0 or 1 occurred
17: glm.fit: fitted probabilities numerically 0 or 1 occurred
18: glm.fit: fitted probabilities numerically 0 or 1 occurred
19: In cor(y_i, u_i) : the standard deviation is zero
20: In cor(y_i, u_i) : the standard deviation is zero
21: In cor(y_i, u_i) : the standard deviation is zero
22: In cor(y_i, u_i) : the standard deviation is zero
23: glm.fit: algorithm did not converge
24: glm.fit: fitted probabilities numerically 0 or 1 occurred
25: glm.fit: fitted probabilities numerically 0 or 1 occurred
26: In cor(y_i, u_i) : the standard deviation is zero
27: In cor(y_i, u_i) : the standard deviation is zero
28: In cor(y_i, u_i) : the standard deviation is zero
29: In cor(y_i, u_i) : the standard deviation is zero
30: In cor(y_i, u_i) : the standard deviation is zero
31: In cor(y_i, u_i) : the standard deviation is zero
32: In cor(y_i, u_i) : the standard deviation is zero
However, when I use the unreduced data, although there are warning messages, I do get the top environmental variables:
[1] "Important variables:"
[1] "bio_1" "bio_12" "bio_13" "bio_14" "bio_15" "bio_18" "bio_19"
**There were 35 warnings (use warnings() to see them)**
> machu.top.env(occ, ClimCur, method = "nvars", nvars.save = 6)
**Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': could not find function "bind_rows"
In addition: There were 37 warnings (use warnings() to see them)**
The first error occurs with both methods ("contrib" and "nvars").
**Error in var(threshold.stats, use = "complete.obs") :
no complete element pairs**
In addition: There were 12 warnings (use warnings() to see them)
Is it caused by an NA problem? I extracted the climate data for each occurrence point and checked that there are no NAs:
> all.point <- occ[,-1]
> coordinates(all.point) <- ~long_DD+lat_DD
> all.point.clim <- extract(ClimCur, all.point)
> class(all.point.clim)
[1] "matrix" "array"
> anyNA(all.point.clim)
[1] FALSE
Do you know the reason for these two errors?
The issue with the example data might be that one of the species has too few occurrence points after rarefaction. The second problem is probably because the package dplyr is not loaded (not importing it was my mistake); running library(dplyr) before calling the function should fix it.
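To illustrate both points, here is a quick check you can run on your own objects (`occ.rarefied` and the `sp` column are assumed to match the names in your session):

```r
# how many points does each species retain after rarefying?
# species with very few points can break the gbm fit
table(occ.rarefied$sp)

# bind_rows comes from dplyr, which the "nvars" method calls internally,
# so load it before running machu.top.env
library(dplyr)
machu.top.env(occ, ClimCur, method = "nvars", nvars.save = 6)
```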
As for your own data, I can't really say what the problem is without seeing it myself. I also revamped this function for Machuruku 2; you can try that version and see if it fixes the issue. Do make sure that you've specified the columns in your occurrence data properly: the function assumes species is in column 1, longitude in column 2, and latitude in column 3, unless you specify otherwise.
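For example, if your occurrence table happens to follow the default layout (species, longitude, latitude), spelling out the column arguments is equivalent to the plain call; change the indices if your columns are ordered differently:

```r
# explicit column specification for the updated function
machu.top.env(occ, ClimCur,
              sp.col = 1, long.col = 2, lat.col = 3,
              method = "contrib", contrib.greater = 10)
```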
Updated machu.top.env function:
```r
machu.top.env <- function(occ, clim, sp.col = 1, long.col = 2, lat.col = 3,
                          learning.rt = 0.01, pa.ratio = 4, steps = 50,
                          method = "contrib", nvars.save = 5, contrib.greater = 5,
                          verbose = T) {
  # method is one of 'estimate', 'contrib', or 'nvars'
  # input adjustments
  if (method == "ESTIMATE" | method == "Estimate") method <- "estimate"
  if (method == "CONTRIB" | method == "Contrib" | method == "contribution") method <- "contrib"
  if (method == "Nvars" | method == "NVARS") method <- "nvars"
  # flip verbose: gbm.step works by setting "silent" rather than "verbose", which are opposites
  verbose <- !verbose
  # determine indices (i.e., which columns, corresponding to the climate variables) of predictor vars
  e.var <- 4:(3 + nlayers(clim))
  # identify max number of occurrence points across all species
  n.occ.max <- max(table(occ$sp))
  # randomly sample n.occ.max*30 points from the raster climate data: these will be
  # the pool from which pseudoabsences are drawn.
  # However, if the data are low resolution there may be too few cells to sample
  # without replacement; check for this and, if so, reduce n.occ.max to n.cells
  n.cells <- nrow(rasterToPoints(clim[[1]], fun = NULL, spatial = FALSE)) # doesn't count ocean cells
  if (n.occ.max * 30 > n.cells) n.occ.max <- n.cells
  if (verbose == F) print("Extracting climate values from rasters. This could take a minute.")
  pa.pool <- sampleRandom(clim, n.occ.max, xy = T)
  if (verbose == F) print("Finished extracting climate values.")
  ID <- rep(0, nrow(pa.pool))
  pa.pool <- cbind(ID, pa.pool)
  # initialize output list
  imp.vars <- list()
  # loop over each taxon
  for (i in 1:length(unique(occ$sp))) {
    # get species name
    sp.name <- unique(occ$sp)[i]
    # get number of points for that taxon
    n.occ <- nrow(subset(occ, sp == sp.name))
    # randomly sample n.occ*pa.ratio pseudoabsence (pa) points from pa.pool
    # (the default ratio of 4 is changeable with pa.ratio).
    # There have to be at least 50 points total for the gbm model to work, and
    # if the requested number of pseudoabsences exceeds n.cells, cap it at n.cells
    n.pa <- n.occ * pa.ratio
    if (n.occ + (n.occ * pa.ratio) < 50) n.pa <- 50 - n.occ
    if (n.occ * pa.ratio > n.cells) n.pa <- n.cells
    # sample pseudoabsences
    sp.pa <- pa.pool[sample(nrow(pa.pool), n.pa), ]
    # get occurrence points for that species
    sp.occ <- subset(occ, sp == sp.name)[, c(long.col, lat.col)]
    colnames(sp.occ) <- c("x", "y")
    ID <- rep(1, nrow(sp.occ))
    # extract climate values for actual occurrence points and put it all in a table
    sp.data <- data.frame(ID, sp.occ, extract(clim, sp.occ))
    # combine pseudoabsence and species occurrence tables
    datamodel <- rbind(sp.pa, sp.data)
    if (method == "nvars" || method == "contrib") {
      modelOut <- gbm.step(data = datamodel, gbm.x = e.var, gbm.y = 1, family = "bernoulli",
                           tree.complexity = 5, learning.rate = learning.rt, step.size = steps,
                           bag.fraction = 0.5, plot.main = FALSE, plot.folds = FALSE,
                           silent = verbose)
    }
    if (method == "estimate") {
      modelOut1 <- gbm.step(data = datamodel, gbm.x = e.var, gbm.y = 1, family = "bernoulli",
                            tree.complexity = 5, learning.rate = learning.rt, step.size = steps,
                            bag.fraction = 0.5, plot.main = FALSE, plot.folds = FALSE,
                            silent = verbose)
      mod.simp <- gbm.simplify(modelOut1)
      min.y <- min(c(0, mod.simp$deviance.summary$mean))
      n.vars1 <- match(min.y, c(0, mod.simp$deviance.summary$mean)) - 1
      vars1.use <- mod.simp$pred.list[[n.vars1]]
      modelOut <- gbm.step(data = datamodel, gbm.x = mod.simp$pred.list[[n.vars1]], gbm.y = 1,
                           family = "bernoulli", tree.complexity = 5, learning.rate = learning.rt,
                           step.size = steps, bag.fraction = 0.5, plot.main = FALSE,
                           plot.folds = FALSE, silent = verbose)
    }
    if (method == "nvars") {
      varInc <- as.vector(modelOut$contributions[1:nvars.save, ])
    }
    if (method == "estimate") {
      varInc <- as.vector(modelOut$contributions[, 1])
    }
    if (method == "contrib") {
      varInc <- as.vector(modelOut$contributions$var[modelOut$contributions$rel.inf > contrib.greater])
    }
    imp.vars[[i]] <- varInc
  }
  # sort the combined variable importances and take the top nvars.save across all taxa
  if (method == "nvars") {
    h <- as.data.frame(dplyr::bind_rows(imp.vars))
    k <- as.vector(unique(h$var))
    for (i in k) {
      j <- h[h$var == i, ]
      if (match(i, k) == 1) {
        l <- j[j$rel.inf == max(j$rel.inf), ]
      } else {
        l <- rbind(l, j[j$rel.inf == max(j$rel.inf), ])
      }
    }
    l <- l[order(-l$rel.inf), ]
    imp.var.out <- as.vector(l[1:nvars.save, ][, 1])
  }
  if (method == "contrib" || method == "estimate") {
    imp.var.out <- sort(unique(unlist(imp.vars)))
  }
  print("Important variables:")
  print(imp.var.out)
  return(imp.var.out)
}
```
Also, as an alternative, you can try the removeCollinearity function from the virtualspecies package, which does pretty much the same thing in a different way.
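A minimal sketch of that alternative, assuming `ClimCur` is your RasterStack of climate layers (the cutoff value here is just an illustrative choice):

```r
library(virtualspecies)
# flags groups of variables whose pairwise correlation exceeds the cutoff;
# with select.variables = TRUE it returns one representative variable per group
kept.vars <- removeCollinearity(ClimCur,
                                multicollinearity.cutoff = 0.7,
                                select.variables = TRUE,
                                plot = TRUE)
```

You can then subset your raster stack to `kept.vars` before building models.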
Hello, Professor. Sorry to bother you again. QAQ I see your article (A New Method for Integrating Ecological Niche Modeling with Phylogenetics to Estimate Ancestral Distributions) says "The variable reduction is performed by generating ENMs using boosted regression trees (Elith et al. 2019) as implemented in the function "humboldt.top.env" from the R package humboldt (Brown and Carnaval 2019)". The "humboldt.top.env" function requires the parameters env1, env2, sp1, and sp2, so it only works between two species? However, you used four species in your article; how did you do the variable reduction with the "humboldt.top.env" function?
Looking forward to your reply.