hmsc-r / HMSC


error for predict along a gradient with NNGP #96

FabianRoger opened 3 years ago

FabianRoger commented 3 years ago

I fitted a model with a spatial random factor with the 'NNGP' method. Now I tried to predict with this model along a gradient.

My code is

 Gradient = constructGradient(models$m1, focalVariable = "Granskog")
 predY = predict(models$m1, Gradient = Gradient, expected = TRUE)

This gives me the error

    Error in get.knnx(data, query, k, algorithm) : 
    Number of columns must be same!.

Googling the error message brings me here, but I am not sure if this is what's going on internally. If necessary, I can try to provide a reprex later.

Thanks!

jarioksa commented 3 years ago

I can reproduce this in an NNGP model. I'll have a look at this.

jarioksa commented 3 years ago

This is now fixed in commit ee1367afd3d58c386c36f6cf9d1eb77d72d7cb65 (the bug was sloppy coding). Please try the GitHub version of Hmsc to see if it cures your problem.
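For reference, a minimal sketch (assuming the devtools package is installed) of getting the development version from GitHub:

    devtools::install_github("hmsc-r/HMSC")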

FabianRoger commented 3 years ago

thanks!

I tried it with the updated version and the original error seems to be fixed. However, I now get an error saying

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 't': the leading minor of order 1 is not positive definite

Also, weirdly, when testing the same code on another model that is identical to m1 except that it uses abundance data conditional on presence (the second part of the hurdle model), the original error still occurs.

jarioksa commented 3 years ago

This error does not come directly from Hmsc, but from some other package or R function that we use. After getting the error, run traceback() and post the output.

FabianRoger commented 3 years ago

this is what I get

 traceback()
 9: h(simpleError(msg, call))
 8: .handleSimpleError(function (cond) 
    .Internal(C_tryCatchHelper(addr, 1L, cond)), "the leading minor of order 1 is not positive definite", 
        base::quote(chol.default(W)))
 7: chol.default(W)
 6: chol(W)
 5: t(chol(W))
 4: predictLatentFactor(unitsPred = levels(dfPiNew[, r]), units = levels(object$dfPi[, 
        r]), postEta = postEta, postAlpha = postAlpha, rL = rL[[r]], 
        predictMean = predictEtaMean, predictMeanField = predictEtaMeanField)
 3: predict.Hmsc(models$m1, Gradient = Gradient, expected = TRUE)
 2: predict(models$m1, Gradient = Gradient, expected = TRUE)
 1: predict(models$m1, Gradient = Gradient, expected = TRUE)

ovaskain commented 3 years ago

This error message could appear if you have (almost) identical spatial coordinates – check if this is the case.

Otso
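One quick way to check this is sketched below, assuming the coordinates passed as sData are in a two-column data frame xy:

    ## smallest pairwise distance between any two coordinates;
    ## values at or near zero can make the covariance matrix singular
    min(dist(xy))
    ## list the (nearly) coincident pairs, if any (1e-8 is an arbitrary tolerance)
    D <- as.matrix(dist(xy))
    which(D < 1e-8 & upper.tri(D), arr.ind = TRUE)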


jarioksa commented 3 years ago

What bugs me about this error is that the command t(chol(W)) should never be executed here: it is used only in full spatial models, never in NNGP models. Now I really need a reproducible example to see how you can get to this line in an NNGP model.

jarioksa commented 3 years ago

For anybody trying to inspect the issue, here is how I generated the reproducible test case:

## TD is the small test data set shipped with Hmsc
m <- TD$m
rL1 <- TD$rL1
## switch the spatial random level to the NNGP method
rL1 <- update(rL1, sMethod="NNGP", nNeighbours=4)
m <- update(m, ranLevels = list(sample = TD$rL2, plot = rL1))
m <- sampleMcmc(m, samples=100, thin=1, transient=50, nChains=2, nParallel=2)
Gradient <- constructGradient(m, focalVariable = "x2")
predY <- predict(m, Gradient = Gradient, expected = TRUE) # fails before commit ee1367a

Can you use something similar to reproduce your problem?

FabianRoger commented 3 years ago

re distance between points:

They shouldn't be, no. The coordinates are centroids, about 20 km apart from each other. I checked, and the minimum distance between any two points is 26 km.

FabianRoger commented 3 years ago

Weirdly enough, the reprex above runs without error for me.

fkeppeler commented 2 years ago

Are there any updates about this issue? I'm having similar problems when using both GPP and NNGP models.

traceback()
8: h(simpleError(msg, call))
7: .handleSimpleError(function (cond) 
   .Internal(C_tryCatchHelper(addr, 1L, cond)), "the leading minor of order 1 is not positive definite", 
       base::quote(chol.default(W)))
6: chol.default(W)
5: chol(W)
4: t(chol(W))
3: predictLatentFactor(unitsPred = levels(dfPiNew[, r]), units = levels(object$dfPi[, 
       r]), postEta = postEta, postAlpha = postAlpha, rL = rL[[r]], 
       predictMean = predictEtaMean, predictMeanField = predictEtaMeanField)
2: predict.Hmsc(m2, Gradient = Gradient)
1: predict(m2, Gradient = Gradient)

cgoetsch commented 2 years ago

I am also getting this error for a non-spatial model, although I have a temporal effect of year. I am only trying to create the gradient plots by covariate, not make spatial predictions; those are working fine.

studyDesign = data.frame(
  tow = as.factor(da$UniqueID),
  grid_id = as.factor(da$grid_id),
  year = as.factor(da$YEAR))

#set up the random effect of sample level to get species co-occurrence at the sample level
rL.tow = HmscRandomLevel(units = studyDesign$tow)
#set up the temporal random effect - temporally explicit for temporal autocorrelation
rL.time = HmscRandomLevel(sData=time_da) 

m = Hmsc(Y = Y, XData = X, XFormula = XFormula.1.d, distr="probit", studyDesign=studyDesign, ranLevels=list("tow"= rL.tow, "year" = rL.time))

Gradient_cov = constructGradient(m, focalVariable = "rugosity")
predY_cov = predict(m, Gradient=Gradient_cov, expected = TRUE)

traceback()
9: h(simpleError(msg, call))
8: .handleSimpleError(function (cond) 
   .Internal(C_tryCatchHelper(addr, 1L, cond)), "the leading minor of order 1 is not positive definite", 
       base::quote(chol.default(W)))
7: chol.default(W)
6: chol(W)
5: t(chol(W))
4: predictLatentFactor(unitsPred = levels(dfPiNew[, r]), units = levels(object$dfPi[, 
       r]), postEta = postEta, postAlpha = postAlpha, rL = rL[[r]], 
       predictMean = predictEtaMean, predictMeanField = predictEtaMeanField)
3: predict.Hmsc(m, Gradient = Gradient_cov, expected = TRUE)
2: predict(m, Gradient = Gradient_cov, expected = TRUE)
1: predict(m, Gradient = Gradient_cov, expected = TRUE)

Yo-B commented 2 years ago

I have been having the same error with my temporal random effect. I cannot make predictions if I define it with the argument sData when training the model. I also have a spatial random effect that works fine with NNGP or GPP, and removing it does not help in passing my temporal factor as sData. If I pass the temporal effect as units, i.e. not as a longitudinal factor, it works, but I'm surely losing some information by doing this. I attach below a minimal reproducible example that triggers the same error. It is quite big, yet minimal enough, I hope. Any suggestion on how I should define my longitudinal random effects here? BTW, thanks guys for your awesome work!

library("dplyr")
library("Hmsc")

## data dimensions
n_cov <- 10
n_sample <- 100
n_species <- 20
n_trait <- 5

## generate data set with species occurrence, covariates, spatiotemporal coordinates and traits
XData <- data.frame(matrix(rnorm(n_sample*n_cov), ncol=n_cov))
colnames(XData) <- c("x", "y", paste0("var_", 1:(n_cov-2)))
XData$year <- sample(2000:2010, n_sample, replace=TRUE)

YData <- data.frame(matrix(rnorm(n_sample*n_species), ncol=n_species)) %>% 
  mutate_all(function(x) (x>.75)*1)
colnames(YData) <- paste0("sp_", 1:n_species)
TrData <- data.frame(matrix(rnorm(n_trait*n_species), ncol=n_trait))
colnames(TrData) <- paste0("tr_", 1:n_trait)
rownames(TrData) <- paste0("sp_", 1:n_species)

## subset the data set in train/test sets
train <- sample(n_sample, n_sample*.6)
designTrain <- XData[train, c("year", "x", "y")]
xTrain <- XData[train, paste0("var_", 1:(n_cov-2))]
yTrain <- YData[train, ]

## study design
studyDesign <- data.frame(year=factor(designTrain$year),
                          xy=factor(paste(designTrain$x, designTrain$y, sep="_")))
rownames(studyDesign) <- rownames(yTrain) <- rownames(xTrain) <- 1:nrow(yTrain)

## formulas
f_cov <- as.formula(paste0("~", paste(paste0("var_", 1:(n_cov-2)), collapse = "+")))
f_tra <- as.formula(paste0("~", paste(colnames(TrData), collapse = "+")))

## prepare temporal random effect
time <- data.frame(year=sort(unique(designTrain$year)))
rownames(time) <- time$year
## prepare spatial random effect
space <- data.frame(unique(designTrain[, c("x", "y")]))
rownames(space) <- paste(space$x, space$y, sep="_")
xyknots <- constructKnots(sData=space, knotDist = 1)
plot(space, asp=1) ; points(xyknots, col="red")
## define random effects
rL3 <- HmscRandomLevel(sData = time)
rL4 <- HmscRandomLevel(sData = space, sMethod = "GPP", sKnot = xyknots)

## Define the model
(m1 <- Hmsc(Y = yTrain,
            XData = xTrain, 
            studyDesign = studyDesign,
            TrData = TrData,
            TrFormula =  f_tra, 
            XFormula = f_cov,
            ranLevels = list(year=rL3, xy=rL4),
            distr = "probit"))

## train the model
m1 <- sampleMcmc(m1, thin=2, samples=100, nParallel=3, transient=10, nChains=3, verbose=10)

## predict on test and train data sets
designTest <- XData[!(1:n_sample %in% train), c("year", "x", "y")]
xTest <- XData[!(1:n_sample %in% train), paste0("var_", 1:(n_cov-2))]
yTest <- YData[!(1:n_sample %in% train), ]

## Try to make predictions
Gradient <- prepareGradient(m1, XDataNew = data.frame(rbind(xTest, xTrain)), 
                            sDataNew = list(
                              year=data.frame(year=rbind(designTest, designTrain)$year), ## commenting that line and removing rL3 from model definition above will allow predictions
                              xy=data.frame(rbind(designTest, designTrain)[, c('x','y')])))
pred_occ <- predict(m1, Gradient = Gradient) #### ERROR will occur here

jarioksa commented 2 years ago

@Yo-B This is one of the issues I have been studying recently: thank you for supplying a reproducible example.

I have an experimental, dirty, and noisy fix for this, discussed in issue #123. You can try it by installing the experimental branch with devtools::install_github("hmsc-r/HMSC", ref="updater-errors"). It is experimental because I have not thoroughly tested it, it is dirty because I have not thoroughly thought it over, and it is noisy because it makes a whole lot of noise: it prints a large number of error messages which can be ignored, and you will get a result despite these messages.

The reason for the failure in several of these cases seems to be that the Gradient values duplicate data points, so you have zero distances between Gradient points and training data points. This in turn gives you a zero W matrix, and a matrix of zeros is not positive definite. The purpose of this calculation is to get the standard error for a random normal variate; that standard error is distance-dependent, and it is zero when the distance is zero.
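A minimal illustration of how a zero matrix produces exactly this error:

    W <- matrix(0, nrow = 2, ncol = 2)
    chol(W)
    ## Error in chol.default(W) :
    ##   the leading minor of order 1 is not positive definite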

The risk of duplicating data points is particularly large for temporal random variables which, as in your case, are integer years, so the generated Gradient values will duplicate data values.
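A hypothetical check for this situation, assuming m is the fitted model and Gradient was built by constructGradient() (the component names below follow its return value):

    ## years present both in the gradient and in the training data
    ## give zero distances in the temporal random level
    intersect(as.character(Gradient$studyDesignNew$year),
              as.character(m$studyDesign$year))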

This fix may also work for the cases of @cgoetsch, @fkeppeler and @FabianRoger, but I haven't tested it with their data. Remember: you will get a lot of error messages, but these will not stop the analysis, and you will still get a result.