Open elgabbas opened 6 months ago
Hi Ahmed,
I'm also interested in this question because of the amount of time it takes to make spatial predictions. I was under the impression that predictEtaMean = TRUE
would speed up prediction as well, so I was also surprised when it slowed it down instead. Have you learned anything since posting this?
Thanks, Clara
Hello Clara,
I am interested in fitting large-scale models using GPP but have also fitted smaller-scale models for exploration.
In relatively small-scale models, using predictEtaMean = TRUE
increased the prediction speed in my case by 3 times.
However, this did not help much when using large-scale models when I predict to new sites [see
here]. When predictEtaMean = TRUE
, the predictlatentfactor function calculates the distances between all sampling units and between sampling units and new sites and do further analyses on them. These two matrices can be very large in large-scale studies and this is probably the reason for the problem in my case. Using predictEtaMean = FALSE
did not help in my case.
Ahmed
Hi Ahmed,
Just following up on this with a tip: I've found that if you feed ChatGPT the relevant lines of code from predictLatentFactor and tell it what the variables mean, it will tell you what the spatial prediction algorithm is doing. For instance, I gave it the following prompt.
Help me understand what is happening in this spatial prediction algorithm. All I know is that dss is a distance matrix between spatial knots, dns is a distance matrix between prediction sites (rows) and spatial knots (columns), dnsOld is a distance matrix between training sites (rows) and spatial knots (columns), alphapw[ag,1] is the factor that controls the rate of spatial decay, and NROW(sKnot) is the number of spatial knots.
Wns = exp(-dns/alphapw[ag,1])
W12 = exp(-dnsOld/alphapw[ag,1])
Wss = exp(-dss/alphapw[ag,1])
iWss= solve(Wss)
WnsiWss = Wns%*%iWss
dDn = 1 - rowSums(WnsiWss*Wns)
D = W12%*%iWss%*%t(W12)
dD = 1-diag(D)
idD = 1/dD
tmp0 = matrix(rep(idD,NROW(sKnot)),ncol=NROW(sKnot))
idDW12 = tmp0*W12
FMat = Wss + t(W12)%*%idDW12
iF= solve(FMat)
LiF = chol(iF)
muS1 = iF%*%t(idDW12)%*%eta[,h]
epsS1 = LiF%*%rnorm(nrow(Wss))
m = Wns%*%(muS1+epsS1)
etaPred[indNew,h] = as.numeric(m + sqrt(dDn)*rnorm(nn))
ChatGPT correctly inferred that eta[,h]
are the latent effects observed at training sites, and I have no reason to disbelieve its other output. What I appreciated about it is that it walked me through the interpretation of each line of code. It's a little cumbersome to paste in full here, so I'll refrain from doing that for now.
I asked it to compare the algorithms for predictEtaMean
and predictEtaMeanField
, and it gave me a long response that ended with the following, referring to predictEtaMeanField
:
In summary, this algorithm not only predicts the expected value (m) but also models the spatially varying uncertainty in predictions, providing a more realistic sense of confidence around each predicted value.
Indeed, that is consistent with what I found when I plotted the outputs of either algorithm (see attached). eta_vs_etaMean.pdf eta_vs_etaMeanField.pdf
Of course, this doesn't solve your computational efficiency problem, but I hope it provides some insight into what's happening under the hood, and points to a way to learn more.
Best, Clara
PS – My GitHub email notifications seem not to be working, so apologies if I miss your response.
Hello,
I would like to make spatial predictions for an Hmsc model with a spatial random effect. I have two questions here:
predict
function is very slow. In a toy model with only c.a. 5000 sampling sites and 22 species, thepredict
function took 25 minutes to run. The final model will be more complex: c.a. 50K sampling sites and many more species.I tried using
nParallel = nChains
argument but this did not improve much the processing speed. I tried to increase the speed by making small changes to thepredict.Hmsc
andpredictLatentFactor
functions to process both functions in parallel. There is some speed improvement, but ifpredictEtaMean = TRUE
orpredictEtaMeanField = TRUE
, no improvement was found.Is it possible to increase the speed of model predictions?
Could you please explain, in less technical terms, what is the difference between the values of the previous three predictions and which should be used? Is there an ecological difference in their interpretation? Using
predictEtaMean = TRUE
orpredictEtaMeanField = TRUE
makes the prediction take much time.Thanks, Ahmed