Closed mrkdfb closed 4 years ago
I stumbled upon this very same error message in another context, and committed a change that fixed this issue (but revealed another). You may try the latest GitHub version to see if it helps in your case. See the Hmsc repository front page for instructions on using install_github.
Hi Jari,
many thanks for your reply. I tried again after installing the latest GitHub version and got a different error:
m.spatial = Hmsc(Y = as.matrix(Y),
                 XData = as.data.frame(XDATA),
                 XFormula = XFormula,
                 TrData = TRAITS,
                 TrFormula = TrFormula,
                 studyDesign = studyDesign,
                 ranLevels = list("sample" = rL.spatial),
                 distr = "probit")
m.spatial = sampleMcmc(m.spatial, thin = thin,
                       samples = samples, transient = transient,
                       nChains = nChains, verbose = verbose, nParallel = 1,
                       updater = list(GammaEta = FALSE))
partition = createPartition(m.spatial, nfolds = 4, column = "sample")
m.spatial.predsREAL.sp = computePredictedValues(m.spatial, partition = partition,
                                                partition.sp = 1:ncol(Y),
                                                updater = list(GammaEta = FALSE),
                                                nParallel = nChains, expected = F)
Cross-validation, fold 1 out of 4
Error: Matrices must have same dimensions in iWs + tmp1
> traceback()
8: stop(gettextf("Matrices must have same dimensions in %s", deparse(sys.call(sys.parent()))),
call. = FALSE, domain = NA)
7: dimCheck(e1, e2)
6: iWs + tmp1
5: iWs + tmp1
4: updateEta(Y = Yc, Z = Z, Beta = sam$Beta, iSigma = 1/sam$sigma,
Eta = Eta, Lambda = sam$Lambda, Alpha = sam$Alpha, rLPar = rLPar,
X = X, Pi = PiNew, dfPi = dfPiNew, rL = rL)
3: predict.Hmsc(hM1, post = postList, X = XVal, XRRR = XRRRVal,
studyDesign = dfPi, Yc = Yc, mcmcStep = mcmcStep, expected = expected)
2: predict(hM1, post = postList, X = XVal, XRRR = XRRRVal, studyDesign = dfPi,
Yc = Yc, mcmcStep = mcmcStep, expected = expected)
1: computePredictedValues(m.spatial, partition = partition, partition.sp = 1:ncol(Y),
updater = list(GammaEta = FALSE), nParallel = nChains, expected = F)
Waiting for your feedback, Mirko
Yes, that is the same error that I got in my case after fixing the first problem. I know how this happens and what the problem is. However, I don't know yet how to fix it. I'm studying the issue.
I confirm that this is a bug. A reproducible example is:
library(Hmsc)
set.seed(1)
partition <- createPartition(TD$m, nfolds = 2)
predsCV2 <- computePredictedValues(TD$m, partition = partition,
partition.sp = 1:TD$m$ns, mcmcStep = 100)
This is based on the (dontrun) example of computePredictedValues. This will fail if some of the random levels are missing in a partition (with set.seed(1), plot 6 is missing in partition 1) and you request a model where Eta (H) parameters are needed in a spatial model. The reason for the error is that the spatial parameters were pre-computed with all random levels, but Eta are estimated only for the levels in the partition, and this gives the mismatch of dimensions (or the error reported in the first message in this issue if you use an older version of Hmsc, for instance the 3.0-4 release).
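To check whether a partition leaves some random level out of a training fold (the situation described above), one can compare the levels present in each training set with the full set of levels. This is only a minimal base R sketch on made-up data: `plot_id` and the hand-written `partition` vector are hypothetical stand-ins for a study design column and the output of createPartition, and nothing here calls Hmsc.

```r
# Hypothetical toy design: 12 sampling units nested in 6 plots (2 units each).
plot_id <- factor(rep(1:6, each = 2))

# Hypothetical fold assignment, one entry per sampling unit
# (in Hmsc this vector would come from createPartition()).
partition <- c(1, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2, 2)

folds <- sort(unique(partition))

# For each fold, list the plot levels that never occur in the training data,
# i.e. in the units that are NOT held out in that fold.
missing_levels <- lapply(folds, function(k) {
  train <- partition != k
  setdiff(levels(plot_id), as.character(unique(plot_id[train])))
})
names(missing_levels) <- paste0("fold", folds)

print(missing_levels)
```

A non-empty entry means that the model refitted for that fold sees fewer levels than the full model, which is the kind of mismatch described in the post above.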
Many thanks Jari. I hope this will be fixed in a next release. Best, Mirko
@mrkdfb : @gtikhonov made a commit that should fix your issue. Please test.
Hi Jari,
now the computePredictedValues function works properly. That said, I got a new error that never happened before. If I construct a gradient like this:
GGradient = constructGradient(m.spatial, focalVariable = "bio_15", ngrid=20)
GGradient
$XDataNew
     bio_15   bio_19    bio_3    bio_9 distance_Barren distance_Farmland distance_Forest distance_Grassland distance_Urban distance_Water
1  20.74979 158.5410 281.8509 204.9735        21888.21          39.20055        4888.066           1988.633      12943.547       23109.66
2  22.45802 165.5720 281.7848 206.9182        22176.21          79.32761        4734.784           1977.796      12425.608       22795.22
3  24.16625 172.6031 281.7187 208.8628        22464.21         119.45467        4581.502           1966.959      11907.670       22480.78
4  25.87448 179.6342 281.6526 210.8075        22752.20         159.58172        4428.219           1956.122      11389.731       22166.34
5  27.58270 186.6653 281.5864 212.7521        23040.20         199.70878        4274.937           1945.286      10871.793       21851.90
6  29.29093 193.6964 281.5203 214.6967        23328.20         239.83584        4121.655           1934.449      10353.854       21537.46
7  30.99916 200.7275 281.4542 216.6414        23616.20         279.96290        3968.373           1923.612       9835.916       21223.02
8  32.70738 207.7586 281.3881 218.5860        23904.19         320.08995        3815.091           1912.775       9317.977       20908.58
9  34.41561 214.7896 281.3220 220.5307        24192.19         360.21701        3661.808           1901.939       8800.038       20594.14
10 36.12384 221.8207 281.2559 222.4753        24480.19         400.34407        3508.526           1891.102       8282.100       20279.70
11 37.83207 228.8518 281.1898 224.4199        24768.19         440.47113        3355.244           1880.265       7764.161       19965.26
12 39.54029 235.8829 281.1237 226.3646        25056.18         480.59818        3201.962           1869.428       7246.223       19650.82
13 41.24852 242.9140 281.0576 228.3092        25344.18         520.72524        3048.680           1858.592       6728.284       19336.38
14 42.95675 249.9451 280.9914 230.2539        25632.18         560.85230        2895.397           1847.755       6210.345       19021.94
15 44.66498 256.9762 280.9253 232.1985        25920.17         600.97936        2742.115           1836.918       5692.407       18707.51
16 46.37320 264.0072 280.8592 234.1432        26208.17         641.10641        2588.833           1826.082       5174.468       18393.07
17 48.08143 271.0383 280.7931 236.0878        26496.17         681.23347        2435.551           1815.245       4656.530       18078.63
18 49.78966 278.0694 280.7270 238.0324        26784.17         721.36053        2282.269           1804.408       4138.591       17764.19
19 51.49788 285.1005 280.6609 239.9771        27072.16         761.48759        2128.986           1793.571       3620.652       17449.75
20 53.20611 292.1316 280.5948 241.9217        27360.16         801.61464        1975.704           1782.735       3102.714       17135.31

$studyDesignNew
     sample
1  new_unit
2  new_unit
3  new_unit
4  new_unit
5  new_unit
6  new_unit
7  new_unit
8  new_unit
9  new_unit
10 new_unit
11 new_unit
12 new_unit
13 new_unit
14 new_unit
15 new_unit
16 new_unit
17 new_unit
18 new_unit
19 new_unit
20 new_unit

$rLNew
$rLNew$sample
Hmsc random level object with 1068 units. Spatial dimensionality is 2 and number of covariates is 0.
and then launch the predict function on it, I get the following error:
predG=predict(m.spatial, Gradient=GGradient)
Error in get.knnx(data, query, k, algorithm) : Number of columns must be same!.
Here is the traceback:
7: stop("Number of columns must be same!.")
6: get.knnx(data, query, k, algorithm)
5: knnx.index(sOld, sNew, k = rL$nNeighbours)
4: predictLatentFactor(unitsPred = levels(dfPiNew[, r]), units = levels(object$dfPi[, r]),
       postEta = postEta, postAlpha = postAlpha, rL = rL[[r]], predictMean = predictEtaMean,
       predictMeanField = predictEtaMeanField)
3: predict.Hmsc(m.spatial, Gradient = GGradient)
2: predict(m.spatial, Gradient = GGradient)
1: predict(m.spatial, Gradient = GGradient)
I cannot reproduce this. Can you provide a reproducible example?
Here it is (based on the vignette 4 example):
library(Hmsc)
library(MASS)
n = 100
ns = 5
beta1 = c(-2,-1,0,1,2)
alpha = rep(0,ns)
beta = cbind(alpha,beta1)
x = cbind(rep(1,n),rnorm(n))
Lf = x%*%t(beta)
xycoords = matrix(runif(2*n),ncol=2)
colnames(xycoords) = c("x-coordinate","y-coordinate")
rownames(xycoords) = 1:n
sigma.spatial = c(2)
alpha.spatial = c(0.35)
Sigma = sigma.spatial^2*exp(-as.matrix(dist(xycoords))/alpha.spatial)
eta1 = mvrnorm(mu=rep(0,n), Sigma=Sigma)
lambda1 = c(1,2,-2,-1,0)
Lr = eta1%*%t(lambda1)
L = Lf + Lr
y = as.matrix(L + matrix(rnorm(n*ns),ncol=ns))
yprob = 1*((L +matrix(rnorm(n*ns),ncol=ns))>0)
XData = data.frame(x1=x[,2])
nChains = 2
test.run = T
if (test.run) {
  thin = 1
  samples = 10
  transient = 5
  verbose = 0
} else {
  thin = 10
  samples = 1000
  transient = 1000
  verbose = 0
}
rL.nngp = HmscRandomLevel(sData = xycoords, sMethod = 'NNGP', nNeighbours = 20)
rL.nngp = setPriors(rL.nngp,nfMin=2,nfMax=15)
studyDesign = data.frame(sample = as.factor(1:n))
m.nngp = Hmsc(Y=yprob, XData=XData, XFormula=~x1, studyDesign=studyDesign, ranLevels=list("sample"=rL.nngp),distr="probit")
m.nngp = sampleMcmc(m.nngp, thin = thin, samples = samples, transient = transient, nChains = nChains, verbose = verbose, updater=list(GammaEta=FALSE))
GGradient = constructGradient(m.nngp, focalVariable = "x1", ngrid=20)
predG=predict(m.nngp, Gradient=GGradient)
Error in get.knnx(data, query, k, algorithm) : Number of columns must be same!
Hi Mirko & Jari,
This issue should now be fixed.
Cheers, Melinda
@MelindadeJonge so it was dropping dimensions.
@jarioksa Yes that was the issue. It only happened when making predictions for the latent variables on only one new sampling unit.
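For readers unfamiliar with "dropping dimensions": in R, selecting a single row of a matrix returns a plain vector unless `drop = FALSE` is given, so code that expects a coordinate matrix can end up with an object that has no columns at all. The sketch below only illustrates this general R behaviour with a made-up coordinate matrix `s`; it is not the actual Hmsc code or the actual fix.

```r
# Hypothetical coordinate matrix: 5 units, 2 spatial dimensions.
s <- matrix(seq(0.1, 1, by = 0.1), ncol = 2,
            dimnames = list(paste0("u", 1:5), c("x", "y")))

# Selecting ONE unit without drop = FALSE returns a named numeric vector.
one_dropped <- s["u3", ]
dim(one_dropped)          # NULL: no longer a matrix

# With drop = FALSE the result stays a 1 x 2 matrix.
one_kept <- s["u3", , drop = FALSE]
dim(one_kept)             # 1 2

# Selecting several units keeps the matrix shape either way.
dim(s[c("u1", "u2"), ])   # 2 2
```

This is consistent with the observation above that the error appeared only when predicting for a single new sampling unit.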
Hi Melinda & Jari,
I confirm that the last commit by Melinda fixed the issue when predicting on a gradient. However, another issue (maybe linked with the previous one) remains when predicting on a different covariate dataset. Specifically, if I create new variables and a study design like:
XData_pred=rbind(XData, XData)
studyDesign_pred=data.frame(sample=as.factor(1:nrow(XData_pred)))
and try to predict the m.nngp model on them:
predict(m.nngp, post=poolMcmcChains(m.nngp$postList)[1], XData=XData_pred, expected=F, studyDesign=studyDesign_pred)
I get the following error:
Error in rL$s[unitsAll, ] : subscript out of bounds.
In addition, if I try to run the same procedure on my own data, I get another error:
predY = predict(m.spatial, post=poolMcmcChains(m.spatial$postList)[i], XData=vars_RTP, expected=F, studyDesign=studyDesignNew)[[1]]
Error in get.knnx(data, query, k, algorithm) : Data non-numeric
Is there perhaps an error in my code? Many thanks for your assistance. Cheers, Mirko
@mrkdfb this may be related to unsolved issue #31: we have a problem with factor covariates which are not handled adequately in the code.
@mrkdfb @jarioksa From what I understand, you are not fitting a model with factor covariates, right? In that case it's probably not related to issue #31.
The first issue is related to closed issue #19. When you are making predictions for new spatial units, those units should have been specified when first defining the random levels. This is the case for all spatial models.
I have not seen the second error before. Could you make a reproducible example for me so I can see what's going on?
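The "subscript out of bounds" message in the first issue is what base R reports when a matrix is indexed by row names that it does not contain. The sketch below reproduces that behaviour on made-up data; `s` and `units_all` are hypothetical stand-ins for the stored coordinates and the requested unit names, not the actual Hmsc internals.

```r
# Hypothetical coordinates known to the random level: units "1" to "3" only.
s <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), ncol = 2,
            dimnames = list(as.character(1:3), c("x", "y")))

# Hypothetical units requested at prediction time: "4" to "6" are new.
units_all <- as.character(1:6)

# Units that have no coordinates; a non-empty result signals the problem.
unknown_units <- setdiff(units_all, rownames(s))
print(unknown_units)

# Indexing by the unknown row names fails; capture the error instead of stopping.
res <- tryCatch(s[units_all, ], error = function(e) e)
print(inherits(res, "error"))
```

Checking `setdiff()` against the row names of the coordinate matrix before predicting gives a more readable diagnosis than the raw indexing error.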
Hi Melinda,
you are correct. The XData object in the code above is generated as described in vignette 4 on spatial models. Accordingly, it contains only the numerical predictor "x1". This means that the first issue is related to #19. However, the #19 post does not provide any code showing how to specify all the spatial units before training the model, or how to indicate which units to use for calibration and which to set aside for prediction. Could you kindly provide a couple of lines of code clarifying that? I suspect the second issue also derives from the same problem. Many thanks. Mirko
Hi Mirko,
The spatial locations of the units that you want to predict to need to be included in the sData that is given to HmscRandomLevel. For your simulated data example we can generate some spatial data for n new locations in the same way as we did for the original locations and add this to the original coordinates. Then you need to make the random levels using the full coordinate set. To specify which units should be used in the fitting of the model you use the studyDesign. So if you leave the studyDesign as it was, the model will only be fitted on the first n units.
xycoords_new = matrix(runif(2*n),ncol=2)
xycoordsFull = rbind(xycoords,xycoords_new)
colnames(xycoordsFull) = c("x-coordinate","y-coordinate")
rownames(xycoordsFull) = 1:nrow(xycoordsFull)
rL.nngp = HmscRandomLevel(sData = xycoordsFull, sMethod = 'NNGP', nNeighbours = 20)
studyDesign = data.frame(sample = as.factor(1:n))
m.nngp = Hmsc(Y=yprob, XData=XData, XFormula=~x1, studyDesign=studyDesign, ranLevels=list("sample"=rL.nngp),distr="probit")
m.nngp = sampleMcmc(m.nngp, thin = 1, samples = 10, transient = 5, nChains = 2, verbose = 0, updater=list(GammaEta=FALSE))
XData_pred=rbind(XData, XData)
studyDesign_pred=data.frame(sample=as.factor(1:nrow(XData_pred)))
predict(m.nngp, post=poolMcmcChains(m.nngp$postList)[1], XData=XData_pred, expected=F, studyDesign=studyDesign_pred)
I hope this helps.
Cheers, Melinda
Hi Melinda,
many thanks! The code you provided works perfectly in fixing the first issue regarding the studydesign in spatial models. Unfortunately, it does not fix the second issue that I get when working with my own data. Can I send you a script and a .RData workspace to allow you to reproduce the error?
Hi Mirko,
I actually ran into the same error message today while trying to fix another issue with the nngp predictions. I think the last commit should fix your issue. But if not, let me know, in that case you can send me the script and workspace.
Cheers, Melinda
Hi Melinda,
I confirm that your last commit also fixed the remaining issue. Everything now works properly. Thanks again. Cheers, Mirko
Hi Melinda,
sorry to bother you again, but the original issue that appeared when predicting on a gradient has reappeared after the last commit. If you run the reproducible example I attached in one of the posts above ("Here it follows...") again, you'll see it. Any ideas? Thanks. Cheers M
Hi Mirko,
Apologies, I messed up something in the last commit, it should now be fixed again.
Regarding the other issue you mentioned in one of the posts above:
Error in get.knnx(data, query, k, algorithm) : Data non-numeric
Did you make sure that your xydata is supplied as a matrix with column and row names? I encountered the same error when my xydata was a data frame instead.
Cheers, Melinda
Hi Melinda,
you are right, that was exactly the problem. After many attempts, I found that the variables for prediction need to be a data.frame, while xydata must be a matrix with column and row names. That was the only setting that made predictions work both on a gradient and on external variables. Thanks again for your valuable assistance. Cheers Mirko
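As a small illustration of the setup described above, coordinates held in a data frame can be converted to a numeric matrix with row and column names before being passed on. This is only a base R sketch with made-up site names and coordinates; whether further requirements apply to the coordinate input is not covered by this thread.

```r
# Hypothetical coordinates stored as a data frame with named rows.
xy_df <- data.frame(x = c(0.1, 0.5, 0.9),
                    y = c(0.2, 0.4, 0.8),
                    row.names = c("site1", "site2", "site3"))

# Convert to a matrix; row and column names are carried over.
xy_mat <- as.matrix(xy_df)

# Basic sanity checks before using the object as spatial data.
stopifnot(is.matrix(xy_mat),
          is.numeric(xy_mat),
          !is.null(rownames(xy_mat)),
          !is.null(colnames(xy_mat)))

print(xy_mat)
```

The covariates used for prediction would stay a data.frame, as noted in the post above.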
I think we are not too user-friendly: some data must be data.frames or we fail, other data must be a matrix or we fail, and then perhaps it must be a specifically constructed matrix (or was it a data frame?) with certain names for rows and columns or we fail. In all cases the error messages are obscure and not at all related to the actual error. I think most of these things should be checked within the code so that users don't need to wrestle with these quirks. In the future versions...
Hi,
I am reporting the following error when trying to run a conditional cross-validation on an NNGP model:
here is the traceback():
Diving into the functions, I discovered the issue appears when launching the updateEta function
Any ideas? Thanks in advance for your precious help. Mirko