Open cgeron35 opened 10 months ago
Hi,
I wonder if the problem is with the naming of the units. Does it hold that rownames(XY)==levels(studyDesign$spatial)? Also, if your coordinates are lat and lon, you should choose the option longlat=TRUE in the function HmscRandomLevel(…).
Best,
O2
From: cgeron35 @.> Sent: maanantai 25. syyskuuta 2023 17:35 To: hmsc-r/HMSC @.> Cc: Subscribed @.***> Subject: [hmsc-r/HMSC] Error on xy coordinates spatial random factor (Issue #167)
Hello,
I am trying to check if Hmsc models can be fitted to my data (I am trying to model plant species distribution), following the step by step code on the book "Joint species distribution modelling" (page 321) and when running this code : sampleMcmc(m.FULL,samples=2)
I am stuck with the following error : "Error in hM$rL[[r]]$s[levels(hM$dfPi[, r]), ] : subscript out of bounds"
I checked on github, and found that this could be due to my spatial random factor, which corresponds to a matrix of the x and y coordinates (numerical, in WGS 84) that I specified to my model : XY=as.matrix(XData[,c(2:3)]) #correspond to longitude and latitude columns rL.spatial = HmscRandomLevel(sData = XY)
indeed my rL.spatial$distMat is null...
I am confused whether this error comes from the projection system of the coordinates or if it has nothing to deal with that ? I don't get the coordinate system in which the points are in the book (page 321)
Does anyone have an idea ?
Thank you !
— Reply to this email directly, view it on GitHubhttps://github.com/hmsc-r/HMSC/issues/167, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AEIYMZW7OMDH6H5W3NBR6I3X4GJAHANCNFSM6AAAAAA5GGMS6Q. You are receiving this because you are subscribed to this thread.Message ID: @.**@.>>
Hello O2,
thanks for your quick answer !
I managed to fix the issue by indeed changing rownames of XY by the rownames(studyDesign$spatial). I left the coordinates as numerical wgs 84. I also added the option longlat=TRUE, and this worked perfectly !
Compared to the models where the rL.spatial did not have longlat=TRUE, the alphapw changed due to how locations are taking into account I guess ?
I wonder if this is normal that this code returns "Null" as a result ?
models[["plants.Full"]][["ranLevels"]][["sample"]][["distMat"]] NULL
Thanks in advance,
Best !
Hi, I wonder if the problem is with the naming of the units. Does it hold that rownames(XY)==levels(studyDesign$spatial)? Also, if your coordinates are lat and lon, you should choose the option longlat=TRUE in the function HmscRandomLevel(…). Best, O2
Hi,
Compared to the models where the rL.spatial did not have longlat=TRUE, the alphapw changed due to how locations are taking into account I guess ?
alphapw describes the discrete prior for alpha, where the default prior accounts for the extent of the study area and hence it depends how distance are calculated.
wonder if this is normal that this code returns "Null" as a result ?
I think (but did not check) that possibly that matrix is initialized only in the beginning of MCMC, not when the model is defined; which may explain this. Jari/Gleb could possibly comment on this.
O2
I wonder if this is normal that this code returns "Null" as a result ?
Because this distMat
element of HmscRandomLevel
is not supposed to hold the distance matrix computed from sData
if the spatial context is given as coordinates. It is the stem of the user interface option that enables to specify the distance matrix directly in HmscRandomLevel(..., distMat=smth)
constructor.
The effective distance matrix is indeed initialised only in the beginning of MCMC (based on HmscRandomLevel[["sData"]]
or HmscRandomLevel[["distMat"]]
and Hmsc[["studyDesign"]]
) and it is not explicitly returned to the user.
Hi O2 and gitkhonov,
thanks for your really quick and clear answers ! This is much clearer now :)
I have two additional questions :
I am running the code on a small portion of my data to test it out. I noticed that in order for the HmscRandomLevel to run, I need to have different coordinated for each of my locations in my XY matrix... (otherwise I get an error). However, in my full datasets, I have some relevés taking place at the same location (and which I will indicate in the study.design, as a factor). I wonder if this is then possible to have the same coordinates for different data points (locations, with the same sample.id)?
I am comparing the fit of three different models (built following the step by step code in the book "Joint species distribution modelling") based on the code on page 323: "partition = createPartition(models[[1]], nfolds = 2, column = "route") MF = list() MFCV = list() for (i in 1:3){ preds = computePredictedValues(models[[i]]) MF[[i]] = evaluateModelFit(hM = models[[i]], predY = preds) preds = computePredictedValues(models[[i]], partition = partition) MFCV[[i]] = evaluateModelFit(hM = models[[i]], predY = preds) }" Because of the fact that I have a set of 13 species in this subset of my dataframe, I do get the RMSE, AUC and TjurR2 for each of my 13 species. Would you have a way to get a "mean" value for each of my three models?
Thanks in advance !
Best
Hi again,
You can have repeated samples from the same location. In studyDesign, the same locations (more generally units) can repeat. In sData=… or units=…, each location/unit should be present only once.
Because of the fact that I have a set of 13 species in this subset of my dataframe, I do get the RMSE, AUC and TjurR2 for each of my 13 species. Would you have a way to get a "mean" value for each of my three models?
I am not sure I get the question correctly, but usually we average e.g. the AUC values over the species to get one overall measure of model fit.
Best,
O2
Hi O2,
thanks for your answer.
To add on your answers :
when I tried having the XY matrix to feed to the rL.spatial = HmscRandomLevel(sData = XY), this refused to run (and got an error) if I had similar coordinates there, I wonder if you would have a clue ? I will of course inform with the StudyDesign that some points have the same ID,
is there a dummy code just to get the mean eg. AUC value across all species of a specific model?
Thanks in advance.
Best
When I tried having the XY matrix to feed to the rL.spatial = HmscRandomLevel(sData = XY), this refused to run (and got an error) if I had similar coordinates there, I wonder if you would have a clue ? I will of course inform with the StudyDesign that some points have the same ID,
In XY there should not be identical (or very similar) coordinates, each location should be there only once (see previous answer).
is there a dummy code just to get the mean eg. AUC value across all species of a specific model?
mean(MF$AUC)
Best,
O2
Hi again,
oh sorry, I did not get your previous answer. So I should only give the locations for each of the studydesign sample ID if I get you correctly ?
Thanks for the code, sorry, it was actually quite simple.
Best
Assume that you call location in studyDesign as location, that you have 100 locations, and 200 samples (so on average 2 per location). Lets call the location ID:s as loc001,…, loc100. Then studyDesign$location is a factor of length 200 with 100 levels, where the levels as are loc001,..loc100, and values repeat (as length is 200). The dimensions of XY are 100 x 2 (assuming 2-dimensional space), where rownames are loc001, …, loc100, so here the locations do not repeat.
Best,
O2
Thanks O2, that is indeed much clearer !
Hello,
I am trying to check if Hmsc models can be fitted to my data (I am trying to model plant species distribution), following the step by step code on the book "Joint species distribution modelling" (page 321) and when running this code : sampleMcmc(m.FULL,samples=2)
I am stuck with the following error : "Error in hM$rL[[r]]$s[levels(hM$dfPi[, r]), ] : subscript out of bounds"
I checked on github, and found that this could be due to my spatial random factor, which corresponds to a matrix of the x and y coordinates (numerical, in WGS 84) that I specified to my model : XY=as.matrix(XData[,c(2:3)]) #correspond to longitude and latitude columns rL.spatial = HmscRandomLevel(sData = XY)
indeed my rL.spatial$distMat is null...
I am confused whether this error comes from the projection system of the coordinates or if it has nothing to deal with that ? I don't get the coordinate system in which the points are in the book (page 321)
Does anyone have an idea ?
Thank you !