hmsc-r / HMSC

GNU General Public License v3.0
99 stars 36 forks source link

Error on xy coordinates spatial random factor #167

Open cgeron35 opened 10 months ago

cgeron35 commented 10 months ago

Hello,

I am trying to check if Hmsc models can be fitted to my data (I am trying to model plant species distribution), following the step by step code on the book "Joint species distribution modelling" (page 321) and when running this code : sampleMcmc(m.FULL,samples=2)

I am stuck with the following error : "Error in hM$rL[[r]]$s[levels(hM$dfPi[, r]), ] : subscript out of bounds"

I checked on github, and found that this could be due to my spatial random factor, which corresponds to a matrix of the x and y coordinates (numerical, in WGS 84) that I specified to my model : XY=as.matrix(XData[,c(2:3)]) #correspond to longitude and latitude columns rL.spatial = HmscRandomLevel(sData = XY)

indeed my rL.spatial$distMat is null...

I am confused whether this error comes from the projection system of the coordinates or if it has nothing to deal with that ? I don't get the coordinate system in which the points are in the book (page 321)

Does anyone have an idea ?

Thank you !

ovaskain commented 9 months ago

Hi,

I wonder if the problem is with the naming of the units. Does it hold that rownames(XY)==levels(studyDesign$spatial)? Also, if your coordinates are lat and lon, you should choose the option longlat=TRUE in the function HmscRandomLevel(…).

Best,

O2

From: cgeron35 @.> Sent: maanantai 25. syyskuuta 2023 17:35 To: hmsc-r/HMSC @.> Cc: Subscribed @.***> Subject: [hmsc-r/HMSC] Error on xy coordinates spatial random factor (Issue #167)

Hello,

I am trying to check if Hmsc models can be fitted to my data (I am trying to model plant species distribution), following the step by step code on the book "Joint species distribution modelling" (page 321) and when running this code : sampleMcmc(m.FULL,samples=2)

I am stuck with the following error : "Error in hM$rL[[r]]$s[levels(hM$dfPi[, r]), ] : subscript out of bounds"

I checked on github, and found that this could be due to my spatial random factor, which corresponds to a matrix of the x and y coordinates (numerical, in WGS 84) that I specified to my model : XY=as.matrix(XData[,c(2:3)]) #correspond to longitude and latitude columns rL.spatial = HmscRandomLevel(sData = XY)

indeed my rL.spatial$distMat is null...

I am confused whether this error comes from the projection system of the coordinates or if it has nothing to deal with that ? I don't get the coordinate system in which the points are in the book (page 321)

Does anyone have an idea ?

Thank you !

— Reply to this email directly, view it on GitHubhttps://github.com/hmsc-r/HMSC/issues/167, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AEIYMZW7OMDH6H5W3NBR6I3X4GJAHANCNFSM6AAAAAA5GGMS6Q. You are receiving this because you are subscribed to this thread.Message ID: @.**@.>>

cgeron35 commented 9 months ago

Hello O2,

thanks for your quick answer !

I managed to fix the issue by indeed changing rownames of XY by the rownames(studyDesign$spatial). I left the coordinates as numerical wgs 84. I also added the option longlat=TRUE, and this worked perfectly !

Compared to the models where the rL.spatial did not have longlat=TRUE, the alphapw changed due to how locations are taking into account I guess ?

I wonder if this is normal that this code returns "Null" as a result ?

models[["plants.Full"]][["ranLevels"]][["sample"]][["distMat"]] NULL

Thanks in advance,

Best !

Hi, I wonder if the problem is with the naming of the units. Does it hold that rownames(XY)==levels(studyDesign$spatial)? Also, if your coordinates are lat and lon, you should choose the option longlat=TRUE in the function HmscRandomLevel(…). Best, O2

ovaskain commented 9 months ago

Hi,

Compared to the models where the rL.spatial did not have longlat=TRUE, the alphapw changed due to how locations are taking into account I guess ?

alphapw describes the discrete prior for alpha, where the default prior accounts for the extent of the study area and hence it depends how distance are calculated.

wonder if this is normal that this code returns "Null" as a result ?

I think (but did not check) that possibly that matrix is initialized only in the beginning of MCMC, not when the model is defined; which may explain this. Jari/Gleb could possibly comment on this.

O2

gtikhonov commented 9 months ago

I wonder if this is normal that this code returns "Null" as a result ?

Because this distMat element of HmscRandomLevel is not supposed to hold the distance matrix computed from sData if the spatial context is given as coordinates. It is the stem of the user interface option that enables to specify the distance matrix directly in HmscRandomLevel(..., distMat=smth) constructor.

The effective distance matrix is indeed initialised only in the beginning of MCMC (based on HmscRandomLevel[["sData"]] or HmscRandomLevel[["distMat"]] and Hmsc[["studyDesign"]]) and it is not explicitly returned to the user.

cgeron35 commented 9 months ago

Hi O2 and gitkhonov,

thanks for your really quick and clear answers ! This is much clearer now :)

I have two additional questions :

Thanks in advance !

Best

ovaskain commented 9 months ago

Hi again,

You can have repeated samples from the same location. In studyDesign, the same locations (more generally units) can repeat. In sData=… or units=…, each location/unit should be present only once.

Because of the fact that I have a set of 13 species in this subset of my dataframe, I do get the RMSE, AUC and TjurR2 for each of my 13 species. Would you have a way to get a "mean" value for each of my three models?

I am not sure I get the question correctly, but usually we average e.g. the AUC values over the species to get one overall measure of model fit.

Best,

O2

cgeron35 commented 9 months ago

Hi O2,

thanks for your answer.

To add on your answers :

Thanks in advance.

Best

ovaskain commented 9 months ago

When I tried having the XY matrix to feed to the rL.spatial = HmscRandomLevel(sData = XY), this refused to run (and got an error) if I had similar coordinates there, I wonder if you would have a clue ? I will of course inform with the StudyDesign that some points have the same ID,

In XY there should not be identical (or very similar) coordinates, each location should be there only once (see previous answer).

is there a dummy code just to get the mean eg. AUC value across all species of a specific model?

mean(MF$AUC)

Best,

O2

cgeron35 commented 9 months ago

Hi again,

Best

ovaskain commented 9 months ago

Assume that you call location in studyDesign as location, that you have 100 locations, and 200 samples (so on average 2 per location). Lets call the location ID:s as loc001,…, loc100. Then studyDesign$location is a factor of length 200 with 100 levels, where the levels as are loc001,..loc100, and values repeat (as length is 200). The dimensions of XY are 100 x 2 (assuming 2-dimensional space), where rownames are loc001, …, loc100, so here the locations do not repeat.

Best,

O2

cgeron35 commented 9 months ago

Thanks O2, that is indeed much clearer !