spatialstatisticsupna / bigDM

R package for scalable Bayesian disease mapping models for high-dimensional data

CAR_INLA and categorical levels covariates #4

Closed seyiaros closed 7 months ago

seyiaros commented 7 months ago

Hi

I have been trying to specify a model with CAR_INLA that includes categorical covariates with several levels, but this does not seem to be supported. I want to see how the covariate estimates change relative to a reference category. When I pass a factor variable to the CAR_INLA function, I get the error message "x must be numeric". Is there any way to solve this?

seyiaros commented 7 months ago

Hi

I have been trying to resolve the issue I raised. The steps I have taken so far are reproduced below:

1) To ensure the row names match the IDs of the spatial units defined by the ID.area variable:

row.names(df) <- df$ID

2) Then I created a categorical covariates matrix for the object X:

X <- model.matrix( ~ var1 + var2 + var3)

3) The X matrix of the categorical covariates was then used to define the model:

model <- CAR_INLA(carto = df, ID.area = "ID", ID.group = "RGID", o = "cases", E = "expected", prior = "BYM2", PCpriors = TRUE, X = X, model = "partition", k = 0, strategy = "simplified.laplace", save.models = TRUE)

However, when I run the above model I get a new error message: Error in CAR_INLA(carto = df, ID.area = "ID", ID.group = "RGID", : '(Intercept)' variable not found in carto object.
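The name in this error message matches the first column that model.matrix() creates by default. A minimal sketch to check this, using a hypothetical toy factor (`region` is an assumption for illustration, not a variable from this thread):

```r
# Toy data; `region` is a hypothetical factor covariate, not from the thread.
toy <- data.frame(region = factor(c("north", "south", "east", "south")))

# With the default formula, the design matrix starts with an
# "(Intercept)" column, followed by one column per non-reference level.
X_default <- model.matrix(~ region, data = toy)

# Inspect the column names of the design matrix.
print(colnames(X_default))
```

Printing the column names before passing the matrix as X makes it easy to spot a column that has no counterpart among the covariates.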

I would appreciate comments on how to proceed from here.

aritz-adin commented 7 months ago

Dear @seyiaros,

Three notes about the example in your message:

  1. To include categorical (factor) variables in the model using the CAR_INLA() function, they must be encoded as dummy variables. So using the model.matrix() function is a good option!
  2. Since the CAR_INLA() function includes an intercept by default, you must explicitly remove it when defining the X matrix.
  3. You should also remove one column to set a category of the variable as the "reference level".

For example, if V is a categorical variable, you can set the first level as the reference with:

X <- model.matrix( ~ -1 + V)
X <- X[,-1]
rownames(X) <- carto$ID
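The recipe above can be tried on a small toy data frame. This is only a sketch: `carto_toy`, its IDs, and the levels of V are hypothetical stand-ins for the attribute table of the real carto object.

```r
# Hypothetical toy data standing in for the attribute table of `carto`.
carto_toy <- data.frame(
  ID = c("a1", "a2", "a3", "a4", "a5"),
  V  = factor(c("low", "mid", "high", "mid", "low"),
              levels = c("low", "mid", "high")),
  stringsAsFactors = FALSE
)

# One indicator column per level of V, and no intercept column.
X_full <- model.matrix(~ -1 + V, data = carto_toy)

# Drop the first level ("low") so that it becomes the reference category.
# drop = FALSE keeps X a matrix even if only one column remains.
X <- X_full[, -1, drop = FALSE]

# Row names must match the area IDs.
rownames(X) <- carto_toy$ID

print(X)
```

Areas in the reference category ("low" here) have zeros in every remaining column, so the fitted coefficients are read as differences from that category. To use another reference level, reorder the factor levels (for example with relevel()) before building the matrix.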

Let me know if this solution works well.

Best, Aritz

seyiaros commented 7 months ago

Hi Aritz

Many thanks for reviewing my initial code and for your suggestions.

The solution you provided works well.

seyiaros commented 7 months ago

Hi Aritz

A follow-up observation from running the models: I get another error message (Error in checkForRemoteErrors(val): 10 nodes produced errors; first error: invalid row.names length) when I change the CAR_INLA k argument from 0 to 1. I thought this might be caused by the cluster option I selected, but I still got the same error message when I used the sequential plan option.

I have tried to resolve this issue but no solution so far.

Thank you

aritz-adin commented 7 months ago

Hi,

Could you please send me a reproducible example to check where this error came from?

Best, Aritz

seyiaros commented 7 months ago

Hi Aritz

Here is the reproducible example:

X <- model.matrix( ~ -1 + V1 + V2 + V3 + V4, data = carto)
X <- X[,-1]
rownames(X) <- carto$ID

model <- CAR_INLA(carto = carto, ID.area = "ID", ID.group = "RGID", o = "cases", E = "expected", prior = "BYM2", PCpriors = TRUE, X = X, model = "partition", k = 1, strategy = "simplified.laplace", save.models = TRUE, plan = "cluster", workers = rep("localhost", 4))

Note that the model runs when k = 0, but when k >= 1 I get the error message "Error in checkForRemoteErrors(val): 10 nodes produced errors; first error: invalid row.names length" at stage 3, when the results are merged.
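For readers following the design-matrix construction with several factors, the sketch below shows which columns model.matrix() produces when the intercept is removed. The data frame and its levels are hypothetical; only the formula pattern comes from the example above.

```r
# Hypothetical toy data with two factor covariates.
toy <- data.frame(
  V1 = factor(c("a", "b", "c", "a", "b", "c")),
  V2 = factor(c("x", "y", "x", "y", "x", "y"))
)

# Without an intercept, only the first factor (V1) gets one column per
# level; later factors (V2) are already coded against their first level.
X_full <- model.matrix(~ -1 + V1 + V2, data = toy)
print(colnames(X_full))

# Dropping the first column therefore sets the reference level of V1,
# while V2 keeps its own reference level ("x").
X <- X_full[, -1, drop = FALSE]
print(colnames(X))
```

After this step each factor has exactly one omitted level, which is the reference category for its coefficients.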

Thank you

aritz-adin commented 7 months ago

Hi,

I need a fully reproducible example in order to check on my computer what is going on.

Please send me an .RData file with the "carto" object.

Best, Aritz

seyiaros commented 7 months ago

Hi,

I am working within a secured cluster infrastructure that does not allow the transfer of data or results for any other purpose except for publication or scientific presentation because patient information is included in the analysis dataset.

aritz-adin commented 7 months ago

Hi,

I cannot help you with the error you are getting without being able to replicate the call to the CAR_INLA function...

Could you at least provide me with a similar "carto" object (anonymized data or any simulated data with the same types of variables) where the same error occurs?

Best, Aritz

seyiaros commented 7 months ago

Hi Aritz,

Thank you for your time. I figured out where the problem originated: the ID variable must be a character variable. The models are now running well.

aritz-adin commented 7 months ago

Yes, as stated in the documentation, ID.area and ID.group arguments must be character vectors.
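A minimal sketch of the fix described above, assuming the area IDs were stored as numbers. `carto_toy` is a hypothetical plain data frame standing in for the real carto object, and the coercion shown is one possible way to obtain character IDs.

```r
# Hypothetical stand-in for the attribute table of `carto`, with numeric IDs.
carto_toy <- data.frame(ID = c(101, 102, 103), cases = c(4L, 7L, 2L))

# Check the type of the ID column before fitting the model.
print(class(carto_toy$ID))

# Coerce the ID column to character if it is stored as anything else.
if (!is.character(carto_toy$ID)) {
  carto_toy$ID <- as.character(carto_toy$ID)
}

print(carto_toy$ID)
```

Running this check before building X also keeps rownames(X) consistent with the IDs, since row names are always stored as character.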

I will now close this issue.

Best, Aritz