rahulkgour closed this issue 1 year ago.
Dear @rvalavi,
I have attached a snapshot of the data used with the code.
Please let me know if anything further is required. Thanks, Rahul
Hi @rahulkgour, just one question: are you trying to run multiple repeats of k-fold cross-validation? For example, 10 repeats of 5-fold CV, which means fitting your model 50 times.
Hi @rvalavi, Yes, we are trying to run multiple repeats of k-fold CV.
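For comparison, the repeated k-fold scheme described above (for example 10 repeats of 5-fold CV, i.e. 50 model fits) can be sketched in base R with ordinary random, non-spatial folds. This is only an illustration: random_repeated_folds() is a hypothetical helper, not part of blockCV, and it assigns individual records to folds rather than spatial blocks.

```r
# Hypothetical helper (not part of blockCV): random, NON-spatial repeated k-fold.
# Returns an n x n_rep integer matrix; column r holds the fold id (1..k) of each
# record in repeat r. Note that set.seed() here resets the global RNG state.
random_repeated_folds <- function(n, k = 5, n_rep = 10, seed = 1) {
  set.seed(seed)
  # rep_len() spreads the k fold ids as evenly as possible over n records,
  # and sample() shuffles them independently for each repeat
  vapply(seq_len(n_rep), function(r) sample(rep_len(seq_len(k), n)), integer(n))
}

# Toy usage: 10 repeats of 5-fold CV on 100 records
folds <- random_repeated_folds(n = 100, k = 5, n_rep = 10)
ncol(folds) * 5  # 50 model fits in total
```

Column j can then stand in for sb$folds_ids inside a fold loop such as the one posted in this thread (e.g. test_set <- which(folds[, j] == i)), giving a random-CV baseline to compare against the spatially blocked results.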
@rahulkgour I'm not sure what you have done will work. I think you need something more like the following, which gives you a data frame with the performance metrics for every fold of every repeat.
Just remember that if you set iteration too high and there are not many blocks in your landscape, the folds from different repeats could end up very similar (not much variability). 100 should be fine, but I don't know how many blocks are created for your data.
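One way to check whether repeats are producing nearly identical folds is to compare the fold assignments of two repeats pair by pair. The sketch below computes a Rand-type agreement score in base R: for every pair of records, it checks whether both assignments agree on putting the pair in the same fold or in different folds. fold_agreement() is a hypothetical helper, not a blockCV function. It takes two fold-id vectors, such as the folds_ids element returned by cv_spatial() in two different repeats.

```r
# Hypothetical helper (not part of blockCV): Rand-type agreement between two
# fold assignments of the same records. Fold labels may be permuted between
# repeats, so only "same fold / different fold" for each pair is compared.
fold_agreement <- function(a, b) {
  stopifnot(length(a) == length(b))
  same_a <- outer(a, a, "==")   # TRUE where records i and j share a fold in a
  same_b <- outer(b, b, "==")   # the same for b
  upper <- upper.tri(same_a)    # count each unordered pair once
  mean(same_a[upper] == same_b[upper])
}

# With two cv_spatial() results sb1 and sb2 from different repeats:
# fold_agreement(sb1$folds_ids, sb2$folds_ids)

# Toy example: identical partition with swapped labels
fold_agreement(c(1, 1, 2, 2), c(2, 2, 1, 1))  # 1
```

A score close to 1 across many pairs of repeats suggests the extra repeats add little variability. The two n x n matrices need O(n^2) memory, so for large datasets a random subsample of records may be needed.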
# Load required packages
library(blockCV)
library(randomForest)
library(ggplot2)
# Read the SOC (response variable) and covariates
soc_data <- read.csv(file.choose(), header = TRUE)
data <- sf::st_as_sf(soc_data, coords = c("Longitude", "Latitude"), crs = 4326)
# Read the raster background map for blockCV (as a terra SpatRaster)
elevation <- terra::rast("elevation.tif")
n_rep <- 20   # number of repeats
n_fold <- 10  # number of folds per repeat
# One row per fold per repeat to store the evaluation metrics
model_eval <- data.frame(rep = rep(NA, n_rep * n_fold), fold = NA, MSE = NA, R2 = NA)
n <- 0
for (j in 1:n_rep) {  # j indexes the repetitions
  set.seed(j)
  # Create spatial blocks using blockCV for spatial cross-validation
  sb <- cv_spatial(x = data,              # data
                   column = "SOC",        # response variable (variable of interest)
                   r = elevation,         # raster background map
                   size = 1000,           # size of the blocks in meters
                   k = n_fold,            # number of folds
                   selection = "random",  # randomly assign blocks to folds
                   iteration = 100,       # attempts to find evenly dispersed folds
                   # biomod2 = TRUE,
                   raster_colors = terrain.colors(10, rev = TRUE))
  for (i in 1:n_fold) {
    n <- n + 1
    # folds_ids follows the row order of data, which matches soc_data
    test_set <- which(sb$folds_ids == i)
    train_set <- which(sb$folds_ids != i)
    test_data <- soc_data[test_set, ]
    train_data <- soc_data[train_set, ]
    # Fit the Random Forest model on the training data
    RF_model <- randomForest(SOC ~ ., data = train_data, ntree = 500)
    # Predict on the testing data
    RF_predictions <- predict(RF_model, newdata = test_data)
    # Calculate the performance metrics
    RF_MSE <- mean((test_data$SOC - RF_predictions)^2)
    RF_R2 <- cor(test_data$SOC, RF_predictions)^2
    # Save the results for this fold
    model_eval$rep[n] <- j
    model_eval$fold[n] <- i
    model_eval$MSE[n] <- RF_MSE
    model_eval$R2[n] <- RF_R2
  }
}
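RF_R2 in the loop is the squared Pearson correlation between observed and predicted SOC, and it ignores any systematic bias in the predictions. A common alternative is the coefficient of determination, 1 - SSE/SST, which penalises bias and can become negative. The sketch below computes both side by side. cv_metrics() is a hypothetical helper, and computing SST around the test-fold mean is one convention among several.

```r
# Hypothetical helper: regression metrics for one CV fold
cv_metrics <- function(obs, pred) {
  stopifnot(length(obs) == length(pred))
  sse <- sum((obs - pred)^2)        # sum of squared prediction errors
  sst <- sum((obs - mean(obs))^2)   # total sum of squares around the test-fold mean
  c(MSE  = sse / length(obs),
    RMSE = sqrt(sse / length(obs)),
    r2   = cor(obs, pred)^2,        # squared Pearson correlation (as in RF_R2)
    R2   = 1 - sse / sst)           # coefficient of determination
}

# Inside the fold loop this could replace the two metric lines:
# m <- cv_metrics(test_data$SOC, RF_predictions)
```

For example, with obs = c(1, 2, 3, 4) and every prediction offset by +1, r2 is 1 while R2 is 0.2. This shows how the two measures can disagree when predictions are biased.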
Dear @rvalavi,
Thank you for your kind response. I tried your code snippet as well, but the performance still looks a little skewed. However, when I use ordinary random 10-fold cross-validation (without blockCV) while training the model, I get the expected validation metrics.
Regards, Rahul
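To compare the spatially blocked results with random 10-fold CV on an equal footing, it can help to summarise model_eval per repeat first and then across repeats. The sketch below is one way to do this in base R. summarise_cv() is a hypothetical helper, and it assumes a data frame with the rep, fold, MSE and R2 columns created in the code above.

```r
# Hypothetical helper: summarise repeated CV results stored in a data frame
# with columns rep, fold, MSE and R2 (as in model_eval above)
summarise_cv <- function(model_eval) {
  # Mean metric per repeat; rows containing NA (e.g. unfilled rows, or an R2
  # that cor() could not compute) are dropped by aggregate()'s default na.omit
  per_rep <- aggregate(cbind(MSE, R2) ~ rep, data = model_eval, FUN = mean)
  # Mean and standard deviation across repeats
  overall <- c(mean_MSE = mean(per_rep$MSE), sd_MSE = sd(per_rep$MSE),
               mean_R2  = mean(per_rep$R2),  sd_R2  = sd(per_rep$R2))
  list(per_rep = per_rep, overall = overall)
}

# Usage: res <- summarise_cv(model_eval); res$overall
```

Averaging within each repeat first means the standard deviation reflects variability between fold assignments (repeats) rather than between individual folds. Running the same summary on results from random CV gives directly comparable numbers.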
What do you mean by "skewed", @rahulkgour?
BTW, now that I see your response is a continuous variable: you should not use the column argument in cv_spatial. That argument is only for categorical and binary data. Just remove it (in your case, column = "SOC" must be removed).
I'll add a warning to the code for when the column argument is used with continuous data.
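A check along these lines can also be run on the user side before calling cv_spatial(). The sketch below is hypothetical: it is not blockCV's implementation, and the threshold of 10 distinct values is an arbitrary assumption. It warns when the chosen response column is numeric with many distinct values, i.e. when it looks continuous like SOC.

```r
# Hypothetical helper (not part of blockCV): warn if the response passed to
# cv_spatial()'s column argument looks continuous. Works on data frames and
# sf objects. max_classes is an arbitrary, assumed threshold.
check_fold_column <- function(data, column, max_classes = 10) {
  stopifnot(column %in% names(data))
  y <- data[[column]]
  looks_continuous <- is.numeric(y) && length(unique(y)) > max_classes
  if (looks_continuous) {
    warning(sprintf("'%s' looks continuous; consider dropping the column argument.",
                    column), call. = FALSE)
  }
  # TRUE when the column seems suitable (binary or categorical)
  invisible(!looks_continuous)
}

# Usage before building the folds:
# check_fold_column(data, "SOC")
```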
Dear @rvalavi,
Sorry for the late reply.
I tweaked my code again, and it now seems to be working as expected. I am really thankful for your assistance.
Yes, adding a warning to the code about using column with continuous data would be really helpful for users.
I think we can close this ticket.
Thanks again, Rahul
Perfect! I'm glad that was helpful @rahulkgour.
Dear @rvalavi,
I have successfully executed the steps mentioned below without encountering any errors. However, I am having trouble with the modelling phase, as I am not getting the expected results.
I have been unable to debug the issue, especially when using the spatially blocked cross-validation folds created in the code below.
Maybe the way I am creating the spatial blocks, or feeding them into model training and testing, is not correct.
I would greatly appreciate your help in resolving this, as I intend to use blockCV in my publication.
Thanks in advance!
Kind regards, Rahul