juanmartinsantos / et0stacking

On the suitability of stacking-based ensembles in smart agriculture for evapotranspiration prediction

Request for full code #1

Open harshini-web opened 3 years ago

harshini-web commented 3 years ago

Sir, your stacking-ensemble paper for evapotranspiration is awesome. I chose it as the base paper for my semester project. But the source code I downloaded contains only the stacking function. Can you please provide the full code and explain how to include the datasets for the 20 places in Spain? Looking forward to your reply, sir. Email Id: boggarapuharshini1705@gmail.com

juanmartinsantos commented 3 years ago

Hi,

Thank you for your comment; I'm glad that you like my work.

The stacking ensemble is designed to receive a train set and a test set. You can also set the number of models to build in the second-level algorithm. Therefore, you must create a function that divides the dataset into train and test sets; these sets then go into the stacking ensemble. In my case, I used 5-fold cross-validation (see the sketch after the library list below). You can find the rest of the comparison methods from my paper in several existing R packages, and the datasets are available on this GitHub page.

library(gbm)
library(e1071)
library(xgboost)
library(randomForest)
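For instance, a minimal sketch of a 5-fold driver (illustrative only, not code from this repository; it assumes stacking_et0 is already sourced and that the ET0 target is the last column of the data frame):

# Illustrative 5-fold cross-validation driver (not the repository code).
# Assumes `dataset` keeps the ET0 target as its last column and that
# stacking_et0(train, test, nmeta_mdls) is already defined in the session.
set.seed(1)
folds <- sample(rep(1:5, length.out = nrow(dataset)))   # random fold labels

rmse_per_fold <- sapply(1:5, function(k) {
  train <- dataset[folds != k, ]          # 4 folds for training
  test  <- dataset[folds == k, ]          # 1 held-out fold
  pred  <- stacking_et0(train, test, nmeta_mdls = 1)
  truth <- test[, ncol(test)]             # true ET0 of the held-out rows
  sqrt(mean((truth - pred)^2))            # RMSE on this fold
})
mean(rmse_per_fold)                       # average error over the 5 folds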

If you have any other questions, let me know.

Best regards, Juan Martín.


harshini-web commented 3 years ago

I am so glad to see your reply, sir.

harshini-web commented 3 years ago

By following your guidance, sir, I executed your code, but I got stuck at one line. I tried every way I could think of to remove the error "_Error in -ncol(test) : invalid argument to unary operator_" in the stacking function. Could you please help me, sir, as I am new to R and machine learning? Please provide your valuable guidance.

# ----- Stacking -----

stacking_et0 <- function(train, test, nmeta_mdls=1){
  nmdls <- 3
  test <- test[,-ncol(test)]           # ------> getting error in this line
  out_train <- train[,ncol(train)]
  train <- train[,-ncol(train)]
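While searching, I found that the same message appears whenever ncol() returns NULL, i.e., when test is not a matrix or data frame. This is just my guess at the cause, not something from the repository:

# My attempt to reproduce the message (a guess, not from the repo):
# ncol() is NULL for anything that is not a matrix/data frame, and the
# unary minus applied to NULL raises exactly this error.
v <- c(1, 2, 3)   # a plain vector, as `test` might be after a bad split
ncol(v)           # NULL
-ncol(v)          # Error in -ncol(v) : invalid argument to unary operator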

Full code

# ===================================================================================================================
# ------ On the suitability of stacking-based ensembles in smart agriculture for evapotranspiration prediction ------
# ===================================================================================================================

# ----- libraries -----
library(gbm)
library(e1071)
library(xgboost)
library(randomForest)
library(caTools)

dataset <- read.csv("aln.csv")

# 80/20 split: the first 80% of rows for training, the rest for testing
split_data <- function(dataset, train = TRUE){
  length <- nrow(dataset)
  total_row <- length * 0.8
  split <- 1:total_row
  if (train == TRUE){
    train_df <- dataset[split, ]
    return(train_df)
  } else {
    test_df <- dataset[-split, ]
    return(test_df)
  }
}
train <- split_data(dataset, train = TRUE)
test <- split_data(dataset, train = FALSE)

# ----- Stacking -----
stacking_et0 <- function(train, test, nmeta_mdls=1){
  nmdls <- 3
  test <- test[,-ncol(test)]             # drop the target column from test
  out_train <- train[,ncol(train)]       # keep the target of the train set
  train <- train[,-ncol(train)]

  # ==========================================================
  # ----- First-level Algorithms (RF, SVM, GBM, XGBoost) -----
  # ==========================================================
  # meta-features: nmdls = 3 models x 4 algorithms = 12 columns
  pd_train <- matrix(0, nrow(train), 12)
  pd_test <- matrix(0, nrow(test), 12)

  train_with_out <- cbind(train, out_train)
  frm <- as.formula(paste(names(train_with_out)[ncol(train_with_out)], "~.", sep = ""))

  for (i in 1:nmdls) {
    #=== RandomForest ===#
    trees <- sample(350:600, 1)
    node <- sample(5:15, 1)

    model_rdm <- randomForest(frm, data = train_with_out, ntree = trees, nodesize = node)
    pd_train[,i] <- predict(model_rdm, train)
    pd_test[,i] <- predict(model_rdm, test)

    #======= SVM ========#
    tole <- round(runif(1, 0.0001, 0.01), 4)
    reg <- sample(2:14, 1)

    model_svm <- e1071::svm(x = train, y = out_train, tolerance = tole, cost = reg, scale = TRUE, type = "eps-regression", kernel = "radial")
    pd_train[,i+3] <- predict(model_svm, train)
    pd_test[,i+3] <- predict(model_svm, test)

    #======= GBM ========#
    trees <- sample(550:650, 1)

    model_gbm <- gbm::gbm(formula = frm, data = train_with_out, n.trees = trees, interaction.depth = 3, bag.fraction = 1, distribution = "gaussian")
    pd_train[,i+6] <- predict.gbm(model_gbm, train, n.trees = trees)
    pd_test[,i+6] <- predict.gbm(model_gbm, test, n.trees = trees)

    #===== XGBoost ======#
    num_par <- sample(50:200, 1)
    iter_bos <- sample(40:50, 1)
    depth <- sample(3:5, 1)
    weight <- sample(seq(5, 7, 0.1), 1)

    model_xgb <- xgboost(data = as.matrix(train), label = as.matrix(out_train), num_parallel_tree = num_par, nrounds = iter_bos, max_depth = depth, min_child_weight = weight, subsample = 0.9, eta = 0.1, verbose = 0)
    pd_train[,i+9] <- predict(model_xgb, as.matrix(train))
    pd_test[,i+9] <- predict(model_xgb, as.matrix(test))
  }

  # augment both sets with the first-level predictions
  new_train <- cbind(train, pd_train)
  new_test <- cbind(test, pd_test)

  # ==========================================================
  # ------------ Second-level Algorithm (XGBoost) ------------
  # ==========================================================
  pred_meta <- matrix(0, nrow = nrow(new_test), ncol = nmeta_mdls)
  for(nm in 1:nmeta_mdls){
    num_par <- sample(50:200, 1)
    iter_bos <- sample(40:50, 1)
    depth <- sample(3:5, 1)
    weight <- sample(seq(5, 7, 0.1), 1)

    meta_model <- xgboost(data = as.matrix(train), label = as.matrix(out_train), num_parallel_tree = num_par, nrounds = iter_bos, max_depth = depth, min_child_weight = weight, subsample = 0.9, eta = 0.1, verbose = 0)
    pred_meta[,nm] <- predict(meta_model, as.matrix(new_test))
  }

  end_pred <- rowMeans(pred_meta)
  return(end_pred)
}

# run the ensemble (the function must be defined before this call)
ans <- stacking_et0(train, test)
View(ans)

Please let me know the error in the code, sir. Looking forward to your precious reply...

juanmartinsantos commented 3 years ago

Hi,

I checked your code. First, I'm very sorry: the stacking code had a bug in the training set used by the second-level algorithm. I have already fixed this error in your code and in the GitHub code.
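For reference, the problem is that the second-level XGBoost is fitted on the original train matrix but asked to predict on new_test, which carries 12 extra first-level prediction columns. Roughly, the fix looks like this (check the repository for the exact change):

# Buggy: the meta-model only ever sees the original features
#   meta_model <- xgboost(data = as.matrix(train), label = as.matrix(out_train), ...)

# Fixed: fit on new_train, whose columns match new_test at prediction time
meta_model <- xgboost(data = as.matrix(new_train), label = as.matrix(out_train),
                      num_parallel_tree = num_par, nrounds = iter_bos,
                      max_depth = depth, min_child_weight = weight,
                      subsample = 0.9, eta = 0.1, verbose = 0)
pred_meta[,nm] <- predict(meta_model, as.matrix(new_test))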

I sent you a personal email with your code attached and some suggested tips; I also added a sample dataset.

Best regards, Juan Martín

harshini-web commented 3 years ago

Thank you so much, sir. With your guidance I succeeded in getting the predicted values from the stacking function as a whole. But I am stuck on how to proceed further. Can you please explain how you got the stacking values with respect to the temperature, mass-transfer, radiation and meteorological variables mentioned in the performance analysis, sir? Looking forward to your valuable reply. Thank you, sir...

juanmartinsantos commented 3 years ago

Hi, I'm glad that you made it.

Obtaining the predictions will depend on the objective of the project.

In my case, the predictions were used to evaluate the proposed model. My original datasets are composed of 12 features, some of them related to each meteorological parameter (temperature, radiation, mass transfer, and all together). I got better results considering all the features together (best explained in the paper). You can do something similar: try changing the parameter settings or selecting different features. Finally, the predictions obtained from the test set are evaluated with an error metric (RMSE, MAE, MSE, ...), whichever you prefer.
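For example, a minimal sketch (not from the repository; the column names are hypothetical, and the target must stay as the last column, as stacking_et0 expects):

# Illustrative sketch: score the stacking predictions with RMSE/MAE and
# compare a feature subset against all features. Column names are
# hypothetical; adjust them to your own dataset.
rmse <- function(truth, pred) sqrt(mean((truth - pred)^2))
mae  <- function(truth, pred) mean(abs(truth - pred))

out_test <- test[, ncol(test)]          # true ET0 values of the test rows

# all features together
pred_all <- stacking_et0(train, test)
rmse(out_test, pred_all)

# a hypothetical temperature-only subset, keeping the target column last
temp_cols <- c("Tmax", "Tmin", "Tmean")  # hypothetical feature names
train_tmp <- train[, c(temp_cols, names(train)[ncol(train)])]
test_tmp  <- test[, c(temp_cols, names(test)[ncol(test)])]
pred_tmp  <- stacking_et0(train_tmp, test_tmp)
rmse(out_test, pred_tmp)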

On the other hand, if you want predictions on an unseen dataset: once you find the best fit for your data, you need to modify the stacking ensemble so that the fitted model is returned. Then you can predict on data not seen before; one possible sketch follows.
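A rough sketch of one way to do that (not the repository code; it assumes you collect the fitted first-level models inside the training loop in the same order as the pd_train columns, and the meta-models in a second list):

# Rough sketch: have stacking_et0 also return its fitted pieces, e.g.
#   return(list(prediction  = end_pred,
#               base_models = base_models,   # filled inside the first-level loop
#               meta_models = meta_models))  # filled in the second-level loop
# Unseen rows (with no target column) could then be scored like this:
predict_stacking <- function(fit, newdata) {
  # rebuild the 12 meta-feature columns from the first-level models
  pd_new <- sapply(fit$base_models, function(m) {
    if (inherits(m, "xgb.Booster")) {
      predict(m, as.matrix(newdata))
    } else if (inherits(m, "gbm")) {
      predict(m, newdata, n.trees = m$n.trees)
    } else {
      predict(m, newdata)                  # randomForest and svm models
    }
  })
  new_x <- as.matrix(cbind(newdata, pd_new))
  # average the second-level predictions, exactly as stacking_et0 does
  rowMeans(sapply(fit$meta_models, function(m) predict(m, new_x)))
}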

Best regards, Juan Martín.