grf-labs / grf

Generalized Random Forests
https://grf-labs.github.io/grf/
GNU General Public License v3.0

Is the order of covariates supposed to affect the treatment estimate? #1363

Closed. paradigmfellow closed this issue 9 months ago.

paradigmfellow commented 9 months ago

Description of the bug

I noticed that while working with a file, I changed the column order of the X covariate data frame and started getting different estimates. Granted, the differences were not drastic, and I kept the same seed.

For example, if you have 5 features and order the columns in different ways:

1 2 3 4 5

5 4 3 2 1

1 5 2 4 3

Will all of these always produce the same estimate?

Steps to reproduce

```r
library(grf)
library(dplyr)

# This is where I manually changed the ordering
namers <- names(X)                            # the same covariate names, in a different order
X1 <- X1 %>% dplyr::select(all_of(namers))    # reorder the columns of the covariate data frame

# Estimate the outcome forest
Y.forest <- regression_forest(X = X1, Y = Y1, clusters = clus,
                              equalize.cluster.weights = FALSE, seed = 1111)
# Out-of-bag outcome estimates, used to orthogonalize Y
Y.hat <- predict(Y.forest)$predictions

# Estimate the propensity forest
W.forest <- regression_forest(X = X1, Y = W1, clusters = clus,
                              equalize.cluster.weights = FALSE, seed = 1111)
# Out-of-bag propensity estimates, used to orthogonalize the treatment
W.hat <- predict(W.forest)$predictions

# Estimate the initial causal forest
cf.raw <- causal_forest(X = X1, Y = Y1, W = W1, Y.hat = Y.hat, W.hat = W.hat,
                        clusters = clus, equalize.cluster.weights = FALSE,
                        seed = 1111)

# Find predictors with greater-than-average importance, and always keep time_var
varimp <- variable_importance(cf.raw)
selected.idx <- which(varimp > mean(varimp))
selected.idx2 <- which(colnames(X1) == "time_var")
# unique() avoids a duplicate index when time_var is already above average
selected.idx3 <- unique(c(selected.idx, selected.idx2))
print(selected.idx3)

varimp <- data.frame(variable_importance(cf.raw))
varimp$names <- names(X1)
X2 <- X1
X3 <- X2 %>% dplyr::select(all_of(selected.idx3))

# Estimate the final causal forest with the most important predictors and tuned parameters
cf <- causal_forest(X = X3, Y = Y1, W = W1, Y.hat = Y.hat, W.hat = W.hat,
                    clusters = clus,
                    sample.weights = weight,
                    equalize.cluster.weights = FALSE,
                    tune.parameters = "all",
                    seed = 1111)

tau.hat <- predict(cf)$predictions
```
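The variable-selection step in the reproduction works by column position. A minimal sketch of the same rule expressed by column name, so that the kept set does not depend on where each covariate sits in X1. `select_important` is a hypothetical helper written for illustration (not a grf function), and the importance values below are made up:

```r
# Hypothetical helper (not part of grf): keep covariates whose importance
# exceeds the mean importance, plus any names listed in `always_keep`.
select_important <- function(col_names, importance, always_keep = character(0)) {
  importance <- as.vector(importance)  # variable_importance() may return a one-column matrix
  stopifnot(length(col_names) == length(importance))
  above_mean <- col_names[importance > mean(importance)]
  # union() drops duplicates, so a forced column that is already
  # above the mean is not selected twice
  union(above_mean, intersect(always_keep, col_names))
}

# Made-up importance values standing in for variable_importance(cf.raw)
cols <- c("age", "income", "time_var", "region", "score")
imp  <- c(0.40, 0.05, 0.10, 0.05, 0.40)
keep <- select_important(cols, imp, always_keep = "time_var")
print(keep)
```

In the workflow above this would be called as `select_important(names(X1), variable_importance(cf.raw), always_keep = "time_var")`, with the result passed to `dplyr::select(all_of(...))`. This only makes the *selection* independent of column order; the fitted forests themselves can still differ when the columns are reordered.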

GRF version 2.3.0

erikcs commented 9 months ago

Hi @paradigmfellow, yes, that is to be expected.
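One plausible way to see why this is expected, using a simplified model rather than grf's actual implementation: random forests typically consider a random subset of `mtry` covariates at each split. If those candidates are drawn by column position from the seeded RNG, a fixed seed reproduces the same positions but not the same covariates once the columns are reordered. `draw_candidates` is a hypothetical helper; a rotated column order is used because it changes the drawn set for every possible draw, whereas a reversal happens to leave some draws unchanged.

```r
# Simplified model (an assumption, not grf's internal code): candidate split
# variables are drawn *by position* from an RNG seeded with `seed`.
# Note: set.seed() resets the global RNG state as a side effect.
draw_candidates <- function(col_names, mtry, seed) {
  set.seed(seed)
  positions <- sample(length(col_names), mtry)
  list(positions = positions, variables = col_names[positions])
}

original <- c("x1", "x2", "x3", "x4", "x5")
rotated  <- c("x2", "x3", "x4", "x5", "x1")  # same covariates, shifted by one column

a <- draw_candidates(original, mtry = 2, seed = 1111)
b <- draw_candidates(rotated,  mtry = 2, seed = 1111)

# The same seed yields the same positions...
print(identical(a$positions, b$positions))
# ...but those positions now point at a different set of covariates.
print(setequal(a$variables, b$variables))
```

With the same seed, `identical(a$positions, b$positions)` is `TRUE` while `setequal(a$variables, b$variables)` is `FALSE`, so the splits explored, and hence the estimates, can differ slightly.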

paradigmfellow commented 9 months ago

Thank you!