susanathey / causalTree

Working repository for Causal Tree and extensions
GNU General Public License v3.0

xpred.rpart Returns Levels, Not Treatment Effects #20

Open: jonathandroth opened this issue 7 years ago

jonathandroth commented 7 years ago

Hi there,

The rpart function xpred.rpart is supposed to return cross-validated predictions from a tree: each observation is predicted by a tree grown without that observation's fold. (I am trying to use it with causalTree because I'd like to choose the complexity parameter with a custom cross-validation criterion.)

However, when I apply it to a causalTree fit, it seems to predict the level of the y variable rather than the treatment effect. Here is an example:


```r
library(rpart)       # provides xpred.rpart()
library(causalTree)  # provides causalTree() and the simulation.1 data

# Add 100 to every y so that levels and treatment effects are very different
simulation.1$y <- simulation.1$y + 100

tree <- causalTree(y ~ x1 + x2 + x3 + x4, data = simulation.1,
                   treatment = simulation.1$treatment,
                   split.Rule = "CT", cv.option = "CT", split.Honest = TRUE,
                   cv.Honest = TRUE, split.Bucket = FALSE, xval = 5,
                   cp = 0, minsize = 20, propensity = 0.5)

# cptable column 1 is CP; column 4 is the cross-validated error (xerror)
opcp <- tree$cptable[, 1][which.min(tree$cptable[, 4])]
opfit <- prune(tree, opcp)

# Predicting with the pruned tree gives treatment effects
mean(predict(opfit))
#> [1] 0.9670799

# Using xpred.rpart gives levels of y instead
mean(xpred.rpart(tree, cp = opcp))
#> [1] 100.3836
```

Any help on this (or another way of implementing custom cross-validation criteria) would be appreciated! Thanks!

susanathey commented 6 years ago

@jonathandroth apologies for the slow response. Did you find a solution, and are you still interested in this?

jonathandroth commented 6 years ago

@susanathey I worked around this by doing the cross-validation manually: I constructed the K folds myself, trained the tree on K-1 of them, and then predicted and computed the loss on the held-out Kth fold, rotating through all K folds.
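The manual workaround described above might look roughly like the base-R sketch below. It is a minimal illustration under stated assumptions, not the code used in this thread: the helper `cv_treatment_effect()`, the simulated data, and the difference-in-means stand-in model are all hypothetical. Because the true treatment effect is never observed, the sketch assumes a transformed-outcome loss, y* = y (W - p) / (p (1 - p)), whose conditional mean equals the treatment effect when treatment is randomly assigned with known probability p (0.5 here, matching `propensity = 0.5` above).

```r
# Hypothetical helper: K-fold cross-validation for a treatment-effect model.
# fit_fun(train) returns a fitted object; predict_fun(fit, test) returns
# predicted treatment effects for the rows of `test`.
# Assumption: random assignment with known, constant propensity p.
cv_treatment_effect <- function(data, fit_fun, predict_fun, K = 5, p = 0.5,
                                seed = 1) {
  set.seed(seed)  # fixed seed, so every call uses the same folds
  # Randomly assign each row to one of K folds of (nearly) equal size
  folds <- sample(rep(seq_len(K), length.out = nrow(data)))
  losses <- numeric(K)
  for (k in seq_len(K)) {
    train <- data[folds != k, , drop = FALSE]  # K-1 folds for training
    test <- data[folds == k, , drop = FALSE]   # held-out fold
    fit <- fit_fun(train)
    tau_hat <- predict_fun(fit, test)
    # Transformed outcome: its conditional mean is the treatment effect
    y_star <- test$y * (test$treatment - p) / (p * (1 - p))
    losses[k] <- mean((y_star - tau_hat)^2)
  }
  mean(losses)  # average held-out loss across folds
}

# Synthetic data for illustration only (true effect = 1, levels near 100)
set.seed(42)
n <- 1000
sim <- data.frame(x1 = rnorm(n), treatment = rbinom(n, 1, 0.5))
sim$y <- 100 + sim$x1 + sim$treatment + rnorm(n)

# Stand-in model: a constant effect estimated by difference in means
fit_dim <- function(train) {
  mean(train$y[train$treatment == 1]) - mean(train$y[train$treatment == 0])
}
predict_dim <- function(fit, test) rep(fit, nrow(test))

cv_loss <- cv_treatment_effect(sim, fit_dim, predict_dim, K = 5)
print(cv_loss)

# Assumed causalTree wrapper (not run here; arguments copied from the post):
# fit_ct <- function(train, cp) {
#   prune(causalTree(y ~ x1 + x2 + x3 + x4, data = train,
#                    treatment = train$treatment, split.Rule = "CT",
#                    cv.option = "CT", split.Honest = TRUE, cv.Honest = TRUE,
#                    split.Bucket = FALSE, xval = 5, cp = 0, minsize = 20,
#                    propensity = 0.5), cp)
# }
# predict_ct <- function(fit, test) predict(fit, newdata = test)
```

To choose `cp`, one could call the helper once per candidate value, passing `function(train) fit_ct(train, cp)` as `fit_fun`, and keep the value with the smallest loss. Since the seed is fixed inside the helper, every candidate is scored on the same folds. Note that the transformed outcome is noisy when y has a large level (as with the +100 shift above), so differences between candidates may need many observations to resolve.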

I think it would be nice to have this automated, but I don't need an immediate fix for my current purposes.