JunhaoWang opened this issue 4 years ago
There is a note on scaling in the online documentation here:
https://grf-labs.github.io/policytree/articles/policytree.html#gauging-the-runtime-of-tree-search
As you can see, the cardinality of the Xj's matters, and you can speed things up by increasing split.step
(in effect rounding the Xj's).
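To make this concrete, here is a small sketch of both speedups (split.step and pre-rounding the covariates). The data here is a toy example, not your problem size, and the exact function signatures assume a recent policytree/grf:

```r
library(policytree)
library(grf)

# Toy data (placeholder sizes, not a benchmark)
n <- 2000; p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- as.factor(sample(1:3, n, replace = TRUE))
Y <- X[, 1] * (W == "2") + rnorm(n)

cf <- multi_arm_causal_forest(X, Y, W)
Gamma <- double_robust_scores(cf)  # n x K matrix of doubly robust scores

# split.step = 10 considers only every 10th candidate split point per
# covariate, giving roughly an order-of-magnitude speedup at depth 2.
tree_fast <- policy_tree(X, Gamma, depth = 2, split.step = 10)

# Alternatively, round the covariates before the search to reduce their
# cardinality (the "round(X, 2)" rows in the table below).
tree_rounded <- policy_tree(round(X, 2), Gamma, depth = 2)
```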
But n = 100k and p = 200 will not finish in an agreeable amount of time. You can also reduce the dimensionality by keeping only, say, the 20 variables with the highest split frequencies across the 20 causal forests.
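That screening step might look like the sketch below. It assumes grf's split_frequencies() (the choice of max.depth and the cutoff of 20 variables are arbitrary):

```r
library(policytree)
library(grf)

# Minimal setup so the snippet is self-contained (toy sizes)
n <- 2000; p <- 50
X <- matrix(rnorm(n * p), n, p)
W <- as.factor(sample(1:3, n, replace = TRUE))
Y <- X[, 1] * (W == "2") + rnorm(n)
cf <- multi_arm_causal_forest(X, Y, W)
Gamma <- double_robust_scores(cf)

# Rank covariates by how often the forest split on them
# (summed over the first few depths), then keep the top 20.
freq <- colSums(split_frequencies(cf, max.depth = 4))
top_vars <- order(freq, decreasing = TRUE)[1:20]

# Run the tree search on the reduced covariate set only.
tree <- policy_tree(X[, top_vars], Gamma, depth = 2)
```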
The argmax policy is discussed in Section 5.1 (the California GAIN example) of https://arxiv.org/pdf/1702.02896.pdf (referred to there as the plug-in policy), and it may be fine depending on your purpose (whether or not you need an interpretable policy).
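For reference, the plug-in policy is just a per-unit argmax over the score matrix, so it costs essentially nothing beyond fitting the forest. A sketch, again with toy data:

```r
library(policytree)
library(grf)

# Toy data (placeholder sizes)
n <- 2000; p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- as.factor(sample(1:3, n, replace = TRUE))
Y <- X[, 1] * (W == "2") + rnorm(n)

cf <- multi_arm_causal_forest(X, Y, W)
Gamma <- double_robust_scores(cf)  # n x K: one column per action

# Plug-in (argmax) policy: assign each unit the action with the
# highest estimated score. max.col() returns the column index of
# the row-wise maximum.
plug_in_action <- colnames(Gamma)[max.col(Gamma)]
table(plug_in_action)
```

The trade-off versus policy_tree() is exactly the one mentioned above: the argmax policy can be a different action for every unit and is not interpretable as a shallow decision rule.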
For practical reference, here is a short table of empirical run times for policy_tree
(version 1.0).
depth | n (continuous) | features | actions | split.step | time |
---|---|---|---|---|---|
2 | 1 000 | 30 | 20 | 1 | 1.5 min |
2 | 1 000 | 30 | 20 | 10 | 7 sec |
2 | 10 000 | 30 | 20 | 1 | 3 hrs |
2 | 10 000 | 30 | 20 | 10 | 14 min |
2 | 10 000 | 30 | 20 | 1, but round(X, 2) | 8 min |
2 | 100 000 | 30 | 20 | 10 | 50 hrs |
2 | 100 000 | 30 | 20 | 1, but round(X, 2) | 6.3 hrs |
2 | 100 000 | 60 | 20 | 1, but round(X, 2) | 25 hrs |
2 | 100 000 | 30 | 3 | 10 | 7.4 hrs |
policy_tree() can't scale to my data size (100 000 obs, 200-dimensional state/covariates, 20 actions), but multi_causal_forest() can. Can I just use the argmax of the multi-action treatment effect estimates as a good policy, instead of exhaustively searching over tree functions from states to actions?