JunhaoWang opened this issue 4 years ago
There is a note on scaling in the online documentation here:
https://grf-labs.github.io/policytree/articles/policytree.html#gauging-the-runtime-of-tree-search
As you can see, the cardinality of the Xj's matters, and you can speed things up by increasing split.step
(in effect rounding the Xj's).
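To make this concrete, here is a small sketch of both speedups (split.step and pre-rounding the covariates). The data here is a toy example, not your problem size, and the exact function signatures assume a recent policytree/grf:

```r
library(policytree)
library(grf)

# Toy data (placeholder sizes, not a benchmark)
n <- 2000; p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- as.factor(sample(1:3, n, replace = TRUE))
Y <- X[, 1] * (W == "2") + rnorm(n)

cf <- multi_arm_causal_forest(X, Y, W)
Gamma <- double_robust_scores(cf)  # n x K matrix of doubly robust scores

# split.step = 10 considers only every 10th candidate split point per
# covariate, giving roughly an order-of-magnitude speedup at depth 2.
tree_fast <- policy_tree(X, Gamma, depth = 2, split.step = 10)

# Alternatively, round the covariates before the search to reduce their
# cardinality (the "round(X, 2)" rows in the table below).
tree_rounded <- policy_tree(round(X, 2), Gamma, depth = 2)
```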
But n = 100k and p = 200 will not finish in an agreeable amount of time. You can also reduce the dimensionality by keeping only, say, the 20 variables with the highest split frequencies across the 20 causal forests.
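That screening step might look like the sketch below. It assumes grf's split_frequencies() (the choice of max.depth and the cutoff of 20 variables are arbitrary):

```r
library(policytree)
library(grf)

# Minimal setup so the snippet is self-contained (toy sizes)
n <- 2000; p <- 50
X <- matrix(rnorm(n * p), n, p)
W <- as.factor(sample(1:3, n, replace = TRUE))
Y <- X[, 1] * (W == "2") + rnorm(n)
cf <- multi_arm_causal_forest(X, Y, W)
Gamma <- double_robust_scores(cf)

# Rank covariates by how often the forest split on them
# (summed over the first few depths), then keep the top 20.
freq <- colSums(split_frequencies(cf, max.depth = 4))
top_vars <- order(freq, decreasing = TRUE)[1:20]

# Run the tree search on the reduced covariate set only.
tree <- policy_tree(X[, top_vars], Gamma, depth = 2)
```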
The argmax policy is discussed in Section 5.1 (the California GAIN example) of https://arxiv.org/pdf/1702.02896.pdf (referred to there as the plug-in policy), and it may be fine depending on your purpose (whether or not you need an interpretable policy).
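For reference, the plug-in policy is just a per-unit argmax over the score matrix, so it costs essentially nothing beyond fitting the forest. A sketch, again with toy data:

```r
library(policytree)
library(grf)

# Toy data (placeholder sizes)
n <- 2000; p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- as.factor(sample(1:3, n, replace = TRUE))
Y <- X[, 1] * (W == "2") + rnorm(n)

cf <- multi_arm_causal_forest(X, Y, W)
Gamma <- double_robust_scores(cf)  # n x K: one column per action

# Plug-in (argmax) policy: assign each unit the action with the
# highest estimated score. max.col() returns the column index of
# the row-wise maximum.
plug_in_action <- colnames(Gamma)[max.col(Gamma)]
table(plug_in_action)
```

The trade-off versus policy_tree() is exactly the one mentioned above: the argmax policy can be a different action for every unit and is not interpretable as a shallow decision rule.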
For practical reference, here is a short table of empirical run times for policy_tree
(version 1.0).
depth | n (continuous) | features | actions | split.step | time |
---|---|---|---|---|---|
2 | 1 000 | 30 | 20 | 1 | 1.5 min |
2 | 1 000 | 30 | 20 | 10 | 7 sec |
2 | 10 000 | 30 | 20 | 1 | 3 hrs |
2 | 10 000 | 30 | 20 | 10 | 14 min |
2 | 10 000 | 30 | 20 | 1, but round(X, 2) | 8 min |
2 | 100 000 | 30 | 20 | 10 | 50 hrs |
2 | 100 000 | 30 | 20 | 1, but round(X, 2) | 6.3 hrs |
2 | 100 000 | 60 | 20 | 1, but round(X, 2) | 25 hrs |
2 | 100 000 | 30 | 3 | 10 | 7.4 hrs |
policy_tree() can't scale to my data size (100 000 obs, 200-dimensional state/covariates, 20 actions), but multi_causal_forest() can. Can I just use the argmax of the multi-action treatment effect estimates as a good policy, instead of exhaustively searching over tree functions from states to actions?