ropensci / aorsf

Accelerated Oblique Random Survival Forests
https://docs.ropensci.org/aorsf

utility functions for impurity of splits (regression trees) #29

Closed by bcjaeger 12 months ago

bcjaeger commented 1 year ago

Reduction in variance is a standard technique for assessing regression tree split purity. It would be great to implement a function in utility.cpp with the following inputs:

  1. y_node (type: arma::vec) the outcome values in the current tree node
  2. w_node (type: arma::vec) a vector of non-zero weights (integer valued) the same length as y_node
  3. g_node (type: arma::uvec) a vector of 0s and 1s the same length as y_node, with 0 indicating going to the left child node and 1 indicating the right.

The excerpt below from Ishwaran et al. (2014) summarizes the reduction in variance computation very well. We will need to code this, incorporating weights through w_node. We should be able to check that the function gives exactly the right answer using matrixStats::weightedVar. @ciaran-evans, would you like to look into this? You could write the function as a stand-alone function in orsf_oop.cpp with the usual // [[Rcpp::export]] tag rather than putting it into utility.cpp, and I could move it over when it's ready. Basically, we would want the function to be named compute_var_reduction, and we would want to create the file tests/testthat/test-compute_var_reduction.R to test that our variance reduction function gives the same answer as a function written in R.

[image: excerpt from Ishwaran et al. (2014) describing the reduction in variance computation]
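
To make the request concrete, here is a rough sketch of what a weighted variance reduction could look like. It is not the aorsf implementation and does not reproduce the excerpt's formula verbatim. It assumes the common definition: the node variance minus the weight-proportional average of the two child variances. It uses std::vector in place of arma::vec / arma::uvec so it compiles without Armadillo, and the helper names (WeightedStats, weighted_var) are hypothetical. Dividing by the sum of weights is also an assumption; the denominator would need to match whatever reference function the tests compare against.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper type: total weight and weighted variance of a subset.
struct WeightedStats {
  double w_sum;
  double var;
};

// Weighted variance of the observations on one side of the split.
// side = 0 (left child), 1 (right child), or -1 (whole node).
// Assumption: variance is normalized by the sum of weights.
static WeightedStats weighted_var(const std::vector<double>& y,
                                  const std::vector<double>& w,
                                  const std::vector<unsigned>& g,
                                  int side) {
  double w_sum = 0.0, wy_sum = 0.0;
  for (std::size_t i = 0; i < y.size(); ++i) {
    if (side < 0 || g[i] == static_cast<unsigned>(side)) {
      w_sum += w[i];
      wy_sum += w[i] * y[i];
    }
  }
  if (w_sum <= 0.0) return {0.0, 0.0};  // empty side contributes nothing

  const double mean = wy_sum / w_sum;
  double ss = 0.0;  // weighted sum of squared deviations from the mean
  for (std::size_t i = 0; i < y.size(); ++i) {
    if (side < 0 || g[i] == static_cast<unsigned>(side)) {
      const double d = y[i] - mean;
      ss += w[i] * d * d;
    }
  }
  return {w_sum, ss / w_sum};
}

// Sketch of compute_var_reduction: node variance minus the
// weight-proportional average of the left and right child variances.
double compute_var_reduction(const std::vector<double>& y_node,
                             const std::vector<double>& w_node,
                             const std::vector<unsigned>& g_node) {
  if (w_node.size() != y_node.size() || g_node.size() != y_node.size()) {
    throw std::invalid_argument("y_node, w_node, g_node must have equal length");
  }
  const WeightedStats node  = weighted_var(y_node, w_node, g_node, -1);
  const WeightedStats left  = weighted_var(y_node, w_node, g_node, 0);
  const WeightedStats right = weighted_var(y_node, w_node, g_node, 1);
  if (node.w_sum <= 0.0) return 0.0;

  return node.var
       - (left.w_sum  / node.w_sum) * left.var
       - (right.w_sum / node.w_sum) * right.var;
}
```

With integer weights, this definition treats a weight of 2 the same as listing the observation twice, which is one property an R-side test could check. In the package itself, the same function would take arma::vec and arma::uvec arguments and carry the Rcpp export tag.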

ciaran-evans commented 1 year ago

This sounds great, will do!