Open arturomoncadatorres opened 4 years ago
Since this was posted, a growing literature suggests that the time-varying nature of some features calls for alternative splitting strategies in RSFs.
Having only a single strategy (log-rank), which is subject to some of the same proportionality assumptions as Cox regression, might defeat the purpose of a model ideally designed for non-linear problems.
Offering at least one alternative, such as a Poisson regression log-likelihood, could serve as an intermediate solution until open-ended splitting strategies become available.
See the following examples of varying splitting strategies:
@james-sexton96 The range of splitting rules in the literature is quite large. I haven't followed it closely over the last couple of years, so I'm not sure whether a consensus has emerged by now. Conditional Inference Forests would definitely be interesting (see #341).
Do you have a reference for the Poisson regression log-likelihood you mentioned?
@sebp Sure thing. See references below.
A Poisson regression log-likelihood is well suited to real-world data, as opposed to data with structured follow-up. There was an attempt to branch the survival functionality of the R package randomForestSRC (see the RF-SLAM paper by Wongvibulsin below). However, both this branch and the original package appear to be unsupported.
It would be nice to mirror the parameters of sklearn's random forest regressor by including a kwarg for criterion, and if I have time, I can draft an implementation of a Poisson split criterion!
Crowther et al. 2012; Austin P. 2017; Wongvibulsin et al. 2019
See also the Poisson criterion added to scikit-learn.
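To make the idea of a Poisson log-likelihood split concrete, here is a minimal sketch. It is not taken from the cited papers or from any existing package: it assumes a single constant hazard per node (the simplest Poisson/exponential model), strictly positive follow-up times, and the function names `poisson_loglik` and `poisson_split_gain` are hypothetical. Under that assumption, a node with `d` events and total exposure `T` has a maximised log-likelihood of `d * log(d / T) - d`, and a candidate split can be scored by how much the two children improve on the parent.

```python
import numpy as np


def poisson_loglik(event, time):
    """Maximised Poisson log-likelihood of one node.

    Assumes a constant hazard within the node, estimated as
    (number of events) / (total follow-up time).
    """
    d = float(np.sum(event))
    exposure = float(np.sum(time))
    if d == 0.0:
        # Limit of d * log(d / exposure) - d as d -> 0.
        return 0.0
    return d * np.log(d / exposure) - d


def poisson_split_gain(event, time, go_left):
    """Log-likelihood improvement of splitting a node into two children.

    event   : 1 if the event was observed, 0 if censored
    time    : strictly positive follow-up time per sample (assumption)
    go_left : boolean mask sending each sample to the left child
    """
    event = np.asarray(event, dtype=float)
    time = np.asarray(time, dtype=float)
    go_left = np.asarray(go_left, dtype=bool)

    parent = poisson_loglik(event, time)
    left = poisson_loglik(event[go_left], time[go_left])
    right = poisson_loglik(event[~go_left], time[~go_left])
    return left + right - parent


# Toy data: early events versus long censored follow-up.
event = [1, 1, 1, 0, 0, 0]
time = [1.0, 1.0, 1.0, 10.0, 10.0, 10.0]

gain_informative = poisson_split_gain(event, time, [True, True, True, False, False, False])
gain_mixed = poisson_split_gain(event, time, [True, False, True, False, True, False])
```

A tree builder would evaluate this gain for every candidate threshold and keep the largest one. A piecewise-constant hazard over several time intervals would be a natural extension, but it is left out here to keep the sketch short.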
It would be fantastic to have `criterion` (i.e., the function to measure the quality of a split) as a parameter of `RandomSurvivalForest`. I know that currently only the log-rank splitting rule is supported. For now, this could be set as the default (and only option). In the future, this could be expanded to cover other options (for example, from the original paper: `conservation`, `log_rank_score_rule`, `log_rank_random`), changing the corresponding splitting code as well. This would also make `RandomSurvivalForest` more similar to its `scikit` counterparts (e.g., `RandomForestRegressor`), making it (even) more compatible with other packages that build on `scikit`'s standard structure.

I think this could be done easily in `forest.py`.

If you think this might be interesting, I would be more than happy to help with a proper PR.
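As an illustration of the proposal, here is a minimal sketch of how a `criterion` parameter could be exposed and validated in the scikit-learn style. It is not the actual `forest.py` code: the class `CriterionForestSketch`, the method `check_criterion`, and the string value `"log_rank"` are all hypothetical names chosen for this example.

```python
class CriterionForestSketch:
    """Hypothetical stand-in for an estimator with a `criterion` parameter."""

    # Only the log-rank rule is listed, mirroring the "default and only
    # option" suggested above. The name "log_rank" is an assumption.
    SUPPORTED_CRITERIA = ("log_rank",)

    def __init__(self, n_estimators=100, criterion="log_rank"):
        # scikit-learn convention: __init__ only stores parameters;
        # validation is deferred until fitting.
        self.n_estimators = n_estimators
        self.criterion = criterion

    def check_criterion(self):
        """Return the criterion name, or raise if it is not supported."""
        if self.criterion not in self.SUPPORTED_CRITERIA:
            raise ValueError(
                f"criterion must be one of {self.SUPPORTED_CRITERIA}, "
                f"got {self.criterion!r}"
            )
        return self.criterion


forest = CriterionForestSketch()
selected = forest.check_criterion()
```

Adding another rule later would then mean extending `SUPPORTED_CRITERIA` and dispatching to the matching splitting code, without changing the constructor signature.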