kogalur / randomForestSRC

DOCUMENTATION:
https://www.randomforestsrc.org/
GNU General Public License v3.0

Custom splitting #92

Closed: naskhi closed this issue 3 years ago.

naskhi commented 3 years ago

My problem is probably rather simple, but this is the first time I have tried to modify an existing R package, so please excuse me if the answer is obvious. To test various new splitting rules, I downloaded the repository so that I could modify the corresponding .c file. Before making any changes, I tried to load the functions manually into R, which led to the following error message:

<simpleError in .Call("rfsrcGrow", as.integer(do.trace), as.integer(seed), as.integer(ensemble.bits + impute.only.bits + var.used.bits + split.depth.bits + importance.bits + bootstrap.bits + forest.bits + proximity.bits + perf.bits + rfq.bits + gk.quantile.bits + statistics.bits + empirical.risk.bits), as.integer(samptype.bits + forest.wt.bits + distance.bits + na.action.bits + split.cust.bits + membership.bits + terminal.qualts.bits + terminal.quants.bits + tdc.rule.bits), as.integer(splitinfo$index), as.integer(splitinfo$nsplit), as.integer(mtry), lot, base.learner, as.integer(vtry), as.integer(holdout.array), holdout.specs, as.integer(formulaDetail$ytry), as.integer(nodesize), as.integer(nodedepth), as.integer(length(cause.wt)), as.double(cause.wt), as.integer(ntree), as.integer(n), list(as.integer(length(yvar.types)), if (is.null(yvar.types)) NULL else as.character(yvar.types), if (is.null(yvar.types)) NULL else as.integer(yvar.nlevels), if (is.null(subj)) NULL else as.integer(subj), if (is.null(yvar)) NULL else as.double(as.vector(yvar))), list(as.integer(n.xvar), as.character(xvar.types), as.integer(xvar.nlevels), if (is.null(xvar.time)) NULL else as.integer(xvar.time), if (is.null(subj.time)) NULL else as.integer(subj.time)), list(as.integer(length(case.wt)), if (is.null(case.wt)) NULL else as.double(case.wt), as.integer(sampsize), if (is.null(samp)) NULL else as.integer(samp)), as.double(split.wt), as.double(yvar.wt), as.double(xvar.wt), as.double(xvar), as.integer(length(event.info$time.interest)), as.double(event.info$time.interest), as.integer(nimpute), as.integer(block.size), as.integer(length(prob)), as.double(prob), as.double(prob.epsilon), as.double(NULL), as.integer(get.rf.cores())): C symbol name "rfsrcGrow" not in load table>

Error in rfsrc(Surv(time, status) ~ ., data = veteran, ntree = 100, block.size = 1, : An error has occurred in the grow algorithm. Please turn trace on for further analysis.

The corresponding R code is:

filenames <- list.files("~/randomForestSRC-master/R", pattern = "\\.R$", full.names = TRUE)
lapply(filenames, source)

data(veteran, package = "randomForestSRC")

v.obj <- rfsrc(Surv(time, status) ~ ., data = veteran, ntree = 100, block.size = 1, splitrule = "logrank")

Turning trace on did not give any further information, which is why I am contacting you directly. Is there a smarter way to load the package manually into R so that I can customize the .c file later?

Thank you very much for your help.

kogalur commented 3 years ago

We're not sure what you are doing. The best way to get the source code for the CRAN package is from CRAN:

https://cran.r-project.org/web/packages/randomForestSRC/index.html

Edit the relevant files, compile and re-install the package. It's best not to touch the GitHub repository unless you have the more complex toolchain that it requires.

naskhi commented 3 years ago

Thank you very much for your fast response. Let me rephrase: my goal is to include customized splitting rules, and I found a file splitCustom.c in your Git repository that I could not find in the folder created by install.packages(). So maybe I am making this more complicated than needed. Is there an easy way to use your rfsrc function with custom splitting rules?

kogalur commented 3 years ago

Do not use the GitHub repository for your task. Use the source package posted on CRAN. Edit the splitCustom.c file per the directions in the help file. Examples are provided in that file for all families. You need to know how to compile and install the package manually.
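To give a rough idea of what such an edit involves, here is a minimal, standalone C sketch of a regression split statistic computed from node membership. It is an illustration under stated assumptions, not code from the package: the function name `varianceReductionSplit`, its arguments and the 0/1 `goesLeft` membership array are hypothetical, and the real function signature and registration steps are the ones documented in splitCustom.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical regression split statistic (NOT the randomForestSRC
 * harness signature; see splitCustom.c for the real one).
 *
 * y[i]        response of the i-th case in the node, i = 0..n-1
 * goesLeft[i] 1 if case i is sent to the left daughter, 0 otherwise
 *
 * Returns the reduction in the within-node sum of squares produced by
 * the split. Larger values indicate better splits; a split that leaves
 * one daughter empty scores 0.
 */
double varianceReductionSplit(const double *y, const int *goesLeft, int n)
{
    double sumL = 0.0, sumR = 0.0;
    int nL = 0, nR = 0;

    assert(y != NULL && goesLeft != NULL && n > 0);

    /* Accumulate daughter sums and sizes from the membership vector. */
    for (int i = 0; i < n; i++) {
        if (goesLeft[i]) {
            sumL += y[i];
            nL++;
        } else {
            sumR += y[i];
            nR++;
        }
    }

    /* Degenerate split: one daughter is empty. */
    if (nL == 0 || nR == 0) {
        return 0.0;
    }

    /* SS(parent) - SS(left) - SS(right) simplifies to
     * sumL^2/nL + sumR^2/nR - (sumL + sumR)^2/n. */
    double sum = sumL + sumR;
    return sumL * sumL / nL + sumR * sumR / nR - sum * sum / n;
}
```

For y = {1, 2, 10, 11}, the membership {1, 1, 0, 0} returns 81 (parent sum of squares 82 minus 0.5 for each daughter), while {1, 0, 1, 0} returns 1, so the first split would be preferred. This sketch assumes larger values mean better splits; check splitCustom.c for the convention the harness actually expects.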

ishwaran commented 3 years ago

Here are further details about custom splitting (taken straight from the rfsrc help file):

Custom splitting: All families except unsupervised are available for user defined custom splitting. Some basic C-programming skills are required. The harness for defining these rules is in ‘splitCustom.c’. In this file we give examples of how to code rules for regression, classification, survival, and competing risk. Each family can support up to sixteen custom split rules. Specifying ‘splitrule="custom"’ or ‘splitrule="custom1"’ will trigger the first split rule for the family defined by the training data set. Multivariate families will need a custom split rule for both regression and classification. In the examples, we demonstrate how the user is presented with the node specific membership. The task is then to define a split statistic based on that membership. Take note of the instructions in ‘splitCustom.c’ on how to register the custom split rules. It is suggested that the existing custom split rules be kept in place for reference and that the user proceed to develop ‘splitrule="custom2"’ and so on. The package must be recompiled and installed for the custom split rules to become available.
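Since multivariate families need a custom rule for classification as well as regression, a classification statistic may also be useful to see. The standalone C sketch below scores a split by the decrease in Gini impurity computed from a 0/1 membership vector. The function name `giniDecreaseSplit`, its arguments, the 1-based class coding and the `MAX_CLASS` bound are all hypothetical and are not the package's interface; the real signature and the registration of `splitrule="custom2"` and onward are described in splitCustom.c.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLASS 64 /* hypothetical upper bound on the number of classes */

/*
 * Hypothetical classification split statistic (NOT the randomForestSRC
 * harness signature; see splitCustom.c for the real one).
 *
 * y[i]        class label of the i-th case in the node, coded 1..nClass
 * goesLeft[i] 1 if case i is sent to the left daughter, 0 otherwise
 *
 * Returns the decrease in Gini impurity
 *   G(parent) - (nL/n) G(left) - (nR/n) G(right),  G = 1 - sum_k p_k^2.
 * A split that leaves one daughter empty scores 0.
 */
double giniDecreaseSplit(const int *y, const int *goesLeft, int n, int nClass)
{
    int countL[MAX_CLASS + 1] = {0};
    int countR[MAX_CLASS + 1] = {0};
    int nL = 0, nR = 0;

    assert(y != NULL && goesLeft != NULL && n > 0);
    assert(nClass > 0 && nClass <= MAX_CLASS);

    /* Tabulate class counts in each daughter. */
    for (int i = 0; i < n; i++) {
        assert(y[i] >= 1 && y[i] <= nClass);
        if (goesLeft[i]) {
            countL[y[i]]++;
            nL++;
        } else {
            countR[y[i]]++;
            nR++;
        }
    }

    if (nL == 0 || nR == 0) {
        return 0.0;
    }

    /* The weighted decrease simplifies to
     * [ sum_k nLk^2/nL + sum_k nRk^2/nR - sum_k nk^2/n ] / n. */
    double sL = 0.0, sR = 0.0, sP = 0.0;
    for (int k = 1; k <= nClass; k++) {
        int nk = countL[k] + countR[k];
        sL += (double) countL[k] * countL[k];
        sR += (double) countR[k] * countR[k];
        sP += (double) nk * nk;
    }
    return (sL / nL + sR / nR - sP / n) / n;
}
```

With labels {1, 1, 2, 2}, the membership {1, 1, 0, 0} gives two pure daughters and a decrease of 0.5 (the parent's Gini impurity), whereas {1, 0, 1, 0} gives 0. Gini decrease is used here only because it is a familiar classification criterion; any statistic computed from the node membership could take its place.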

naskhi commented 3 years ago

Thank you very much for your answers. I will have a closer look at how to install and compile the package manually.