miranov25 / RootInteractive


Augmented Random Forest with Kernel Convolution + predictRFStat extension #359

Open miranov25 opened 4 days ago

miranov25 commented 4 days ago

Augmented Random Forest with Kernel Convolution

Fast prototyping requires a smooth and flexible representation of functions. Trees and forests, however, typically produce a piecewise-constant output, which is a significant limitation when a smooth function is needed.

To achieve a smoother representation, we propose augmenting the data by randomly smearing the input vector of explanatory variables, X_n, with a user-defined kernel function W_n (Gaussian by default).
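A minimal sketch of the smearing step, assuming a Gaussian kernel with a per-feature width. The function name `smear_inputs` and the parameters `sigma`, `n_copies` and `rng` are hypothetical and not part of RootInteractive:

```python
import numpy as np


def smear_inputs(X, sigma, n_copies=1, rng=None):
    """Return n_copies Gaussian-smeared replicas of X, stacked row-wise.

    X        : array of shape (n_samples, n_features)
    sigma    : scalar or per-feature kernel width (hypothetical parameter)
    n_copies : number of smeared replicas of the whole sample
    rng      : seed or numpy Generator, for reproducibility
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    # Allow a single width for all features or one width per feature.
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), (X.shape[1],))
    X_rep = np.tile(X, (n_copies, 1))
    # Independent Gaussian shift for every replica, row and feature.
    noise = rng.normal(0.0, 1.0, size=X_rep.shape) * sigma
    return X_rep + noise
```

The targets have to be replicated in the same order, for example with `np.tile(y, n_copies)`. A feature with `sigma = 0` is left unsmeared.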

Three functionalities should be implemented:

  1. Training Augmentation: Each tree in the forest should be trained on inputs smeared with its own random vector (E_n), which increases the diversity and robustness of the model.

  2. Smoothed Mean: Calculate a weighted mean of the tree outputs in the local neighborhood to produce a smoother result.

  3. Statistical Analysis of Predictions: Provide functionality to calculate various statistics from different tree predictions, including weighted mean, standard deviation, median, and linear fits (possibly enhanced with kernel methods).

    • Note: It is unclear whether scikit-learn trees expose the "box" (the bounds of the cube) that defines each leaf. Further investigation is needed to determine whether this is feasible.
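The prediction-side items above could be prototyped roughly as follows. This is a sketch under assumptions, not the proposed `predictRFStat` interface: the helper names `tree_predictions`, `predict_stats`, `smoothed_mean` and `leaf_box` are hypothetical, and the linear fits are left out. It relies only on the public scikit-learn attributes `estimators_` and `tree_` (`feature`, `threshold`, `children_left`, `children_right`), which suggests that the leaf box can be rebuilt by walking the splits from the root.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def tree_predictions(forest, X):
    """Matrix of shape (n_trees, n_samples): one row of predictions per tree."""
    return np.stack([tree.predict(X) for tree in forest.estimators_])


def predict_stats(forest, X, weights=None):
    """Per-sample statistics over the individual tree outputs.

    weights: optional per-tree weights (hypothetical parameter).
    """
    P = tree_predictions(forest, X)
    w = np.ones(P.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    mean = np.average(P, axis=0, weights=w)
    std = np.sqrt(np.average((P - mean) ** 2, axis=0, weights=w))
    return {"mean": mean, "std": std, "median": np.median(P, axis=0)}


def smoothed_mean(forest, X, sigma, n_smear=50, rng=None):
    """Average the forest output over Gaussian-smeared copies of each query point."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    # Shape (n_smear, n_samples, n_features); sigma is a scalar or per-feature width.
    Xs = X[None, :, :] + rng.normal(size=(n_smear,) + X.shape) * sigma
    pred = forest.predict(Xs.reshape(-1, X.shape[1]))
    return pred.reshape(n_smear, X.shape[0]).mean(axis=0)


def leaf_box(tree, x):
    """Axis-aligned bounds (lo, hi) of the leaf containing x, rebuilt from the splits."""
    t = tree.tree_
    x = np.asarray(x, dtype=np.float32)  # scikit-learn compares inputs as float32
    lo = np.full(t.n_features, -np.inf)
    hi = np.full(t.n_features, np.inf)
    node = 0
    while t.children_left[node] != -1:  # -1 marks a leaf node
        f, thr = t.feature[node], t.threshold[node]
        if x[f] <= thr:  # left branch: feature value <= threshold
            hi[f] = min(hi[f], thr)
            node = t.children_left[node]
        else:  # right branch: feature value > threshold
            lo[f] = max(lo[f], thr)
            node = t.children_right[node]
    return lo, hi
```

With `weights=None`, the `mean` entry reproduces `forest.predict(X)`, so the extra statistics come at the cost of one `predict` call per tree. Features that are never split on along the path keep infinite bounds in `leaf_box`.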