neurodata / treeple

Scikit-learn compatible decision trees beyond those offered in scikit-learn
https://treeple.ai

How to handle S@98 estimation when building an honest forest #225

Open adam2392 opened 7 months ago

adam2392 commented 7 months ago
  1. Rejection bootstrap sampling: if a bootstrap sample does not contain enough control samples (e.g. 50 for S@98) to estimate S@98 properly, reject those bootstrap-sampled indices and draw again
  2. Upweight the sample weights based on class: this is the strategy sklearn currently uses
  3. Stratify the bootstrap sample:
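
A minimal sketch of option 1, for illustration only. It is not treeple's implementation: the function name `rejection_bootstrap_indices`, the `control_label` and `max_tries` parameters, and the choice to count controls with repeats are all assumptions. Only the threshold of 50 controls comes from the list above.

```python
import numpy as np


def rejection_bootstrap_indices(y, min_controls=50, control_label=0,
                                max_tries=1000, seed=None):
    """Draw bootstrap indices, redrawing until enough controls are present.

    Hypothetical helper: names and defaults are illustrative assumptions.
    Controls are counted with repeats, i.e. a control drawn twice counts twice.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    n_samples = y.shape[0]

    for _ in range(max_tries):
        # Standard bootstrap: n_samples draws with replacement.
        indices = rng.integers(0, n_samples, size=n_samples)
        n_controls = np.count_nonzero(y[indices] == control_label)
        if n_controls >= min_controls:
            return indices  # accept this draw
        # Otherwise reject the draw and repeat.

    # Guard against looping forever when the threshold is unreachable.
    raise RuntimeError(
        f"No bootstrap sample with >= {min_controls} controls "
        f"after {max_tries} tries"
    )


# Usage: 60 controls (label 0) and 40 cases (label 1).
y = np.array([0] * 60 + [1] * 40)
indices = rejection_bootstrap_indices(y, min_controls=50, seed=0)
```

The `max_tries` cap is a design choice of this sketch: when the data hold too few controls for the threshold to be reachable, raising an error seems preferable to resampling indefinitely.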

adam2392 commented 7 months ago

My inclination is to just do option 1.