ryanbressler closed this issue 10 years ago.
I have implemented simple balanced bagging, enabled with the -ballance option, but haven't had time to test it thoroughly. It works as follows:

1. Build a list of samples for each category.
2. Loop nSamples times:
   - draw a category;
   - draw a sample from that category.

Most of the code is in this file: https://github.com/ryanbressler/CloudForest/blob/master/sampeling.go#L7
We now have a few different methods.
The plan is to implement balanced sampling of cases with replacement at the bagging level, as follows:

1. Sample which class to draw from, using a uniform distribution so that classes are balanced on average.
2. Draw a case from that class with replacement.
3. Repeat.
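To see why this balances classes on average, suppose (notation introduced here for illustration) there are K classes, class c has n_c cases, and a bag consists of n draws. A particular case i in class c is picked on a single draw with probability 1/(K n_c), which gives:

$$
\Pr(\text{case } i \text{ on one draw}) = \frac{1}{K\,n_c}, \qquad
\mathbb{E}[\text{draws from class } c] = \frac{n}{K}, \qquad
\mathbb{E}[\text{copies of case } i] = \frac{n}{K\,n_c}
$$

$$
\Pr(\text{case } i \text{ appears in the bag}) = 1 - \left(1 - \frac{1}{K\,n_c}\right)^{n}
$$

For example, with K = 2, n = 100, 90 cases in class a and 10 in class b, each class contributes 50 draws in expectation; each class-b case is expected to appear 100/(2·10) = 5 times, while each class-a case is expected to appear 100/(2·90) ≈ 0.56 times.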
We already support cost-weighted classification. Please comment or open issues to suggest other strategies.
References:
- http://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf
- http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3163175/
- http://www.biomedcentral.com/1471-2105/11/523
- http://bib.oxfordjournals.org/content/early/2012/03/08/bib.bbs006
- http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0067863