jeff1evesque / machine-learning

Web interface + REST API for classification and regression (https://jeff1evesque.github.io/machine-learning.docs)

Investigate 'boosting' implementation #3043

Open jeff1evesque opened 7 years ago

jeff1evesque commented 7 years ago

We need to investigate the requirements to streamline a boosting implementation.

jeff1evesque commented 7 years ago

The following are specific AdaBoost implementations:

protojas commented 7 years ago

AdaBoost is a way of creating classifiers for an ensemble.

The AdaBoost method takes a basic classifier and attempts to fit the data with it. When the algorithm misclassifies something (say, a point called M), AdaBoost gives M more weight on the next iteration; under the equivalent resampling view, it adds more copies of M to the training set.

So, when training the second classifier, the dataset may effectively contain twice as many copies of the point M. This raises the chance that M is classified correctly by this classifier.
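
A minimal sketch of that reweighting step (discrete AdaBoost, assuming +1/-1 labels; the function name and the epsilon guard are illustrative):

```python
import numpy as np

def reweight(weights, y_true, y_pred):
    """One boosting round: upweight the points the classifier got wrong."""
    miss = y_true != y_pred
    err = weights[miss].sum() / weights.sum()        # weighted error rate
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # classifier's vote weight
    # misclassified points grow by e^alpha, correct ones shrink by e^-alpha
    weights = weights * np.exp(alpha * np.where(miss, 1.0, -1.0))
    return weights / weights.sum(), alpha            # renormalize to sum to 1
```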

The algorithm repeats this N times to create N classifiers. These classifiers are then combined in a typical ensemble, using a weighted majority vote to decide the outcome.

What this means is that classifier 1 may have no boosting, classifier 2 will have the failures of classifier 1 boosted, classifier 3 will have the failures of classifier 2 boosted, and so on, until classifier N, which will have the failures of classifier N-1 boosted.
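
Putting the pieces together, here is a sketch of the full loop using decision stumps as the base classifier (the stump choice, `n_rounds`, and the early-stop threshold are assumptions, and labels are again +1/-1):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Train N weighted classifiers, each on data reweighted by the last."""
    n = len(y)
    weights = np.full(n, 1.0 / n)
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)   # fit on the reweighted data
        miss = stump.predict(X) != y
        err = weights[miss].sum()
        if err >= 0.5:                           # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))
        weights *= np.exp(alpha * np.where(miss, 1.0, -1.0))
        weights /= weights.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Weighted majority vote over the boosted classifiers."""
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(votes)
```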

The algorithm adapts to its failures, which is beneficial in some cases and detrimental in others. For example, on datasets with major outliers, the algorithm will keep upweighting those outliers, which may skew its classification of normal data points.
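
For comparison, scikit-learn already ships a ready-made implementation; a minimal usage sketch (the toy dataset and the parameter values are placeholders):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy data; any (X, y) classification dataset would do here.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators is the number of boosting rounds (N in the description above).
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```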