teejlab / API-Risk-Assessment-Framework

A framework for quantifying API risks.
https://teejlab.github.io/API-Risk-Assessment-Framework/intro.html
MIT License

Data Augmentation #40

Open Anupriya-Sri opened 2 years ago

Anupriya-Sri commented 2 years ago

We currently have limited data for training machine learning models. I suggest that we try the following approaches and compare their performance:

  1. Bootstrapping to augment the data, then training models on the larger dataset
  2. Using Bagging Classifier as the ensemble model, which fits a base model on each bootstrapped sample and combines the results: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html
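
The two approaches above could be compared along these lines. This is only a rough sketch: the synthetic data from `make_classification` stands in for our real dataset, and the decision-tree base model, the 3x bootstrap size, and `n_estimators=50` are assumptions for illustration, not choices we have agreed on. Note that the bootstrap is applied to the training split only, since resampling before the split would leak duplicated rows into the test set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

# Synthetic placeholder data (assumption); swap in the real features and labels.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Approach 1: bootstrap the training split (sampling with replacement)
# to three times its size, then fit a single model on the enlarged set.
X_boot, y_boot = resample(
    X_train, y_train, replace=True, n_samples=3 * len(X_train), random_state=0
)
single_model = DecisionTreeClassifier(random_state=0).fit(X_boot, y_boot)
acc_bootstrap = accuracy_score(y_test, single_model.predict(X_test))

# Approach 2: BaggingClassifier draws a bootstrap sample per base model
# internally and combines the individual predictions.
bagging = BaggingClassifier(
    DecisionTreeClassifier(random_state=0), n_estimators=50, random_state=0
).fit(X_train, y_train)
acc_bagging = accuracy_score(y_test, bagging.predict(X_test))

print(f"bootstrap + single tree: {acc_bootstrap:.3f}")
print(f"bagging ensemble:        {acc_bagging:.3f}")
```

Keep in mind that bootstrapping only repeats existing rows, so approach 1 increases the row count without adding new information. A single train/test split is also noisy on a dataset this small, so cross-validation would probably give a fairer comparison.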

This thread has been opened to discuss data augmentation techniques.

Jacq4nn commented 2 years ago

"bagging methods work best with strong and complex models (e.g., fully developed decision trees), in contrast with boosting methods which usually work best with weak models (e.g., shallow decision trees)"

I'm not entirely sure which of these categories the models for our dataset would fall under.
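
One way to check this empirically would be to cross-validate bagging with a fully grown tree against bagging and boosting with a shallow one. The snippet below is a hedged sketch: the data is synthetic (an assumption, not our dataset), and the candidate names, `max_depth=1`, and `n_estimators=50` are illustrative choices only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data (assumption); replace with the real dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# "Strong" base model: unrestricted depth. "Weak" base model: a depth-1 stump.
deep_tree = DecisionTreeClassifier(random_state=0)
stump = DecisionTreeClassifier(max_depth=1, random_state=0)

candidates = {
    "bagging + deep tree": BaggingClassifier(deep_tree, n_estimators=50, random_state=0),
    "bagging + stump": BaggingClassifier(stump, n_estimators=50, random_state=0),
    "boosting + stump": AdaBoostClassifier(stump, n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each ensemble.
mean_scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

for name, score in mean_scores.items():
    print(f"{name}: {score:.3f}")
```

If the deep-tree bagging variant clearly beats the stump variant on our data, that would suggest we are in the "strong base model" regime the quote describes.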

Also, this will affect the interpretability of the results. I'm not sure this is the optimal approach.