twang15 / PlatoAcademy

Free thoughts live

### Todos #25

Open twang15 opened 3 years ago

twang15 commented 3 years ago
  1. Feature selection via exhaustive search.
  2. Estimate the search time.
  3. Try several other linear (SVM) and non-linear (random forest, extra trees, gradient boosting, XGBoost) models.
  4. Model interpretation for the best linear model (via statistics, hypothesis testing, LIME, and Shapley values).
  5. Metrics: AUC, accuracy, sensitivity, specificity, PPV.
  6. Model selection via nested CV.
  7. Model comparison in terms of AUC (p-value), accuracy, and speed.
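
For items 1 and 2, a rough way to estimate the exhaustive-search time is to count the candidate subsets and multiply by the cost of one cross-validated fit. A minimal sketch follows; the feature count (30), the per-fit time (0.5 s), the fold count, and the function names are illustrative assumptions, not measurements from this project.

```python
from math import comb


def count_subsets(n_features, max_size=None):
    """Number of non-empty feature subsets with 1..max_size features."""
    if max_size is None:
        max_size = n_features
    return sum(comb(n_features, k) for k in range(1, max_size + 1))


def estimate_hours(n_features, seconds_per_fit, max_size=None, cv_folds=5, n_jobs=1):
    """Wall-clock hours, assuming every subset costs cv_folds fits of equal
    duration and the work parallelizes perfectly over n_jobs workers."""
    fits = count_subsets(n_features, max_size) * cv_folds
    return fits * seconds_per_fit / n_jobs / 3600


if __name__ == "__main__":
    N_FEATURES = 30        # hypothetical number of candidate features
    SECONDS_PER_FIT = 0.5  # hypothetical time for one model fit
    for max_size in (3, 5, 7, None):
        label = "all" if max_size is None else max_size
        subsets = count_subsets(N_FEATURES, max_size)
        hours = estimate_hours(N_FEATURES, SECONDS_PER_FIT, max_size, n_jobs=8)
        print(f"max subset size {label}: {subsets} subsets, ~{hours:.1f} h")
```

The count grows as a sum of binomial coefficients, so capping the subset size is usually what makes an exhaustive search feasible; timing one real fit and plugging it in gives a first-order budget.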
twang15 commented 3 years ago

06/05/2021

  1. Impact of normalization on XGBoost
  2. If XGBoost or another non-linear model is no better, what to do?
    • report statistics for several non-linear models (more is better than fewer)
    • explain the best non-linear model to gain more insight than explaining logistic regression alone
  3. SVM, Random Forest and model selection harness
  4. Ensemble/stacking for XGBoost
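
On item 1: tree-based learners split on thresholds, so a strictly increasing rescaling of a feature (such as standardization) should leave each split's partition unchanged. The sketch below checks this on a single decision stump with made-up data. It is a toy illustration, not XGBoost itself, and it says nothing about linear boosters or about scale-sensitive models such as Logit and SVM.

```python
from statistics import mean, pstdev


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(x, y):
    """Best threshold split on one feature.

    Returns (indices sent to the left child, weighted Gini impurity)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for cut in range(1, len(order)):
        if x[order[cut - 1]] == x[order[cut]]:
            continue  # no threshold can separate equal values
        left, right = order[:cut], order[cut:]
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(x)
        if best is None or score < best[1]:
            best = (frozenset(left), score)
    return best


def standardize(x):
    """Zero-mean, unit-variance rescaling (strictly increasing when sd > 0)."""
    mu, sd = mean(x), pstdev(x)
    return [(v - mu) / sd for v in x]


if __name__ == "__main__":
    x = [23, 35, 41, 52, 58, 64, 70, 77]  # made-up feature values
    y = [0, 0, 1, 0, 1, 1, 1, 1]          # made-up binary labels
    print(best_split(x, y) == best_split(standardize(x), y))
```

If normalization does change XGBoost results in practice, plausible places to look are the preprocessing pipeline (for example, a scaler fitted on different folds) rather than the tree splits themselves.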
twang15 commented 3 years ago

06/20/2021

  1. Experiments show that stacking brings little benefit.
    • Decision: do not use stacking/voting.

| Logit | XGBoost | SVM | # features | Features |
|---|---|---|---|---|
| 0.87421 | 0.89258 | 0.8849 | 16 | age, rRR, rLen, rPTLA, lPSA, lRR, rThick, lSPA, rPSPA, DLK, weight, rKUPE, rPTSA, height, lPT, lThick |
| 0.87547 | 0.86283 | 0.87604 | 7 | lRP, rRP, age, lTSPA, rKUPE, weight, DLK |
| 0.87054 | 0.84427 | 0.87127 | 5 | lRP, rRP, age, lTSPA, DLK |
| 0.86539 | 0.83439 | 0.86752 | 4 | lRP, rRP, age, lTSPA |
| 0.8576 | 0.80461 | 0.85891 | 3 | lTSPA, rRP, lRP |
| 0.84638 | 0.79191 | 0.84633 | 2 | lRP, rRP |
| 0.82701 | 0.76416 | 0.82701 | 1 | rRP |

  1. Performance:
    • XGBoost: AUC = 89.3%
    • Learning curve: overfitting?
  2. Explanations:
    • Model-level vs. instance-level
    • Feature importance (Logit): statistical significance, coefficients
    • Decision process (decision tree)
    • Shapley values
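
For a small number of features, Shapley values can be computed exactly by enumerating every coalition. A minimal sketch, assuming that "absent" features are replaced by a baseline value (one common convention among several); the toy linear score, its weights, and the input values are hypothetical and not fitted to this data.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to baseline.

    Features outside a coalition take their baseline value. The cost is
    exponential in len(x), so this only suits a handful of features."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical linear score over three features (weights are made up).
    def score(z):
        return 0.5 * z[0] - 2.0 * z[1] + 1.0 * z[2]

    x = [3.0, 1.0, 4.0]
    baseline = [1.0, 0.0, 2.0]
    phi = shapley_values(score, x, baseline)
    print(phi)
    # Efficiency property: contributions sum to score(x) - score(baseline)
    print(sum(phi), score(x) - score(baseline))
```

For a linear score each value reduces to weight × (x − baseline), which ties the instance-level view back to the Logit coefficients; for tree ensembles a dedicated library is the practical route, since enumeration does not scale.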
twang15 commented 3 years ago

['rRP', 'lRP', 'lAR', 'lPLA', 'age', 'DLK', 'lThick', 'LE', 'rShort', 'rRR', 'rPTLPA', 'lSPA']

twang15 commented 2 years ago

07/28/2021

  1. Experiments:
    • A sensitivity analysis of prediction variance with respect to training set size is recommended, to find the point of diminishing returns.
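
One way to set up such an analysis is to refit on repeated random subsamples of each size and track the spread of a held-out score. The sketch below is only a harness under stated assumptions: the synthetic 1-D data, the threshold "model", and all function names are hypothetical stand-ins for the real features and classifiers.

```python
import random
from statistics import mean, pvariance


def make_data(n, rng):
    """Synthetic 1-D binary data: class 1 is shifted by +1.5 (illustrative)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(1.5 * label, 1.0), label))
    return data


def fit_threshold(train):
    """Toy model: threshold halfway between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    if not pos or not neg:  # degenerate sample: fall back to the overall mean
        return mean(x for x, _ in train)
    return (mean(pos) + mean(neg)) / 2


def accuracy(threshold, test):
    """Fraction of test points on the correct side of the threshold."""
    return mean(1.0 if (x > threshold) == (y == 1) else 0.0 for x, y in test)


def size_sensitivity(train_pool, test, sizes, repeats=30, seed=0):
    """Map each training size to (mean score, variance of score) over
    repeated random subsamples, evaluated on one fixed test set."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        scores = [accuracy(fit_threshold(rng.sample(train_pool, n)), test)
                  for _ in range(repeats)]
        results[n] = (mean(scores), pvariance(scores))
    return results


if __name__ == "__main__":
    rng = random.Random(1)
    pool, test = make_data(400, rng), make_data(200, rng)
    for n, (avg, var) in size_sensitivity(pool, test, [10, 25, 50, 100, 200]).items():
        print(f"n={n:4d}  mean accuracy={avg:.3f}  variance={var:.5f}")
```

The point of diminishing returns is where the variance column stops shrinking noticeably as the size grows; swapping `fit_threshold` and `accuracy` for the real model and AUC keeps the same loop.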