timodonnell closed this 7 years ago.
Long-term concerns:
Will the extensive work-splitting code in the new ensemble class also be reused, in some form, to parallelize training of other model types? Also, will the measurement collection remain redundant with the affinity data set?
For now, just comment the crap out of it!
Thanks for the review @iskandr, updated with a lot more documentation. Going to merge momentarily and cut a release.
Closing in favor of #84
This PR adds support for ensembles of single-allele class1 predictors, trained on random halves of each allele's data. A downloadable set of ensembles with 16 models per ensemble is included, supporting 132 alleles (2112 models in total). Each model in an allele's ensemble was selected as the top-performing model (by sum of AUC, F1, and Tau) in model selection over 160 architectures. Imputation was considered a binary feature of the architecture; overall about half the models selected used imputation.
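The training scheme above (one random half per ensemble member, then keep the architecture with the highest AUC + F1 + tau sum) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the helpers `random_half_split`, `select_best_architecture`, and `build_allele_ensemble`, the score-dict layout, and the caller-supplied `score_fn` are all hypothetical, and actual model fitting is left out.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def random_half_split(rows: Sequence, seed: int) -> Tuple[List, List]:
    """Hypothetical helper: shuffle one allele's rows and cut them into two halves."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    mid = len(indices) // 2
    train = [rows[i] for i in indices[:mid]]
    held_out = [rows[i] for i in indices[mid:]]
    return train, held_out


def select_best_architecture(scores: Dict[str, Dict[str, float]]) -> str:
    """Return the architecture name with the largest AUC + F1 + tau sum."""
    return max(
        scores,
        key=lambda name: scores[name]["auc"] + scores[name]["f1"] + scores[name]["tau"],
    )


def build_allele_ensemble(
    rows: Sequence,
    architectures: Sequence[str],
    score_fn: Callable[[str, List, List], Dict[str, float]],
    n_models: int = 16,
) -> List[Tuple[str, List]]:
    """For each ensemble member, draw a fresh random half of the allele's data,
    score every candidate architecture on the other half (via the assumed
    score_fn), and keep the best one. Returns (architecture, training half)
    pairs; fitting the selected model on its half is omitted here."""
    members = []
    for i in range(n_models):
        train, held_out = random_half_split(rows, seed=i)
        scores = {arch: score_fn(arch, train, held_out) for arch in architectures}
        members.append((select_best_architecture(scores), train))
    return members
```

Treating imputation as a binary architecture feature just means the candidate list contains paired variants (e.g. hypothetical names `"arch_a"` and `"arch_a+imputation"`), so selection decides per member whether imputation is used.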
To test this out in the current branch you can run:
I'm leaving the existing single-model predictors as the default for now. We can switch the default to ensembles once we have a mass-spec-based assessment of their quality, which should be soon.
There's a lot here, @iskandr, so it may make sense to go over it in person sometime.