I think ensemble learning and active learning would give us more direct impact than Spark/MLlib.
In summary:
An ensemble of classifiers is likely to produce a lower generalization error than a single classifier.
Active learning (based on an ensemble of classifiers) lets the classifier point out the most 'informative' unlabeled data, e.g. the samples its members disagree on most, so we can use it to tell the labeling team which data to label.
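To make the idea concrete, here is a rough sketch of committee-based selection, assuming we score each unlabeled sample by how much the ensemble members disagree (vote entropy). The names `vote_entropy`, `majority_vote`, and `rank_for_labeling`, and the toy threshold classifiers, are made up for illustration and are not from our codebase or the linked papers.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's votes for one sample (0 means full agreement)."""
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def majority_vote(votes):
    """Ensemble prediction: the label most committee members voted for."""
    return Counter(votes).most_common(1)[0][0]


def rank_for_labeling(committee, unlabeled, top_k):
    """Return indices of the top_k samples the committee disagrees on most."""
    scored = []
    for idx, x in enumerate(unlabeled):
        votes = [clf(x) for clf in committee]
        scored.append((vote_entropy(votes), idx))
    # Highest disagreement first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_k]]


# Hypothetical committee: three threshold classifiers on a 1-D feature.
committee = [
    lambda x: int(x > 0.3),
    lambda x: int(x > 0.5),
    lambda x: int(x > 0.7),
]
unlabeled = [0.1, 0.4, 0.6, 0.9]

# Samples at index 1 and 2 split the committee, so they would go to the labeling team.
to_label = rank_for_labeling(committee, unlabeled, top_k=2)
```

In a real setup the committee would be the trained ensemble members, and the loop would repeat: label the selected samples, retrain, and re-rank.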
This research topic could hence be decomposed into: 1. #488 Ensemble learning. 2. #489 Active learning.
Check these out:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.52.9672&rep=rep1&type=pdf
https://www.jair.org/media/295/live-295-1554-jair.pdf