We want to strike a balance between speed and accuracy/resilience to noise.
Instead of C4.5 we can use either the Decision Tree (faster, but not as resilient as C4.5) or the Random Forest (slower, but considerably more accurate); see the sketch below.
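A rough sketch of what the two options could look like with the ML Lib API (the "label"/"features" column names and the hyperparameter values are placeholders, not tuned choices):

```scala
import org.apache.spark.ml.classification.{DecisionTreeClassifier, RandomForestClassifier}

// Single decision tree: fast to train, but more sensitive to noisy labels.
val dt = new DecisionTreeClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setMaxDepth(10)

// Random forest: an ensemble of trees; slower, but bagging makes it more robust to noise.
val rf = new RandomForestClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setNumTrees(100)
  .setMaxDepth(10)

// val model = rf.fit(trainingData)  // trainingData: DataFrame with "label" and "features" columns
```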
We can keep using 3-NN, but with the spark-knn implementation, which runs in roughly O(n log n) at the cost of being approximate rather than exact.
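A minimal sketch, assuming the third-party saurfang/spark-knn package is on the classpath (its KNNClassifier follows the usual ML Lib estimator API; column names are placeholders):

```scala
// Not part of the ML Lib itself: provided by https://github.com/saurfang/spark-knn
import org.apache.spark.ml.classification.KNNClassifier

// Approximate 3-NN built on hybrid spill trees; roughly O(n log n) rather than the
// quadratic cost of an exact brute-force neighbour search.
val knn = new KNNClassifier()
  .setK(3)
  .setFeaturesCol("features")
  .setPredictionCol("prediction")

// val model = knn.fit(trainingData)
```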
We do not have the LOG classifier, so we can use another linear method such as Multinomial Logistic Regression. An alternative would be an SVM (note that it appears to handle only binary problems, so we would likely combine it with One-vs-Rest); both are sketched below.
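A sketch of both alternatives with the ML Lib API (the regularization and iteration settings are placeholders, not tuned values):

```scala
import org.apache.spark.ml.classification.{LinearSVC, LogisticRegression, OneVsRest}

// Multinomial logistic regression handles multiple classes directly.
val mlr = new LogisticRegression()
  .setFamily("multinomial")
  .setMaxIter(100)
  .setRegParam(0.01)

// LinearSVC is binary-only, so wrap it in One-vs-Rest to obtain a multiclass classifier.
val svc = new LinearSVC()
  .setMaxIter(100)
  .setRegParam(0.01)

val ovr = new OneVsRest()
  .setClassifier(svc)

// val model = ovr.fit(trainingData)
```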
The original paper uses the following classifiers: C4.5, 3-NN, and LOG.
Unfortunately, the ML lib does not include those methods, so we will need to find alternatives.
Ideally, we want to use classifiers that are robust to noise and complement each other.
The ML Lib API documentation can be found here: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/classification/index.html
Further documentation including examples can be found here: https://spark.apache.org/docs/latest/ml-classification-regression.html