We want to strike a balance between speed and accuracy/resilience to noise.
Instead of C4.5 we can use either the Decision Tree (faster, but not as resilient as C4.5) or the Random Forest (slower, but considerably more accurate); see the sketch below.
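A rough sketch of what the two options could look like with the ML Lib API (the "label"/"features" column names and the hyperparameter values are placeholders, not tuned choices):

```scala
import org.apache.spark.ml.classification.{DecisionTreeClassifier, RandomForestClassifier}

// Single decision tree: fast to train, but more sensitive to noisy labels.
val dt = new DecisionTreeClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setMaxDepth(10)

// Random forest: an ensemble of trees; slower, but bagging makes it more robust to noise.
val rf = new RandomForestClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setNumTrees(100)
  .setMaxDepth(10)

// val model = rf.fit(trainingData)  // trainingData: DataFrame with "label" and "features" columns
```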
We can keep using 3-NN, but with the spark-knn implementation, which runs in roughly O(n log n) at the cost of being approximate rather than exact.
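A minimal sketch, assuming the third-party saurfang/spark-knn package is on the classpath (its KNNClassifier follows the usual ML Lib estimator API; column names are placeholders):

```scala
// Not part of the ML Lib itself: provided by https://github.com/saurfang/spark-knn
import org.apache.spark.ml.classification.KNNClassifier

// Approximate 3-NN built on hybrid spill trees; roughly O(n log n) rather than the
// quadratic cost of an exact brute-force neighbour search.
val knn = new KNNClassifier()
  .setK(3)
  .setFeaturesCol("features")
  .setPredictionCol("prediction")

// val model = knn.fit(trainingData)
```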
We do not have the LOG classifier, so we can use another linear method such as Multinomial Logistic Regression. An alternative would be an SVM (note that it appears to handle only binary problems, so we would likely combine it with One-vs-Rest); both are sketched below.
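A sketch of both alternatives with the ML Lib API (the regularization and iteration settings are placeholders, not tuned values):

```scala
import org.apache.spark.ml.classification.{LinearSVC, LogisticRegression, OneVsRest}

// Multinomial logistic regression handles multiple classes directly.
val mlr = new LogisticRegression()
  .setFamily("multinomial")
  .setMaxIter(100)
  .setRegParam(0.01)

// LinearSVC is binary-only, so wrap it in One-vs-Rest to obtain a multiclass classifier.
val svc = new LinearSVC()
  .setMaxIter(100)
  .setRegParam(0.01)

val ovr = new OneVsRest()
  .setClassifier(svc)

// val model = ovr.fit(trainingData)
```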
The original paper uses the following classifiers: C4.5, 3-NN, and LOG.
Unfortunately, the ML lib does not include those methods, so we will need to find alternatives.
Ideally, we want to use classifiers that are robust to noise and complement each other.
The ML Lib API documentation can be found here: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/classification/index.html
Further documentation including examples can be found here: https://spark.apache.org/docs/latest/ml-classification-regression.html