mlr-org / mlr

Machine Learning in R
https://mlr.mlr-org.com

supervised measurements for unsupervised problem #1703

Closed MinhAnhL closed 4 years ago

MinhAnhL commented 7 years ago

Hi, I'm new to MLR, and one of the tasks for my master's thesis is to integrate a new task type into MLR. New task: one-class classification, also known as novelty detection or anomaly detection (= makeOneClassTask()). Current learners: svm, ksvm, autoencoder.

The use case I'm currently dealing with is:

Classes: "normal" and "anomaly".
For training: I only have data from the "normal" class, so this is an unsupervised learning problem.
For testing: I have data with both classes, "normal" and "anomaly" (highly unbalanced), so I need supervised performance measures such as true positive rate, ROC etc. (the evaluation method is fixed for this thesis; it has to be ROC). That means I need a truth variable in my prediction.

My question is more of a design question: how should I make supervised performance measures possible for an unsupervised problem?

Some thoughts:

  1. Add a target column containing only the "normal" class to the training data, and add "target" as an input argument to makeOneClassTask(), even though the task is unsupervised (the target variable is not used for training). The test data will have a labeled column (normal and anomaly), so tpr and ROC can be calculated. This is what I currently do, but it needs many hacks in functions like "setThreshold()" or "generateThreshVsPerfData()".
  2. Write new prediction methods that can additionally accept a user-supplied truth column. These prediction methods should work for OneClass tasks, but probably also for unsupervised tasks in general (and therefore also for clustering) (?).

So I'm somewhere between supervised and unsupervised problems, and I'm not sure how best to integrate this use case into MLR without ugly hacks in the base functions.
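
To make option 2 more concrete, here is a rough, language-neutral sketch written in plain Python. It is not mlr code: the function names `fit_one_class`, `predict_scores` and `auc`, the toy distance-from-mean scorer, and the data are all hypothetical and only illustrate the idea that training sees no labels, while the prediction can optionally carry a truth column that supervised measures then use.

```python
from math import sqrt

def fit_one_class(train_rows):
    """Fit on 'normal' rows only: store the per-feature mean (no labels used)."""
    n = len(train_rows)
    dim = len(train_rows[0])
    return [sum(row[j] for row in train_rows) / n for j in range(dim)]

def predict_scores(center, rows, truth=None):
    """Return anomaly scores (distance to the center).

    The optional truth column is only carried along; it never affects the scores.
    """
    scores = [sqrt(sum((x - c) ** 2 for x, c in zip(row, center))) for row in rows]
    return {"score": scores, "truth": truth}

def auc(pred, positive="anomaly"):
    """ROC AUC via the rank (Mann-Whitney) formulation; requires a truth column."""
    if pred["truth"] is None:
        raise ValueError("AUC needs a truth column in the prediction")
    pos = [s for s, t in zip(pred["score"], pred["truth"]) if t == positive]
    neg = [s for s, t in zip(pred["score"], pred["truth"]) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present in the test data")
    # Count score pairs where the anomaly is ranked above the normal point (ties count half).
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: training contains "normal" observations only.
train = [[0.0, 0.1], [0.1, -0.1], [-0.1, 0.0], [0.0, 0.0]]
test = [[0.05, 0.0], [3.0, 3.0], [-0.1, 0.1], [0.0, -2.5]]
truth = ["normal", "anomaly", "normal", "anomaly"]

center = fit_one_class(train)                     # unsupervised training
pred = predict_scores(center, test, truth=truth)  # truth supplied by the user
print(auc(pred))                                  # supervised measure on the test set
```

With this split, a prediction made without a truth column still works, and only the supervised measures complain when the labels are missing.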

Any help is really appreciated :)

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.