titicaca / spark-iforest

Isolation Forest on Spark
Apache License 2.0
227 stars 89 forks

Does the model training need labeled data? #16

Closed DanyYan closed 5 years ago

DanyYan commented 5 years ago

I see that the example sets a label column. Does that mean this model can't be used on unlabeled data?

titicaca commented 5 years ago

The label column is used only for evaluation in the example code. The algorithm is unsupervised, so you don't need to set a label column to fit the model.
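To illustrate why fitting needs no label column, here is a minimal, stdlib-only Python sketch of the isolation forest idea. It is not the spark-iforest implementation or its API: the names `fit_forest` and `anomaly_score`, the parameter defaults, and the toy data are all assumptions made for illustration. The point is that the fitting step only ever receives feature vectors.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Isolate rows with random axis-aligned splits; no labels are involved."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(tree, row, depth=0):
    """Depth at which row lands in a leaf, adjusted for unsplit leaf size."""
    if "feature" not in tree:
        return depth + c_factor(tree["size"])
    side = "left" if row[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[side], row, depth + 1)

def fit_forest(rows, num_trees=100, sample_size=256, seed=0):
    """Hypothetical fit function: takes feature rows only, never labels."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = max(1, math.ceil(math.log2(sample_size)))
    trees = [
        build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
        for _ in range(num_trees)
    ]
    return trees, sample_size

def anomaly_score(forest, row):
    """Score in (0, 1); shorter average paths give higher scores."""
    trees, sample_size = forest
    mean_path = sum(path_length(t, row) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(sample_size))

# Toy, unlabeled data (an assumption for this sketch): a cluster plus one far point.
data_rng = random.Random(42)
inliers = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]
outlier = (10.0, 10.0)
features = inliers + [outlier]

# With 101 rows and sample_size=256, every tree sees the full data set.
forest = fit_forest(features)
outlier_score = anomaly_score(forest, outlier)
max_inlier_score = max(anomaly_score(forest, row) for row in inliers)
print(outlier_score, max_inlier_score)
```

The far point is usually cut off within the first few random splits, so its average path is short and its score is high, all without any label being supplied.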

DanyYan commented 5 years ago

Hello, I have another question. If I train on unlabeled data, how can I evaluate the model?

titicaca commented 5 years ago

You cannot evaluate the model without labeled data. The algorithm is unsupervised and only detects the data it finds isolated by its own criterion; without labels, you don't know which data is really isolated.
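When labels are available, one common way to evaluate anomaly scores is the area under the ROC curve (AUC). The following is a stdlib-only Python sketch, not code from the repository: the function name `roc_auc` and the toy labels and scores are hypothetical, and it assumes that label 1 marks an anomaly and that a higher score means "more anomalous".

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (anomaly, normal) pairs ranked correctly.

    Ties between an anomaly and a normal point count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy values (assumptions for this sketch): 1 = anomaly, 0 = normal.
labels = [0, 0, 0, 1, 0, 1]
scores = [0.41, 0.45, 0.39, 0.72, 0.58, 0.52]

auc = roc_auc(labels, scores)
print(auc)  # 0.875: 7 of the 8 anomaly/normal pairs are ranked correctly
```

The pairwise loop is quadratic, which is fine for a small labeled validation set; for large data a sort-based or library implementation would be the more practical choice.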

DanyYan commented 5 years ago

OK, so does label 1 mean the isolated data?

titicaca commented 5 years ago

Yes.

DanyYan commented 5 years ago

OK, thank you.