msimms / LibIsolationForest

C++, Rust, Julia, Python 2, and Python 3 implementations of the Isolation Forest anomaly detection algorithm.
MIT License

Julia Package #8

Open msimms opened 3 years ago

msimms commented 3 years ago

I would like to make this installable via the Julia package manager.

davnn commented 2 years ago

Hi Mike, would you be interested in porting the Julia version to our OutlierDetectionJL organization? There will be an OutlierDetectionTrees.jl package that would host the algorithm and it will be fully integrated with MLJ.

msimms commented 2 years ago

I'm curious. I'll take a look at what you've got.

davnn commented 2 years ago

I made a first attempt at implementing it in OutlierDetectionTrees.jl. I changed the implementation slightly so that it works without explicit feature names; see IsolationForest.jl.
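To illustrate what an isolation forest without explicit feature names might look like, here is a minimal sketch. It is not the code in IsolationForest.jl. The names `Node`, `Leaf`, `Split`, `grow`, `pathlength`, `fit_forest` and `score` are hypothetical, and the sketch assumes a plain matrix with features as rows and observations as columns, so that a feature is addressed only by its row index. It also assumes at least two training points.

```julia
using Random

# Hypothetical node types for this sketch (not the types used in OutlierDetectionTrees.jl).
abstract type Node end

struct Leaf <: Node
    size::Int            # number of training points that ended up in this leaf
end

struct Split <: Node
    feature::Int         # row index of the split feature; no feature names needed
    threshold::Float64
    left::Node
    right::Node
end

# Average path length of an unsuccessful binary-search-tree lookup over n points,
# used to normalise path lengths (the c(n) term of the isolation forest score).
c(n) = n <= 1 ? 0.0 : n == 2 ? 1.0 : 2 * (log(n - 1) + 0.5772156649) - 2 * (n - 1) / n

# Grow one isolation tree on X (features x observations) by random axis-aligned splits.
function grow(X::AbstractMatrix, depth::Int, maxdepth::Int, rng)
    n = size(X, 2)
    (depth >= maxdepth || n <= 1) && return Leaf(n)
    f = rand(rng, 1:size(X, 1))                 # pick a feature by index
    lo, hi = extrema(view(X, f, :))
    lo == hi && return Leaf(n)                  # cannot split a constant feature
    t = lo + rand(rng) * (hi - lo)              # uniform threshold in [lo, hi)
    mask = X[f, :] .< t
    return Split(f, t,
                 grow(X[:, mask], depth + 1, maxdepth, rng),
                 grow(X[:, .!mask], depth + 1, maxdepth, rng))
end

# Path length of a single point x; unresolved leaves are credited with c(size).
pathlength(node::Leaf, x, depth) = depth + c(node.size)
pathlength(node::Split, x, depth) =
    pathlength(x[node.feature] < node.threshold ? node.left : node.right, x, depth + 1)

# Fit a forest on random subsamples; returns the trees and the subsample size used.
function fit_forest(X::AbstractMatrix; ntrees=100, subsample=256, rng=Random.default_rng())
    n = size(X, 2)
    ψ = min(subsample, n)
    maxdepth = ceil(Int, log2(max(ψ, 2)))
    trees = [grow(X[:, randperm(rng, n)[1:ψ]], 0, maxdepth, rng) for _ in 1:ntrees]
    return trees, ψ
end

# Anomaly score in (0, 1]; values closer to 1 indicate points that are isolated quickly.
function score(trees, ψ, x)
    h = sum(pathlength(t, x, 0) for t in trees) / length(trees)
    return 2.0^(-h / c(ψ))
end

# Usage: 500 Gaussian inliers plus one distant point appended as the last column.
rng = MersenneTwister(42)
X = hcat(randn(rng, 2, 500), [8.0, 8.0])
trees, ψ = fit_forest(X; rng=rng)
scores = [score(trees, ψ, X[:, i]) for i in 1:size(X, 2)]
```

With this convention, a higher score means a more anomalous point, so the appended point should score well above the typical inlier. Comparing the direction and rough range of scores against a small reference like this is one possible way to sanity-check a new implementation.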

Only a very small wrapper is necessary to make it work with MLJ and OutlierDetection.jl, see models/IForest.jl.

A small usage example:

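import Pkg

# Work in a temporary environment so the default one stays untouched
Pkg.activate(; temp=true)

Pkg.add("MLJ")
Pkg.add("OutlierDetection")
Pkg.add("OutlierDetectionData")
# Install OutlierDetectionTrees.jl directly from its GitHub repository
Pkg.develop(; url="https://github.com/OutlierDetectionJL/OutlierDetectionTrees.jl")

using MLJ
using OutlierDetection
using OutlierDetectionData
using OutlierDetectionTrees

# Load the "thyroid" dataset from the ODDS collection
X, y = ODDS.load("thyroid")
# Stratified, shuffled 50/50 split into train and test indices
train, test = partition(eachindex(y), 0.5, stratify=y, shuffle=true)
# Wrap the isolation forest so that it predicts outlier probabilities
detector = ProbabilisticDetector(IForestDetector())
mach = machine(detector, X)
fit!(mach, rows=train)
ŷ = predict(mach, rows=test)
# Area under the ROC curve on the held-out rows
auc(ŷ, y[test])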

The results do not appear to be correct yet. Once the bugs are fixed, it should be ready to be registered with MLJ's model registry. What do you think?