Improve performance for SparseVector

linkedin / isolation-forest

A distributed Spark/Scala implementation of the isolation forest algorithm for unsupervised outlier detection, featuring support for scalable training and ONNX export for seamless cross-platform inference.

@eisber: Thanks for creating this PR!

We had a similar discussion internally (the DataPoint case class vs. Array[Array[Float]]) back when I was creating the library. We chose the DataPoint case class because it is more readable and the difference in memory usage was negligible: for our use cases, it used ~6% more memory than Array[Array[Float]].

Do you have any data or benchmarks showing a major performance improvement from using Vector instead of the DataPoint case class?

Improve performance for SparseVector #10