kLabUM / rrcf

🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams
https://klabum.github.io/rrcf/
MIT License
488 stars 111 forks

clarify reproducibility using numpy.random.seed #79

Closed yasirroni closed 3 years ago

yasirroni commented 3 years ago

I made this change to README.md to clarify how to keep results reproducible (useful for paper publication, hyperparameter optimization, and debugging).

Control tree random seed

Even with the same data, a tree (or forest) built with rrcf.RCTree() depends on np.random, so its shape and the resulting anomaly scores can change from run to run. To make results reproducible, call numpy.random.seed() before building the tree:

import numpy as np
import rrcf

# Seed NumPy's global RNG before making a tree or forest
seed_number = 42  # any fixed integer of your choice
np.random.seed(seed_number)
tree = rrcf.RCTree(X)  # X: your data array
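
One caveat with np.random.seed() is that it resets NumPy's global random state for everything that runs afterwards, not only for the tree. A minimal sketch of one way to limit that, assuming the tree is built from the global np.random state as described above: a small context manager (temp_seed is a hypothetical helper, not part of rrcf) that seeds the generator, runs the enclosed code, and then restores the previous state. The draw_cuts function is only a stand-in for tree construction so the example runs without rrcf installed.

```python
import contextlib

import numpy as np


@contextlib.contextmanager
def temp_seed(seed):
    """Seed NumPy's global RNG for the enclosed block, then restore the old state.

    Hypothetical helper for illustration; it is not part of rrcf.
    """
    saved_state = np.random.get_state()  # remember the caller's RNG state
    np.random.seed(seed)
    try:
        yield
    finally:
        np.random.set_state(saved_state)  # undo the seeding for later code


def draw_cuts(n=5):
    # Stand-in for tree construction: any code that draws from np.random
    return np.random.uniform(size=n)


with temp_seed(42):
    first = draw_cuts()

with temp_seed(42):
    second = draw_cuts()

# Same seed, same draws: the two runs are identical
assert np.array_equal(first, second)
```

Under the same assumption, building the tree inside the block (with temp_seed(42): followed by tree = rrcf.RCTree(X)) should give the same tree on every run while leaving the random state of the surrounding code untouched.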