online-ml / river

🌊 Online machine learning in Python
https://riverml.xyz
BSD 3-Clause "New" or "Revised" License

Space Saving, HyperLogLog and Hierarchical Heavy Hitters algorithms #1559

Open laraabastoss opened 3 months ago

laraabastoss commented 3 months ago

Added code and the corresponding documentation for the Space Saving, HyperLogLog and Hierarchical Heavy Hitters algorithms to the sketch module.

smastelini commented 3 months ago

Hi @laraabastoss, thanks for your contribution! Some errors in the automated tests were fixed recently, so I am re-running them for this PR. Let's see how that goes and whether you need to change anything in your code. You may need to pull the latest changes from the main branch.

Aside from that, I wanted to raise a question about scope. River already has a Heavy Hitters algorithm that is meant to provide the same functionality as Space Saving. I noticed that the current implementation in River supports a fading factor. I do not know the pros and cons of Space Saving vs. Lossy Count with Forgetting Factor (the core of River's version), but I think we could do some renaming to keep both versions.
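For context, here is a minimal sketch of the textbook Space Saving idea in plain Python. It is not the code from this PR and not River's API: the class name `SpaceSavingSketch` and its `update`/`top` methods are hypothetical, and it has no fading factor, so it only illustrates the counting scheme being compared.

```python
class SpaceSavingSketch:
    """Hypothetical, minimal Space Saving counter (illustration only)."""

    def __init__(self, k: int):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k          # maximum number of items tracked at once
        self.counts = {}    # item -> estimated count (never an underestimate)

    def update(self, x, w: int = 1) -> None:
        if x in self.counts:
            # Already tracked: just increase its counter.
            self.counts[x] += w
        elif len(self.counts) < self.k:
            # Free slot available: start tracking the item.
            self.counts[x] = w
        else:
            # Table is full: evict the item with the smallest counter and let
            # the newcomer inherit that counter plus its own weight.
            victim = min(self.counts, key=self.counts.get)
            self.counts[x] = self.counts.pop(victim) + w

    def top(self, n: int):
        # The n tracked items with the largest estimated counts.
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


sketch = SpaceSavingSketch(k=2)
for item in "aaaaabbbcd":
    sketch.update(item)
print(sketch.top(2))
```

Because an evicted counter is inherited rather than reset, the counters always sum to the number of items seen, and rare items that arrive late can be overestimated. That trade-off is the one to weigh against a forgetting-factor approach.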

The idea is to follow the convention we have used so far for the algorithms in river.sketch: