online-ml / river

🌊 Online machine learning in Python
https://riverml.xyz
BSD 3-Clause "New" or "Revised" License

PageHinkley only detects increases in mean #810

Closed (yuritpinheiro closed this issue 2 years ago)

yuritpinheiro commented 2 years ago

Describe the bug

The PageHinkley concept drift detector does not flag drift when the mean of the stream decreases; it only reacts when the mean increases.

Steps/code to reproduce

import numpy as np
from river.drift import PageHinkley

import matplotlib.pyplot as plt

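# 3000 samples in three consecutive blocks of 1000: mean 0, then mean 5, then mean 0 (unit variance)
data = np.random.normal((0, 5, 0), size=(1000, 3)).T.reshape(-1)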

cdd = PageHinkley()
sum_list = []
detect = []
for i, x in enumerate(data):
    cdd.update(x)
    sum_list.append(cdd.sum)
    if cdd.change_detected:
        detect.append(i)

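data_plot = plt.plot(data, label="Data samples")
ax2 = plt.gca().twinx()  # second y-axis for the PHT sum
sum_plot = ax2.plot(sum_list, color='orange', label="PHT sum")
handles = [data_plot[0], sum_plot[0]]
labels = ["Data samples", "PHT sum"]
for d in detect:
    detect_plot = plt.axvline(d, c="red")
if detect:
    # Add the detection entry only if a drift was flagged;
    # otherwise detect_plot would be undefined.
    handles.append(detect_plot)
    labels.append("CD detection")

plt.legend(handles, labels)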
plt.show()

[image: plot produced by the code above, showing the data samples, the PHT sum, and the detected drifts]

MaxHalford commented 2 years ago

Hello there! I'm not familiar with this specific part of the code base. I'll let @jacobmontiel or maybe @smastelini take a look when they can.

smastelini commented 2 years ago

I'm not familiar with this portion of the codebase, but I can give it a try if @jacobmontiel is unavailable. I believe he coded PageHinkley.

jacobmontiel commented 2 years ago

Hi @yuritpinheiro

This is the expected behavior. Page-Hinkley (like CUSUM) is a one-sided drift detector: it only signals a change when the mean increases.

You can easily extend the existing implementation into a two-sided detector by symmetry, or use a detector that is already two-sided, such as HDDM or ADWIN.
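
To illustrate the "by symmetry" idea, here is a minimal from-scratch sketch in plain Python. It is not river's implementation: the class names `OneSidedPageHinkley` and `TwoSidedPageHinkley`, the `update` return values, and the parameter values are all assumptions made for this example. The one-sided test accumulates deviations above the running mean, so a drop in the mean never raises the statistic. Feeding a second instance the negated values turns a decrease into an increase, which that instance can then flag.

```python
import random


class OneSidedPageHinkley:
    """Hypothetical minimal Page-Hinkley test; flags increases in the mean only."""

    def __init__(self, min_instances=30, delta=0.005, threshold=50.0):
        self.min_instances = min_instances  # samples required before flagging
        self.delta = delta                  # tolerated magnitude of change
        self.threshold = threshold          # alarm level for the cumulative sum
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum_sum = 0.0

    def update(self, x):
        """Consume one value and return True if an upward drift is flagged."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        # Deviations above the running mean accumulate; the sum is clipped at
        # zero, so values below the mean can never trigger an alarm.
        self.cum_sum = max(0.0, self.cum_sum + x - self.mean - self.delta)
        if self.n >= self.min_instances and self.cum_sum > self.threshold:
            self.reset()
            return True
        return False


class TwoSidedPageHinkley:
    """Hypothetical wrapper: one test on x, a mirrored test on -x."""

    def __init__(self, **kwargs):
        self.up = OneSidedPageHinkley(**kwargs)
        self.down = OneSidedPageHinkley(**kwargs)

    def update(self, x):
        """Return "increase", "decrease", or None for the current value."""
        rose = self.up.update(x)
        fell = self.down.update(-x)  # a decrease in x is an increase in -x
        if rose or fell:
            # Restart both sides so they track the new regime together.
            self.up.reset()
            self.down.reset()
        if rose:
            return "increase"
        if fell:
            return "decrease"
        return None


# Same shape as the stream in the report (mean 0, then 5, then 0),
# but with a smaller standard deviation of 0.2.
rng = random.Random(0)
stream = [rng.gauss(mu, 0.2) for mu in (0.0, 5.0, 0.0) for _ in range(1000)]

detector = TwoSidedPageHinkley()
events = []
for i, x in enumerate(stream):
    direction = detector.update(x)
    if direction is not None:
        events.append((i, direction))

print(events)
```

On this stream the wrapper should report an increase shortly after index 1000 and a decrease shortly after index 2000, whereas a single `OneSidedPageHinkley` would only report the first. The noise level is deliberately lower than in the reproduction script so that the illustrative threshold is unlikely to produce spurious alarms; with noisier data the threshold and `delta` would need tuning.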

yuritpinheiro commented 2 years ago

Thank you @jacobmontiel.