online-ml / river

🌊 Online machine learning in Python
https://riverml.xyz
BSD 3-Clause "New" or "Revised" License

standard scaler in a pipeline with CluStream returns zero values #1616

Open Gelso22 opened 2 months ago

Gelso22 commented 2 months ago

river version: 0.21.1
Python version: 3.12.3
Operating system: Windows

Hello,

I don't understand where to call the function debug_one to inspect the pipeline relative to the training loop, as in your example here. Do I have to call it inside a loop, together with the learn_one(x) function?

Also, I am not sure why I obtain inf values for the Silhouette metric with the function progressive_val_score.

Data are attached below.

from river import stream
from river import compose
from river import metrics
from river import evaluate

dataset_stock = stream.iter_csv('data.csv', drop=['Date'], converters=diz_conversion)
# diz_conversion just converts all numeric columns to float

for x, _ in dataset_stock:
    print(x)
# after the loop, x holds the last row of the data

from river import cluster
from river import preprocessing

clustream = cluster.CluStream(
    n_macro_clusters=3,
    max_micro_clusters=50,
    time_window=20,
    seed=0
)
model = compose.Pipeline(
    preprocessing.StandardScaler(),
    clustream
)
clu_metric = metrics.Silhouette()
print(model.debug_one(x))  # here all values are equal to 0 after rescaling

evaluate.progressive_val_score(dataset_stock, model, clu_metric, print_every=50)  # here I get inf values for the Silhouette metric

Thanks!

Gelso22 commented 2 months ago

data.csv
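
A possible reading of the two symptoms above, sketched in plain Python rather than with river itself. The sketch makes two assumptions. First, that an online standard scaler keeps running statistics and maps a feature to 0 while its variance is still zero, so transforming a sample before any learn_one call gives all zeros. Second, that the CSV stream is a single-pass iterator, so a for loop that prints every row leaves nothing for a later evaluation call. RunningScaler, iter_rows and the sample rows are hypothetical stand-ins; only the learn_one/transform_one naming follows river's convention.

```python
import math


class RunningScaler:
    """Hypothetical stand-in for an online standard scaler (not river's code)."""

    def __init__(self):
        self.n = 0
        self.mean = {}
        self.m2 = {}  # running sum of squared deviations (Welford's update)

    def learn_one(self, x):
        self.n += 1
        for key, value in x.items():
            mean = self.mean.get(key, 0.0)
            delta = value - mean
            mean += delta / self.n
            self.mean[key] = mean
            self.m2[key] = self.m2.get(key, 0.0) + delta * (value - mean)

    def transform_one(self, x):
        out = {}
        for key, value in x.items():
            var = self.m2.get(key, 0.0) / self.n if self.n else 0.0
            std = math.sqrt(var)
            # Guarded division: a feature with zero variance is mapped to 0.0.
            out[key] = (value - self.mean.get(key, 0.0)) / std if std > 0 else 0.0
        return out


# Made-up rows standing in for the attached CSV.
rows = [
    {"open": 10.0, "close": 12.0},
    {"open": 14.0, "close": 11.0},
    {"open": 18.0, "close": 16.0},
]


def iter_rows():
    """Single-pass generator, like a CSV reader that streams rows."""
    for row in rows:
        yield row


dataset = iter_rows()
for x in dataset:  # consumes the whole stream; x ends up as the last row
    pass

scaler = RunningScaler()
before = scaler.transform_one(x)  # scaler has seen no data yet
leftover = list(dataset)          # the generator is already exhausted

# Recreate the stream, learn inside the loop, then inspect the same sample.
for row in iter_rows():
    scaler.learn_one(row)
after = scaler.transform_one(x)

print(before)
print(leftover)
print(after)
```

Under these assumptions, the inspection call belongs inside (or after) the loop that calls learn_one, and the stream has to be recreated before it is passed to an evaluation function.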