online-ml / river

🌊 Online machine learning in Python
https://riverml.xyz
BSD 3-Clause "New" or "Revised" License
5.09k stars, 552 forks

Perceptron predict_many as bool #1473

Open raul-parada opened 11 months ago

raul-parada commented 11 months ago

Hi,

Why is my predict_many output a bool instead of a numeric value when using the perceptron? Is it possible to change this behavior?

raphaelsty commented 11 months ago

Hi @raul-parada,

The perceptron has a predict_proba_many method that returns probabilities over a batch of input samples. Is that what you are looking for?

In River, the perceptron is treated as a special case of logistic regression, so it is a classifier. Are you looking for a regressor that outputs continuous values? If so, you need to switch to the MLPRegressor: https://riverml.xyz/latest/api/neural-net/MLPRegressor/
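To illustrate why a classifier's batch prediction comes back as booleans, here is a minimal sketch in plain Python. It is not River's actual implementation: it only assumes a logistic-style model that computes a probability per sample and then thresholds it at 0.5 to get the class. The function names, weights, and sample values are hypothetical.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of the positive class for one sample (a dict of features)."""
    score = bias + sum(weights.get(name, 0.0) * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-score))


def predict(weights, bias, x):
    """Class prediction: the probability thresholded at 0.5, hence a bool."""
    return predict_proba(weights, bias, x) > 0.5


# Hypothetical weights and samples, for illustration only.
weights = {"a": 2.0, "b": -1.0}
bias = 0.0
batch = [{"a": 1.0, "b": 0.5}, {"a": 0.0, "b": 3.0}]

probabilities = [predict_proba(weights, bias, x) for x in batch]
labels = [predict(weights, bias, x) for x in batch]
print(probabilities)  # continuous values between 0 and 1
print(labels)         # booleans
```

The probabilities are the continuous quantity behind each boolean, which is roughly what a predict_proba-style method exposes.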

raul-parada commented 11 months ago

Hi @raphaelsty,

I'm looking for direct prediction values. I've already tried MLPRegressor, but I ran into some issues (due to the input). You can find a reproducible example here: https://github.com/online-ml/river/issues/1460

raphaelsty commented 11 months ago

It seems the problem comes from your input, which does not fit the River API.

When using learn_one and predict_one, your input features (x) should be formatted as {feature_0: value_0, feature_1: value_1, ..., feature_n: value_n}, where each feature name is a string and each value is a bool, an integer, or a float. Your target value (y) should be a single float.
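As a rough sketch of the per-sample format described above, the following plain-Python helper reports whatever does not fit. check_sample is a hypothetical name and not part of River. Accepting an int as a target and rejecting a bool is an assumption of this sketch.

```python
def check_sample(x, y):
    """Return a list of problems with one (x, y) pair; empty if it fits the format."""
    if not isinstance(x, dict):
        return ["x should be a dict mapping feature names to values"]
    problems = []
    for name, value in x.items():
        if not isinstance(name, str):
            problems.append(f"feature name {name!r} is not a string")
        if not isinstance(value, (bool, int, float)):
            problems.append(f"value of {name!r} is not a bool, int or float")
    # Assumption of this sketch: ints are accepted as targets, bools are not.
    if isinstance(y, bool) or not isinstance(y, (int, float)):
        problems.append("y should be a single number")
    return problems


# Hypothetical samples, for illustration only.
good = check_sample({"feature_0": 1.5, "feature_1": True}, 3.0)
bad = check_sample({0: "a"}, [1.0])
print(good)
print(bad)
```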

If you intend to use learn_many or predict_many, then your features (X) should be a pandas DataFrame (of floats, bools, or integers) and your target values (y) a pandas Series.

If you share a sample of your training data (5 data points) and test data, I'll do my best to help you fit the River API.

raul-parada commented 11 months ago

Thanks! Below are X and y samples:

import pandas as pd

data = {
    'latitude(m)': [41.391846, 41.391058, 41.391133, 41.389804, 41.389701],
    'longitude(m)': [2.162545, 2.163467, 2.163620, 2.166248, 2.165495]
}

X = pd.DataFrame(data, index=[234220, 234221, 234222, 234223, 234224])

y = pd.Series({
    234220: 14215279993461140785,
    234221: 14215279993622409866,
    234222: 14215279993718637872,
    234223: 14215280005449595738,
    234224: 14215280005025331239
})

I got this error: AttributeError: 'DataFrame' object has no attribute 'to_frame'

Appreciate your help
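For reference, one way to reshape column-oriented samples like these into the per-sample {feature: value} dictionaries mentioned earlier in the thread is sketched below in plain Python. The variable names are hypothetical and no River model is created. The loop only shows where a call such as learn_one(x, y) would go.

```python
# Column-oriented samples copied from the post above.
data = {
    "latitude(m)": [41.391846, 41.391058, 41.391133, 41.389804, 41.389701],
    "longitude(m)": [2.162545, 2.163467, 2.163620, 2.166248, 2.165495],
}
targets = [
    14215279993461140785,
    14215279993622409866,
    14215279993718637872,
    14215280005449595738,
    14215280005025331239,
]

# Transpose the columns into one {feature: value} dict per sample.
names = list(data)
rows = [dict(zip(names, values)) for values in zip(*data.values())]

for x, y in zip(rows, targets):
    # A model would receive each pair here, e.g. model.learn_one(x, y).
    print(x, y)
```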