online-ml / river

🌊 Online machine learning in Python
https://riverml.xyz
BSD 3-Clause "New" or "Revised" License

How to use river with my own dataset? #1514

Closed: kirathes closed this issue 4 months ago

kirathes commented 4 months ago

Would you please show me how to use river on my own dataset? How do I prepare my dataset so that it has the same data structure as the example datasets bundled with river?

MarcoDiFrancesco commented 4 months ago

It should be pretty straightforward with the example in the README: it's enough to provide each sample's features as a dictionary and its target as a single value (for example a bool, a number, or a string).

>>> from pprint import pprint
>>> from river import datasets

>>> dataset = MYDATASET  # placeholder: any iterable of (x, y) pairs; the output below matches datasets.Phishing()

>>> for x, y in dataset:
...     pprint(x)
...     print(y)
...     break
{'age_of_domain': 1,
 'anchor_from_other_domain': 0.0,
 'empty_server_form_handler': 0.0,
 'https': 0.0,
 'ip_in_url': 1,
 'is_popular': 0.5,
 'long_url': 1.0,
 'popup_window': 0.0,
 'request_from_other_domain': 0.0}
True
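To make the "dictionary in, single value out" shape concrete, here is a minimal sketch that turns CSV rows into the same kind of (x, y) pairs using only the standard library. The column names, the in-memory CSV text, and the `iter_rows` helper are all hypothetical stand-ins for your own data, not part of river.

```python
import csv
import io

# Hypothetical CSV content standing in for your own file.
raw = io.StringIO(
    "ip_in_url,is_popular,is_phishing\n"
    "1,0.5,true\n"
    "0,1.0,false\n"
)


def iter_rows(buffer, target):
    """Yield (features dict, target value) pairs, one per CSV row.

    Hypothetical helper: assumes every feature column is numeric and the
    target column holds the strings "true" / "false".
    """
    for row in csv.DictReader(buffer):
        y = row.pop(target) == "true"              # target as a single bool
        x = {k: float(v) for k, v in row.items()}  # features as a dict of floats
        yield x, y


dataset = list(iter_rows(raw, target="is_phishing"))

for x, y in dataset:
    print(x, y)
```

Any iterable of such pairs can take the place of MYDATASET in the loop above, and each pair can then be passed to a model's learn_one(x, y).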

gbolmier commented 4 months ago

@kirathes See also the river.stream module, which has handy utilities such as iter_pandas for iterating over a pandas DataFrame one row at a time.