The binarized version of pm10 has `binaryClass` as its target, which seems to be a discretization of `day` from version 1 of the dataset. The original target, however, is `pm10_concentration`, and trying to predict `day` seems a bit ... strange?
At least that should be mentioned.
Btw, scikit-learn currently fetches the first active version of a dataset by default, while openml.org seems to show the latest version of the dataset. I'm not sure if there's currently a way to list which versions are available for a given dataset.
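To make the mismatch concrete, here is a minimal sketch of the two selection rules. The record layout (`version`, `status`, `target` keys) and the two example records are assumptions made up for illustration, not output from OpenML; in particular, the version number given to the binarized dataset is a guess.

```python
def first_active(records):
    """Lowest-numbered version whose status is 'active' (the scikit-learn default rule)."""
    active = [r for r in records if r["status"] == "active"]
    return min(active, key=lambda r: r["version"]) if active else None


def latest(records):
    """Highest-numbered version, regardless of status (what the website seems to show)."""
    return max(records, key=lambda r: r["version"]) if records else None


# Hypothetical version records for one dataset name (illustrative only).
records = [
    {"version": 1, "status": "active", "target": "pm10_concentration"},
    {"version": 2, "status": "active", "target": "binaryClass"},
]

print(first_active(records)["target"])
print(latest(records)["target"])
```

With records like these, the two rules pick different targets for the same dataset name. Passing an explicit `version` to `fetch_openml` instead of relying on the default avoids the ambiguity.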
ping @joaquinvanschoren who seems to have uploaded the binarized dataset.
It looks to me like the data was collected over two years (only in winter, I think, as there's a gap in `day`), and in the binarized dataset the task is to figure out whether a measurement is from the first or the second year of data collection. I don't think that was the intention of the dataset creator.
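One way to check the suspicion that `binaryClass` is just a cut on `day` is to test whether a single threshold separates the two labels. The sketch below assumes the rows of the two dataset versions are in the same order so that `day` and `binaryClass` can be paired up; the helper name, the day values, and the `"N"`/`"P"` labels are placeholders for illustration, not values from the dataset.

```python
def is_threshold_split(values, labels):
    """Return True if exactly two labels occur and one threshold on `values` separates them."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    if len(groups) != 2:
        return False
    a, b = groups.values()
    # Every value of one label must lie strictly below every value of the other.
    return max(a) < min(b) or max(b) < min(a)


# Placeholder data: days with a gap in the middle, and a label per row.
days = [1, 5, 40, 400, 420, 500]
labels_by_year = ["N", "N", "N", "P", "P", "P"]
labels_mixed = ["N", "P", "N", "P", "N", "P"]

print(is_threshold_split(days, labels_by_year))
print(is_threshold_split(days, labels_mixed))
```

If the check passes on the real columns, the binarized target carries no information beyond which side of the gap in `day` a row falls on.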