openml / openml-data

For tracking issues related to OpenML datasets

Dataset 40978: should have missing values. #52

Open mb706 opened 1 year ago

mb706 commented 1 year ago

The dataset description states (emphasis mine):

There are : 3 continuous attributes. The others are binary. This is the "STANDARD encoding" mentioned in the [Kushmerick, 99] (see below). One or more of the three continuous features are missing in 28% of the instances. Missing values should be interpreted as "unknown".

However, the dataset on OpenML does not have missing values (as seen in the "Qualities").

The original dataset, as hosted by UCI, marks missing values with "?". In the OpenML dataset, the corresponding cells contain 0 instead.
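A minimal sketch of how the discrepancy could be checked, using only the Python standard library. The two text samples are made-up rows for illustration, not real records from either file. The sketch also assumes that the three continuous attributes are the first three columns and that both files list rows in the same order. The helper names `rows_with_missing` and `count_zero_filled` are hypothetical.

```python
import csv
import io

# Illustrative rows only (NOT real data): UCI-style text with "?" markers,
# and an OpenML-style export of the same rows with 0 in those cells.
uci_sample = (
    "125, 125, 1.0, 1, 0, ad.\n"
    "?, ?, ?, 0, 1, nonad.\n"
    "57, 468, 8.2105, 0, 0, ad.\n"
    "33, ?, ?, 0, 0, nonad.\n"
)
openml_sample = (
    "125,125,1.0,1,0,ad\n"
    "0,0,0,0,1,nonad\n"
    "57,468,8.2105,0,0,ad\n"
    "33,0,0,0,0,nonad\n"
)

# Assumption: the continuous attributes are the first three columns.
CONTINUOUS_COLS = (0, 1, 2)


def parse_rows(text):
    """Parse comma-separated text into a list of non-empty rows."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [row for row in reader if row]


def rows_with_missing(text, cols=CONTINUOUS_COLS, token="?"):
    """Return (rows with at least one missing continuous value, total rows)."""
    rows = parse_rows(text)
    flagged = sum(
        1 for row in rows if any(row[i].strip() == token for i in cols)
    )
    return flagged, len(rows)


def count_zero_filled(uci_text, openml_text, cols=CONTINUOUS_COLS, token="?"):
    """Count cells that are '?' in the UCI text but 0 in the OpenML text.

    Assumes both texts contain the same rows in the same order.
    """
    count = 0
    for uci_row, oml_row in zip(parse_rows(uci_text), parse_rows(openml_text)):
        for i in cols:
            if uci_row[i].strip() == token and float(oml_row[i]) == 0.0:
                count += 1
    return count


missing, total = rows_with_missing(uci_sample)
print(f"UCI sample: {missing} of {total} rows have a missing continuous value")
print("OpenML sample:", rows_with_missing(openml_sample))
print("Cells turned into 0:", count_zero_filled(uci_sample, openml_sample))
```

Run against the full files, the first count divided by the total should come out near the 28% quoted in the description if the UCI file matches it, while the same check on the OpenML export would find no "?" markers at all.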

Note the dataset is tagged as OpenML-CC18.