openml / benchmark-suites


Fix datasets #8

Closed: mfeurer closed this issue 6 years ago

mfeurer commented 6 years ago

Internet-Advertisements, ada_agnostic and sylva_agnostic are fine, but they are still marked in_preparation. @giuseppec, could you please ask Jann Goschenhofer to update them?

@joaquinvanschoren will work on fixing speed dating

@joaquinvanschoren, should mfeat-pixel (1) be deactivated?

joaquinvanschoren commented 6 years ago

Internet-Advertisements, ada_agnostic and sylva_agnostic -> can these be made 'active'? Or is there anything else that needs to happen?

mfeurer commented 6 years ago

They can be made active; otherwise they appear to be fine.

joaquinvanschoren commented 6 years ago

Internet-Advertisements, ada_agnostic and sylva_agnostic all had new versions created by Jann. They are now appropriately (de)activated. ada_agnostic and sylva_agnostic are not in the benchmark because they are derived datasets (from adult and covertype).

janvanrijn commented 6 years ago

As raised in @mfeurer's email on March 14, the speed dating dataset is arguably broken. @mfeurer, can you give some details on this? My Python parser opens it without problems, and if I remember correctly, we agreed that dealing with typos in the field attributes is part of a machine learner's job.

mfeurer commented 6 years ago

If you compare the features with the textual description given by the reference, you'll find that the OpenML version of the dataset contains fields that are not described in that document, such as has_null.
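A minimal sketch of the comparison described above, assuming the dataset's feature names and the fields listed in the reference description are already available as plain lists of strings (how they are loaded is left open). The function name `undocumented_features` is hypothetical and not part of any OpenML API; only `has_null` in the example comes from this thread, and the other names are placeholders.

```python
from typing import Iterable, List


def undocumented_features(dataset_features: Iterable[str],
                          documented_features: Iterable[str]) -> List[str]:
    """Return feature names present in the dataset but absent from its documentation.

    Hypothetical helper. Names are compared after stripping whitespace and
    lowercasing, so 'Age ' and 'age' count as the same field.
    """
    def normalize(name: str) -> str:
        return name.strip().lower()

    documented = {normalize(name) for name in documented_features}
    # Keep the dataset's column order in the report.
    return [name for name in dataset_features if normalize(name) not in documented]


# Placeholder inputs: only has_null is taken from the discussion above.
dataset_columns = ["feature_a", "feature_b", "has_null"]
codebook_fields = ["feature_a", "feature_b"]
print(undocumented_features(dataset_columns, codebook_fields))  # ['has_null']
```

The normalization only absorbs case and whitespace differences. Genuine typos in field names would still be reported, which fits treating them as something to review rather than silently match.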