openml / benchmark-suites

Fix speeddating #30

Closed janvanrijn closed 6 years ago

janvanrijn commented 6 years ago

As raised by @mfeurer and discussed in the Skype call this morning, the speed dating dataset should be fixed.

Assigned myself for obvious reasons.

janvanrijn commented 6 years ago

The more I read in the Word document describing the features, the less certain I am that this is an actual classification dataset. Some thoughts:

Abstract from "GENDER DIFFERENCES IN MATE SELECTION: EVIDENCE FROM A SPEED DATING EXPERIMENT"

We study dating behavior using data from a Speed Dating experiment where we generate random matching of subjects and create random variation in the number of potential partners. Our design allows us to directly observe individual decisions rather than just final matches. Women put greater weight on the intelligence and the race of partner, while men respond more to physical attractiveness. Moreover, men do not value women’s intelligence or ambition when it exceeds their own. Also, we find that women exhibit a preference for men who grew up in affluent neighborhoods. Finally, male selectivity is invariant to group size, while female selectivity is strongly increasing in group size.

Based on this, it seems this dataset has been used for analytical purposes rather than classification, so I propose to drop it.

mfeurer commented 6 years ago

I agree with your proposal to drop this dataset.