openml / benchmark-suites


segment dataset #19

Closed by joaquinvanschoren 6 years ago

joaquinvanschoren commented 6 years ago

A new version of this dataset was created by Jann: https://www.openml.org/d/40984

The difference is that the features giving the position of the 3x3 pixel region in the image have been removed. Are we sure this is correct? If I want to classify 'sky', isn't it useful to know where in the image the region is?

I'm leaving both versions as active for now.

mfeurer commented 6 years ago

You're right; I think I was too hasty in asking for these two features to be removed. However, if they stay in the data, we need a different strategy for shuffling this dataset, because a plain random split would effectively hand you a random subset of labeled pixels from an image and ask you to segment the rest.

Moreover, what's the actual source of the data? I couldn't find any paper introducing it; only the Statlog book gives a brief description of it.
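One possible alternative to a plain random shuffle, in response to the splitting concern above, is to hold out whole images so that no image contributes pixels to both the training and the test set. The sketch below is only an illustration under a loud assumption: each pixel sample carries an `image_id` telling which source image it was cut from. That field, the helper `split_by_image`, and the toy data are hypothetical and are not part of the OpenML dataset as discussed here.

```python
import random
from collections import defaultdict


def split_by_image(samples, test_fraction=0.3, seed=0):
    """Split samples so that each image lands entirely in train or in test.

    Assumes every sample is a dict with a hypothetical 'image_id' key.
    Returns (train_indices, test_indices), both sorted.
    """
    rng = random.Random(seed)
    # Group sample indices by the (hypothetical) image they came from.
    by_image = defaultdict(list)
    for idx, sample in enumerate(samples):
        by_image[sample["image_id"]].append(idx)

    # Shuffle image ids, not individual pixels.
    images = sorted(by_image)
    rng.shuffle(images)

    # Hold out at least one image, and keep at least one for training
    # whenever there are two or more images.
    n_test = max(1, min(len(images) - 1, round(len(images) * test_fraction)))
    test_images = images[:n_test]
    train_images = images[n_test:]

    train_idx = sorted(i for img in train_images for i in by_image[img])
    test_idx = sorted(i for img in test_images for i in by_image[img])
    return train_idx, test_idx


# Toy data: 4 hypothetical images, 6 pixel samples each, with position features.
samples = [
    {"image_id": img, "row": r, "col": c, "label": "sky" if r < 2 else "grass"}
    for img in range(4)
    for r in range(3)
    for c in range(2)
]

train_idx, test_idx = split_by_image(samples, test_fraction=0.25)
train_imgs = {samples[i]["image_id"] for i in train_idx}
test_imgs = {samples[i]["image_id"] for i in test_idx}
print(sorted(train_imgs), sorted(test_imgs))
```

With the position features kept, a split like this means the model can no longer rely on seeing neighbouring pixels of the same image during training. Whether the original images can actually be recovered from the data is exactly the open question about its source.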