joaquinvanschoren closed this issue 6 years ago.
That's correct; I think I was too hasty in calling for these two features to be removed. However, if they stay in the data, we need a different strategy for shuffling (splitting) this dataset. With a plain random shuffle you would not be evaluating on unseen images: you would get a random subset of labelled pixels from each image, with the task of segmenting the rest of that same image.
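One possible strategy is a group-aware split, where every 3x3 sample taken from the same image goes to the same side of the split. The sketch below assumes a hypothetical per-row image identifier (`image_ids`); as far as I know the dataset as published does not contain such a column, so this only illustrates the idea and is not an existing OpenML feature.

```python
import random
from collections import defaultdict


def group_train_test_split(image_ids, test_fraction=0.3, seed=0):
    """Split row indices so all rows sharing an image ID land on the same side.

    `image_ids` is a hypothetical list with one image identifier per row.
    """
    # Collect the row indices belonging to each image.
    rows_by_image = defaultdict(list)
    for row, image_id in enumerate(image_ids):
        rows_by_image[image_id].append(row)

    # Shuffle whole images (not individual rows) with a fixed seed.
    images = sorted(rows_by_image)
    rng = random.Random(seed)
    rng.shuffle(images)

    # Hold out at least one complete image for testing.
    n_test = max(1, round(len(images) * test_fraction))
    test_images = set(images[:n_test])

    train = [r for img in images if img not in test_images for r in rows_by_image[img]]
    test = [r for img in images if img in test_images for r in rows_by_image[img]]
    return sorted(train), sorted(test)


# Usage: four hypothetical images with three samples each.
ids = ["img1"] * 3 + ["img2"] * 3 + ["img3"] * 3 + ["img4"] * 3
train_rows, test_rows = group_train_test_split(ids, test_fraction=0.25, seed=42)
print(len(test_rows))  # 3: exactly one whole image is held out
```

With such a split, the position features could stay in the data, because the model is always evaluated on images it has not seen any pixels of.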
Also, what is the actual source of the data? I couldn't find any paper introducing it; only the Statlog book gives a brief description of it.
This new version was created by Jann: https://www.openml.org/d/40984
The difference is that the position of each 3x3 pixel sample within the image has been removed. Are we sure this is correct? If I want to classify 'sky', isn't it useful to know where in the image the sample is?
I'm leaving both versions as active for now.