uci-ml-repo / ucimlrepo

Python package for dataset imports from UCI ML Repository
MIT License

Wine Quality dataset(s) import problems #5

Closed zarkoivkovicc closed 10 months ago

zarkoivkovicc commented 12 months ago

This might be a stupid question, but the Wine Quality dataset (id=186) is supposed to have two datasets. You can manually download both of them, but when you fetch them using this Python package, only the white wine dataset is fetched. Am I missing something obvious, or is this a bug/feature?

ptruong0 commented 11 months ago

Ah I see, how would you prefer this issue to be resolved? We could either combine the data for both red and white wine into one dataset when importing the dataset in Python, or split it into two datasets based on wine type (two different dataset pages on the website)?

zarkoivkovicc commented 11 months ago

In my opinion, the fetch command could return a dictionary, where the keys are 'red' and 'white' and the values are the corresponding datasets.

ptruong0 commented 11 months ago

The Python package should always return an object with the same structure no matter the dataset. It would be confusing for just one dataset to stray from the standard format. In other words, it would not be good practice to implement edge cases for certain datasets.

ptruong0 commented 11 months ago

I think a good solution would be to combine them into one data table and add an additional column for "red" or "white". Then you can choose to use them together or easily separate them using pandas. e.g. df.loc[df['color'] == 'red']
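The combined-table idea can be sketched roughly as follows. This is only an illustration with made-up rows, not the package's implementation: the `red` and `white` frames are tiny stand-ins for the real tables, and the column name `color` is taken from the snippet above and may differ in whatever the package ends up shipping.

```python
import pandas as pd

# Made-up stand-ins for the red and white tables (not real Wine Quality rows).
red = pd.DataFrame({"alcohol": [9.4, 9.8], "quality": [5, 5]})
white = pd.DataFrame({"alcohol": [8.8, 9.5, 10.1], "quality": [6, 6, 6]})

# Tag each table with its wine type, then stack them into a single table.
# 'color' is an assumed column name for illustration.
combined = pd.concat(
    [red.assign(color="red"), white.assign(color="white")],
    ignore_index=True,
)

# Separate the two wine types again by filtering on the new column.
red_only = combined.loc[combined["color"] == "red"]
white_only = combined.loc[combined["color"] == "white"]

# If a {'red': ..., 'white': ...} mapping is preferred, it can be built
# on the user's side from the same combined table.
by_color = {color: group for color, group in combined.groupby("color")}
```

With this shape the returned object keeps the same structure as every other dataset, while anyone who wants the dictionary layout suggested earlier can still derive it in one line.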

zarkoivkovicc commented 11 months ago

It doesn't matter to me. Either that works, or having two separate datasets that can be fetched separately.