openml / OpenML

Open Machine Learning
https://openml.org
BSD 3-Clause "New" or "Revised" License

Inconsistency in dataset columns #768

Open janvanrijn opened 6 years ago

janvanrijn commented 6 years ago

Several columns, such as creator and contributor, can contain multiple values. These are stored in JSON/CSV format. However, the column default_target_type is stored in plain CSV. Should we convert this to JSON (so tools can uniformly read them with native JSON libraries)?

Relevant for the fetch_openml function in sklearn.
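
To illustrate the inconsistency, here is a minimal sketch of what a client currently has to do when a multi-value field may arrive either as a JSON array or as plain CSV. The helper name `parse_multi_value` and the sample values are hypothetical and only for illustration; they are not taken from the OpenML API or from any connector.

```python
import csv
import io
import json


def parse_multi_value(raw):
    """Return a list of strings from a field stored as a JSON array or as plain CSV."""
    if raw is None or raw.strip() == "":
        return []
    text = raw.strip()
    if text.startswith("["):
        # Value looks like a JSON array, so a native JSON library can read it.
        return [str(value) for value in json.loads(text)]
    # Otherwise fall back to treating the value as a single CSV row.
    return [value.strip() for value in next(csv.reader(io.StringIO(text)))]


# Hypothetical sample values, not real OpenML data.
creator = '["Alice Example", "Bob Example"]'  # JSON-encoded multi-value field
targets = "class_a,class_b"                   # plain-CSV multi-value field

print(parse_multi_value(creator))
print(parse_multi_value(targets))
```

If every multi-value column were stored as JSON, the CSV fallback branch would no longer be needed and a single `json.loads` call would cover all of them.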

joaquinvanschoren commented 6 years ago

I'd be fine with that.

On Sat, 21 Jul 2018 at 22:51 janvanrijn notifications@github.com wrote:

Several columns, such as creator and contributor, can contain multiple values. These are stored in JSON/CSV format. However, the column default_target_type is stored in plain CSV. Should we convert this to JSON (so tools can uniformly read them with native JSON libraries)?

Relevant for the fetch_openml function in sklearn.


-- Thank you, Joaquin

amueller commented 5 years ago

Has this been fixed?

janvanrijn commented 5 years ago

I think I recently opened a duplicate issue for this, i.e., #799

No. This is an update that requires a change in the API definition, so I also need the agreement of @giuseppec (and also @mfeurer, but I can check the state of the Python code myself, of course).

I think the Python connector does not offer the option to create multitask datasets yet anyway, but again I will check this. The Java connector has very limited multitask dataset support, so it is fine there.

@giuseppec, what do you think?

amueller commented 5 years ago

@mfeurer reacted with a thumbs up ;)

giuseppec commented 5 years ago

I'm not sure I fully understand this (and its consequences for the R client), but go ahead with an update to the test server. I'll understand the consequences once I see what the R package does.