openml / benchmark-suites


Make it easier for users to create benchmark suites #16

Closed giuseppec closed 6 years ago

giuseppec commented 6 years ago

1) We need a better description of how to create a suite. Currently we have https://www.openml.org/guide/benchmark . Maybe we can add something like the following (still improvable):

To create a benchmark suite, we need to use tasks (not datasets):

a) If there is no task for the corresponding dataset yet, you first have to create one (see https://www.openml.org/new/task , which is currently only possible through the web interface).

b) Create a study at https://www.openml.org/new/study (I think this is currently also only possible through the web interface) and note the study ID; you will need it for step c). If you set an alias string when creating the study, the alias can later be used to retrieve the benchmark suite (alternatively the study ID can be used, see step d).

c) Add a tag "study_X", where X is your study ID, to the tasks (and datasets). This should be possible through the clients (e.g. R) or through the web interface.

d) Now you have your benchmark suite. In R, you can retrieve the information with getOMLStudy(IDofStudy) or getOMLStudy("your-alias-string"). The study information can also be found online at https://www.openml.org/s/IDofStudy
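The naming conventions above (the "study_X" tag from step c and the study URL from step d) can be sketched as two tiny helpers. These helper names are hypothetical, not part of any OpenML client:

```python
# Sketch only: encodes the "study_X" tag and the /s/<id-or-alias> URL
# described in steps c) and d). Helper names are made up for illustration.

def study_tag(study_id: int) -> str:
    """Tag to attach to each task (and dataset) in the suite (step c)."""
    return f"study_{study_id}"

def study_url(study_id_or_alias) -> str:
    """Where the suite can be viewed online (step d); accepts ID or alias."""
    return f"https://www.openml.org/s/{study_id_or_alias}"

print(study_tag(16))            # -> study_16
print(study_url(16))            # -> https://www.openml.org/s/16
print(study_url("my-alias"))    # -> https://www.openml.org/s/my-alias
```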

2) We may have to simplify some steps for users; many things are still only possible through the web interface. For example:

a) We need a better way to create tasks out of datasets. Imagine you want to add your own benchmark suite but have to create 100 tasks manually through the web interface. See https://github.com/openml/OpenML/issues/325

b) If a task is tagged, the underlying dataset should be tagged as well. Likewise, if a run is tagged, then the underlying task, dataset, and flow should get the same tag. See https://github.com/openml/OpenML/issues/530 . If the server does not do this automatically, at least the client should.

c) Maybe we should also accept the alias string as the tag in step c) above.
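The tag propagation described in (b) could be done client-side by issuing one tagging request per entity. A minimal sketch, assuming the OpenML v1 REST tagging endpoints and their parameter names (these are assumptions; verify against the API docs before use):

```python
# Client-side sketch of tag propagation: when a run is tagged, also tag its
# underlying task, dataset, and flow with the same tag.
# The endpoint paths and form parameters below are assumptions about the
# OpenML v1 REST API, not a verified specification.

API = "https://www.openml.org/api/v1"

def propagation_requests(run_id, task_id, data_id, flow_id, tag):
    """Return the (url, form_data) POSTs a client would issue so that the
    run and everything underneath it carry the same tag."""
    return [
        (f"{API}/run/tag",  {"run_id": run_id,   "tag": tag}),
        (f"{API}/task/tag", {"task_id": task_id, "tag": tag}),
        (f"{API}/data/tag", {"data_id": data_id, "tag": tag}),
        (f"{API}/flow/tag", {"flow_id": flow_id, "tag": tag}),
    ]

for url, payload in propagation_requests(1, 2, 3, 4, "study_16"):
    print(url, payload)
```

A real client would send each pair with an authenticated POST (plus the user's api_key) and skip entities that already carry the tag.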
giuseppec commented 6 years ago

Maybe we can also extend https://www.openml.org/new/study so that users can directly add the IDs of the tasks and datasets when they create the study. We could also think about an API that allows creating a study and providing the IDs of tasks, datasets, etc. directly through a client. This would then automatically tag all tasks, datasets, etc. with the alias and with the study ID tag.
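Behind such a "create study with IDs" call, the client could simply fan out tagging requests for every given entity and both tag forms. A sketch under the same assumed v1 endpoints as above (hypothetical, not a documented API):

```python
# Sketch: given a new study's ID and alias plus the task/dataset IDs the user
# supplied, build the tagging requests that would attach both the "study_X"
# tag and the alias to every entity. Endpoints/parameters are assumptions.

API = "https://www.openml.org/api/v1"

def suite_tagging_requests(study_id, alias, task_ids, data_ids):
    tags = [f"study_{study_id}"] + ([alias] if alias else [])
    reqs = []
    for tag in tags:
        reqs += [(f"{API}/task/tag", {"task_id": t, "tag": tag}) for t in task_ids]
        reqs += [(f"{API}/data/tag", {"data_id": d, "tag": tag}) for d in data_ids]
    return reqs

# Two tags x (2 tasks + 1 dataset) = 6 requests
print(len(suite_tagging_requests(16, "my-suite", [1, 2], [10])))  # -> 6
```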

giuseppec commented 6 years ago

And: maybe it should also be possible to create a benchmark suite from already existing tags. For example, if I wanted to create a benchmark suite that contains only UCI tasks/data, I currently cannot do this by simply defining a new study that automatically adds all tasks carrying the UCI tag.
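Building a suite from an existing tag boils down to two steps: list all tasks carrying the tag, then re-tag them with the new study's tag. A sketch assuming the v1 list-by-tag URL scheme (an assumption, not verified here):

```python
# Sketch: turn an existing tag (e.g. "uci") into a new benchmark suite by
# listing tagged tasks and re-tagging them with "study_X".
# The listing URL and tagging endpoint below are assumptions about the
# OpenML v1 REST API.

API = "https://www.openml.org/api/v1"

def list_by_tag_url(tag):
    """URL a client would GET to enumerate all tasks carrying `tag`."""
    return f"{API}/json/task/list/tag/{tag}"

def retag_requests(task_ids, study_id):
    """Tagging POSTs that add the listed tasks to study `study_id`."""
    return [(f"{API}/task/tag", {"task_id": t, "tag": f"study_{study_id}"})
            for t in task_ids]

print(list_by_tag_url("uci"))
```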

giuseppec commented 6 years ago

If we do not explain it in the paper itself, we need to make it easy for the reader to find a tutorial on how to create their own benchmark suites. We also discussed this issue in https://github.com/openml/OpenMLFirstBenchmarkSuite/issues/5#issuecomment-367999155 but I think we still have no solution.

mfeurer commented 6 years ago

Moved to https://github.com/openml/OpenML/issues/663