Closed — BayanIbra closed this issue 5 years ago.
Hi! No, this is a different thing. You can get the UCI datasets with
a = listOMLDataSets(tag = "uci")
You can attach tags when you run experiments. For example, if you gave a tag when uploading an experiment, you can later retrieve exactly those results by filtering on that tag.
Here you can see an example: https://github.com/ja-thomas/OMLbots/blob/master/HowToWriteABot.Rmd
For example, you can get some runs that I uploaded:
my_runs = listOMLRunEvaluations(tag = "mysimpleBot")
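For illustration, tagging at upload time might look like the sketch below. The tags argument of uploadOMLRun is my assumption here; please check the OpenML package docs for the exact signature.
# sketch: upload a run and attach a tag so you can find it again later
# (argument names are an assumption, verify against ?uploadOMLRun)
run.id = uploadOMLRun(my.run, tags = "mysimpleBot")
# later, retrieve exactly those runs by the same tag
my_runs = listOMLRunEvaluations(tag = "mysimpleBot")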
Ok, it is very inconvenient, but you could do this:
# get all uci data sets
ds = listOMLDataSets(tag = "uci")
# get all classification tasks where 10-fold CV is used to estimate the performance
tasks = listOMLTasks(task.type = "Supervised Classification", estimation.procedure = "10-fold Crossvalidation")
# subset those tasks so that you only have tasks based on uci data sets
tasks = tasks[tasks$data.id %in% ds$data.id, ]
# note that there can still be multiple tasks for each data set (you probably want only one)
table(tasks$name)
# get results using the task id (increase the total.limit to get more results)
res = chunkOMLlist("listOMLRunEvaluations",
  task.id = tasks$task.id,
  evaluation.measure = "predictive_accuracy",
  total.limit = 100000)
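If you really want just one task per data set, a minimal sketch is to drop duplicated data ids (which task you keep per data set is up to you; this simply keeps the first one listed):
# keep only the first task for each data set
tasks = tasks[!duplicated(tasks$data.id), ]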
I've already proposed a server change for this here: https://github.com/openml/OpenML/issues/530
Ah, you can do it better: use the data.tag argument of listOMLTasks.
tasks = listOMLTasks(data.tag = "uci", task.type = "Supervised Classification",
  estimation.procedure = "10-fold Crossvalidation")
# note that there can still be multiple tasks for each data set (you probably want only one task per data)
table(tasks$name)
# get results using the task id (increase the total.limit to get more results)
res = chunkOMLlist("listOMLRunEvaluations",
  task.id = tasks$task.id,
  evaluation.measure = "predictive_accuracy",
  total.limit = 100000)
That's great, thank you both @PhilippPro and @giuseppec for your responses. But what if I want the other evaluation measures, not just predictive accuracy (such as area_under_roc_curve)?
Can I get this using listOMLTasks?
Thanks again, Bayan
Ok, I think this requires some clarification (it really is a bit confusing). Here is my attempt to explain it:
I wouldn't do this via listOMLTasks, as you would then obtain fewer results if you pick the "wrong" task (see, e.g., table(tasks$evaluation.measures)).
The evaluation.measure stored in tasks can be seen as the "default" or "suggested" measure on which the performance of competing algorithms should be evaluated (it is simply what the person who created the task selected).
For example, look at the number of runs for the two tasks https://www.openml.org/t/2 and https://www.openml.org/t/145952. Both tasks are based on the anneal data; however, the first one uses predictive_accuracy and the second one uses precision.
In general, you could simply merge the runs of those two tasks, since OpenML computes all evaluation measures for all runs anyway (note, though, that the two tasks may use different train/test splits).
Anyway, here is the answer to your question. I would suggest doing the following for each measure you are interested in, separately, and merging the resulting data frames afterwards:
# get results using the task id (increase the total.limit to get more results)
res.acc = chunkOMLlist("listOMLRunEvaluations",
  task.id = tasks$task.id,
  evaluation.measure = "predictive_accuracy",
  total.limit = 1000)
res.auc = chunkOMLlist("listOMLRunEvaluations",
  task.id = tasks$task.id,
  evaluation.measure = "area_under_roc_curve",
  total.limit = 1000)
And then just join res.acc and res.auc.
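A minimal sketch of that join, assuming both data frames have a run.id column and measure columns named predictive.accuracy and area.under.roc.curve (these column names are my assumption; check names(res.acc) on your side):
# join the two evaluation tables on the run id
res = merge(res.acc, res.auc[, c("run.id", "area.under.roc.curve")], by = "run.id")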
I guess this issue can be closed here. If not then just reopen.
Do the dataset tags shown on each dataset page correspond to the tags used by the listOMLRunEvaluations function? I am trying to get results for tag = "uci" using listOMLRunEvaluations, but nothing is returned. Are tags added on the dataset pages of the website reflected in this function?
Please advise,