openml / openml-r

R package to interface with OpenML
http://openml.github.io/openml-r/

listOMLTasks of OpenML100 shows too many tasks #407

Closed PhilippPro closed 6 years ago

PhilippPro commented 6 years ago

Nearly every task shows up twice when I list the tasks. Why does nearly every task appear twice?

>  tasks = listOMLTasks(number.of.classes = 2L, number.of.missing.values = 0, 
    data.tag = "OpenML100", estimation.procedure = "10-fold Crossvalidation")

> tasks[tasks$name %in% "nomao",]
    task.id                 task.type data.id  name status format    estimation.procedure target.feature evaluation.measures cost.matrix quality.measure
111    9977 Supervised Classification    1486 nomao active   ARFF 10-fold Crossvalidation          Class                <NA>        <NA>            <NA>
555  145854 Supervised Classification    1486 nomao active   ARFF 10-fold Crossvalidation          Class                <NA>        <NA>            <NA>
    majority.class.size max.nominal.att.distinct.values minority.class.size number.of.classes number.of.features number.of.instances
111               24621                               3                9844                 2                119               34465
555               24621                               3                9844                 2                119               34465
    number.of.instances.with.missing.values number.of.missing.values number.of.numeric.features number.of.symbolic.features
111                                       0                        0                         89                          30
555                                       0                        0                         89                          30

> table(tasks$name)

                    ada_agnostic           Amazon_employee_access                       Australian                   bank-marketing 
                               1                                2                                2                                2 
         banknote-authentication                      Bioresponse blood-transfusion-service-center           Click_prediction_small 
                               2                                3                                2                                2 
climate-model-simulation-crashes                         credit-g                         diabetes                    eeg-eye-state 
                               2                                5                                2                                2 
                     electricity                    gina_agnostic                      hill-valley                             ilpd 
                               2                                1                                2                                2 
         Internet-Advertisements                              kc1                              kc2                         kr-vs-kp 
                               3                                1                                1                                2 
                         madelon                   MagicTelescope                 monks-problems-1                 monks-problems-2 
                               2                                1                                2                                2 
                monks-problems-3                         mozilla4                            nomao                  ozone-level-8hr 
                               2                                1                                2                                2 
                             pc1                              pc3                              pc4                 PhishingWebsites 
                               1                                1                                1                                2 
                         phoneme                      qsar-biodeg                            scene                         spambase 
                               2                                2                                1                                2 
              steel-plates-fault                   sylva_agnostic                      tic-tac-toe                             wdbc 
                               2                                1                                2                                2 
                            wilt 
                               2 

joaquinvanschoren commented 6 years ago

You should probably replace 'data.tag' with 'tag'. That gives you all tasks with that tag, instead of all tasks on any dataset with that tag.
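
A minimal sketch of that change, for illustration only. The call keeps the filters from the original post and swaps `data.tag` for `tag`. The names `list_openml100_tasks` and `duplicated_datasets` are hypothetical helpers, not part of the OpenML package; the small data frame only reuses the two `nomao` rows shown above so the check can be tried without network access.

```r
# Hypothetical wrapper: same filters as the original call, but filtering on the
# task tag (`tag`) instead of the dataset tag (`data.tag`).
list_openml100_tasks = function() {
  OpenML::listOMLTasks(number.of.classes = 2L, number.of.missing.values = 0,
    tag = "OpenML100", estimation.procedure = "10-fold Crossvalidation")
}

# Hypothetical helper: return the rows whose data.id occurs more than once,
# i.e. datasets that are matched by several tasks.
duplicated_datasets = function(tasks) {
  dup.ids = unique(tasks$data.id[duplicated(tasks$data.id)])
  tasks[tasks$data.id %in% dup.ids, c("task.id", "data.id", "name")]
}

# Offline illustration with the two nomao rows from the output above.
example = data.frame(task.id = c(9977L, 145854L), data.id = c(1486L, 1486L),
  name = c("nomao", "nomao"), stringsAsFactors = FALSE)
duplicated_datasets(example)

# With network access and the OpenML package installed:
# tasks = list_openml100_tasks()
# duplicated_datasets(tasks)  # lists datasets that still match more than one task
```

Checking on `data.id` rather than `name` avoids treating two different datasets that happen to share a name as duplicates.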

PhilippPro commented 6 years ago

You are right. Thanks!