openml / benchmark-suites


delete associated "goal" perf measures for all tasks in suite #12

Closed. berndbischl closed this issue 6 years ago.

berndbischl commented 6 years ago

We agreed that, as a default, a task should have no goal performance measures.

This should then be true for all tasks in our suite, but it isn't.
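
A minimal sketch of how such a check could look with the openml Python client; the 'OpenML100' tag (used later in this thread) and the `evaluation_measure` attribute name are assumptions here:

```python
# Sketch: report suite tasks that still define a "goal" evaluation measure.
import openml

# list_tasks returns a mapping from task id to task metadata for the tag
tasks = openml.tasks.list_tasks(tag="OpenML100")
for tid in tasks:
    task = openml.tasks.get_task(tid)  # fetch the full task description
    measure = getattr(task, "evaluation_measure", None)
    if measure:
        print(f"task {tid} still has evaluation measure: {measure}")
```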

janvanrijn commented 6 years ago

I don't want to make this an OpenML discussion, but wouldn't it be a good idea to do this OpenML-wide?

I mean, up to this point the 'goal performance measure' hasn't added anything to the current functionality: it isn't really used, it spreads results over various tasks, and it adds complexity to the task verification process.

joaquinvanschoren commented 6 years ago

I removed the evaluation measure from all tasks tagged 'OpenML100'.

List of task_id's: 3,6,11,12,14,15,16,18,20,21,22,23,24,28,29,31,32,36,37,41,43,45,49,53,58,219,2074,2079,3021,3022,3481,3485,3492,3493,3494,3510,3512,3543,3549,3560,3561,3567,3573,3889,3891,3896,3899,3902,3903,3904,3913,3917,3918,3946,3948,3954,7592,9914,9946,9950,9952,9954,9955,9956,9957,9960,9964,9967,9968,9970,9971,9976,9977,9978,9979,9980,9981,9983,9985,9986,10093,10101,14964,14965,14966,14967,14968,14969,14970,34536,34537,34538,34539,125920,125921,125922,146195,146606,146607

You may want to delete your task cache.
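
One way to do that with the openml Python client; the exact cache layout varies across client versions, so the `tasks` subfolder used here is an assumption (removing the whole cache directory also works):

```python
# Sketch: drop the locally cached task descriptions so the updated
# (measure-free) definitions are re-downloaded on next use.
import os
import shutil
import openml

cache_dir = openml.config.get_cache_directory()
task_cache = os.path.join(cache_dir, "tasks")  # layout may differ per version
if os.path.isdir(task_cache):
    shutil.rmtree(task_cache)
```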

joaquinvanschoren commented 6 years ago

Creating a task without an evaluation measure is already the default.

The problem is that we have (historically) created lots of tasks with an 'arbitrary' measure (typically predictive accuracy) that we now want to remove because it is not really necessary. I.e., at this moment nobody is saying that we MUST have a task with predictive accuracy as the measure.
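
For illustration, a sketch with the current openml-python client (which postdates this thread): omitting the evaluation measure when creating a task simply leaves the task without one. Dataset 31 (credit-g) and estimation procedure 1 (10-fold cross-validation) are illustrative values:

```python
# Sketch: create and publish a task with no "goal" evaluation measure.
import openml
from openml.tasks import TaskType

task = openml.tasks.create_task(
    task_type=TaskType.SUPERVISED_CLASSIFICATION,
    dataset_id=31,                # illustrative dataset (credit-g)
    target_name="class",
    estimation_procedure_id=1,    # illustrative: 10-fold cross-validation
    evaluation_measure=None,      # the default: no measure attached
)
task.publish()  # the server assigns the new task id
```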

joaquinvanschoren commented 6 years ago

I would keep the functionality but make it clear that it should only be used when necessary. Use cases are domain-specific data that need specific evaluation measures, challenges,...

janvanrijn commented 6 years ago

Let's make a separate issue on this in the open tracker to discuss the future of this feature.


janvanrijn commented 6 years ago

This is done, right? The current set of tasks does not have this.

janvanrijn commented 6 years ago

According to @giuseppec, this is not entirely the case. I will reopen.

janvanrijn commented 6 years ago

FFR, this query:

```sql
SELECT t.task_id, d.name, m.value
FROM dataset d, task_inputs t
LEFT JOIN task_inputs m
  ON t.task_id = m.task_id AND m.input = "evaluation_measures"
WHERE t.value = d.did
  AND t.input = "source_data"
  AND t.task_id IN (SELECT id FROM task_tag WHERE tag = "study_99")
LIMIT 100
```

reveals the following tasks to still have an evaluation measure: 125966, 146197, 146227

janvanrijn commented 6 years ago

For speeddating (125966) we need task 146607. For dna (146197) and churn (146227) I created new tasks (167140, 167141).

Added and removed