openml / OpenML

Open Machine Learning
https://openml.org
BSD 3-Clause "New" or "Revised" License

Creating a new challenge on the test server results in an SQL error #752

Status: open. mfeurer opened this issue 6 years ago

mfeurer commented 6 years ago

A Database Error Occurred

Error Number: 1064

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'BY `source_data`' at line 1

SELECT `d1`.`did` AS source_data, `d2`.`did` AS `source_data_labeled`, `f1`.`name` AS `target_feature` FROM `dataset` `d1`, `dataset` `d2`, `data_feature` `f1`, `data_feature` `f2` WHERE `d1`.`did` = `f1`.`did` AND `d2`.`did` = `f2`.`did` AND `f1`.`name` = "class" AND `f2`.`name` = "class" AND `f2`.`NumberOfMissingValues` = 0 AND `f1`.`data_type` IN ("nominal") AND `f2`.`data_type` IN ("nominal") AND ((`d1`.`name` = "diabetes" AND `d1`.`version` = "1") ) AND 1ORDER BY `source_data`;

Filename: core/MY_Database_Read_Model.php

Line Number: 31
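
Reading only the query text in the error message, the statement ends with `AND 1ORDER BY`, with no space between `1` and `ORDER`. MySQL permits unquoted identifiers that begin with a digit, so `1ORDER` plausibly lexes as a single token and the parser then stops at `BY`, which matches the "near 'BY `source_data`'" part of the message. The PHP that builds this query is not shown here, so the following is only a minimal Python sketch of the general fix: assemble whole clauses as separate fragments, join them with explicit spaces, and pass values as `?` placeholders instead of inline string literals. `build_select` is a hypothetical helper, and an empty in-memory SQLite database stands in for MySQL purely as a syntax check on a trimmed-down version of the query.

```python
import sqlite3

def build_select(columns, tables, conditions, order_by):
    """Hypothetical helper: assemble a SELECT from separate clause fragments.

    `conditions` holds predicate strings that use `?` placeholders; an empty
    list falls back to the neutral predicate `1 = 1`.
    """
    where = " AND ".join(f"({c})" for c in conditions) if conditions else "1 = 1"
    clauses = [
        "SELECT " + ", ".join(columns),
        "FROM " + ", ".join(tables),
        "WHERE " + where,
        "ORDER BY " + order_by,
    ]
    # Joining whole clauses with a space means no keyword can be glued onto
    # the previous token, as in "AND 1ORDER BY".
    return " ".join(clauses)

# Trimmed-down version of the failing query (labeled-dataset join omitted).
query = build_select(
    columns=["`d1`.`did` AS `source_data`", "`f1`.`name` AS `target_feature`"],
    tables=["`dataset` `d1`", "`data_feature` `f1`"],
    conditions=[
        "`d1`.`did` = `f1`.`did`",
        "`f1`.`name` = ?",
        "`d1`.`name` = ? AND `d1`.`version` = ?",
    ],
    order_by="`source_data`",
)

# Empty in-memory SQLite tables (a stand-in for MySQL) serve only as a syntax check.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dataset (did INTEGER, name TEXT, version TEXT)")
con.execute("CREATE TABLE data_feature (did INTEGER, name TEXT, data_type TEXT)")
rows = con.execute(query, ("class", "diabetes", "1")).fetchall()
print(query)
print(rows)  # [] because the tables are empty
```

Keeping values out of the SQL text also removes the need to quote `"class"` or `"diabetes"` by hand.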

janvanrijn commented 6 years ago

I assume you used the API to create the task? Can you also post the XML that you uploaded?

mfeurer commented 6 years ago

Nope, used the website. Dataset was diabetes and metric predictive_accuracy.

janvanrijn commented 6 years ago

Can you send me the dataset IDs / target feature / etc.? Then I will check whether the API also has problems with this one.

mfeurer commented 6 years ago

20 / class / predictive_accuracy

The field Dataset (labelled), which might be required, was left empty.

janvanrijn commented 6 years ago

I can confirm the following:

a) Dataset labeled is indeed a required field.
b) The API correctly handles the error if this field is missing.
c) Creating this task type was not unit tested. Now it is, and it works.
d) Unfortunately, this task type is not well documented, but I'm happy to help.
e) (I don't think the Python interface can handle this task type right now.)

janvanrijn commented 6 years ago

@joaquinvanschoren can you have a look at what's going wrong in the web interface? I strongly suggest we remove the task creation form, as I don't trust the underlying code.

mfeurer commented 6 years ago

Thanks @janvanrijn

> dataset labeled is indeed a required field

This should be documented and a required field. Currently it can be left blank. Also, I don't know what to put in there.

> (I don't think the Python interface can handle this task type right now)

I don't think so either. I was also wondering what the Python API should do for the challenge task: is it regression, classification, or something completely different? As long as this is not specified, I wouldn't know how to implement such a task in the API (and having a target metric to optimize doesn't help here, since the metrics don't declare a task type).

joaquinvanschoren commented 6 years ago

Hmm, the current form calls 'Task->create_batch' instead of the API. I agree that it should go through the API.

I'm still hesitant to remove the form altogether, even though it may have some bugs. I'd rather focus on rewriting it as soon as possible and redirecting all requests through the API. I should have time to work on this starting next week.

Thank you, Joaquin

janvanrijn commented 6 years ago

Usually I wouldn't care so much about little things breaking on the frontend, but this function has access to the database, which puts the integrity of the data at risk.

> I'd rather focus on rewriting it as soon as possible and redirecting all requests through the API. I should have time to work on this starting next week.

We have discussed replacing this form before, but due to our schedules and time constraints it unfortunately hasn't happened yet. Can we at least set a deadline for removing the old functionality? E.g., if it's still not fixed by the end of next week, let's remove the form to guarantee data integrity.

janvanrijn commented 6 years ago

> This should be documented and a required field.

Oddly, in the database the field is flagged as required.
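
Since the required flag already exists in the database, one way to keep the form and the API from drifting apart is a single write path that reads those flags and rejects blank required inputs before any SQL is built, with both the form and the API handler delegating to it. The sketch below is a minimal Python illustration under that assumption: `TASK_TYPE_INPUTS`, `create_task`, `handle_form` and `MissingInputError` are hypothetical names, not OpenML's PHP code, and the input name `evaluation_measure` is a guess. Only `source_data`, `source_data_labeled` and `target_feature` come from the column aliases in the failing query.

```python
class MissingInputError(ValueError):
    """Raised when a required task input is absent or blank (hypothetical)."""

# Hypothetical stand-in for the per-task-type input metadata stored in the
# database, where each input carries a "required" flag.
TASK_TYPE_INPUTS = {
    "source_data": {"required": True},
    "source_data_labeled": {"required": True},
    "target_feature": {"required": True},
    "evaluation_measure": {"required": False},  # name is an assumption
}

def create_task(inputs: dict[str, str]) -> dict[str, str]:
    """Single write path shared by the form and the API (hypothetical)."""
    missing = [
        name
        for name, meta in TASK_TYPE_INPUTS.items()
        if meta["required"] and not str(inputs.get(name, "")).strip()
    ]
    if missing:
        raise MissingInputError("missing required input(s): " + ", ".join(missing))
    # A real implementation would insert the task here; the sketch only
    # returns the known inputs it would store.
    return {name: inputs[name] for name in TASK_TYPE_INPUTS if name in inputs}

def handle_form(form: dict[str, str]) -> str:
    """Web form entry point: delegates to create_task instead of writing SQL."""
    try:
        create_task(form)
    except MissingInputError as err:
        return f"Error: {err}"
    return "Task created"

# The submission from this issue, with "Dataset (labelled)" left empty.
print(handle_form({"source_data": "20", "target_feature": "class",
                   "evaluation_measure": "predictive_accuracy"}))
# Error: missing required input(s): source_data_labeled
```

An API handler calling the same `create_task` would report the identical error, so the required set is declared once instead of being re-implemented in the form.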