openml / server-api

Python-based server
https://openml.github.io/server-api/
BSD 3-Clause "New" or "Revised" License

Proposal: change the way `data_processed` is used to determine if a dataset has been processed #122

Open PGijsbers opened 10 months ago

PGijsbers commented 10 months ago

The `data_processed` table is used by `data/unprocessed/{data_engine_id}/{order}`. That endpoint currently decides whether a data engine has processed a dataset by checking whether processing has been attempted 3 (`process_data_tries`) times. To me this is an odd decision: I would consider any dataset for which processing has been attempted at least once as processed. The data engine itself should decide whether to try processing a dataset multiple times (provided it can access that information).

AND p.num_tries < ' . $this->config->item('process_data_tries') .

mysql> DESCRIBE data_processed;
+----------------------+--------------+------+-----+---------+-------+
| Field                | Type         | Null | Key | Default | Extra |
+----------------------+--------------+------+-----+---------+-------+
| did                  | int unsigned | NO   | PRI | NULL    |       |
| evaluation_engine_id | int          | NO   | PRI | NULL    |       |
| user_id              | int          | NO   |     | NULL    |       |
| processing_date      | datetime     | NO   |     | NULL    |       |
| error                | text         | YES  |     | NULL    |       |
| warning              | text         | YES  |     | NULL    |       |
| num_tries            | int          | NO   |     | 1       |       |
+----------------------+--------------+------+-----+---------+-------+
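
To make the difference between the two definitions concrete, here is a minimal sketch using an in-memory SQLite database. It is not the server's actual query. The `dataset` table, the function names, and the reduced `data_processed` schema are assumptions for illustration, and the "current" rule is simplified to what is described above (no row yet, or `num_tries` below the limit).

```python
import sqlite3

# Assumed value of the `process_data_tries` config item (the issue mentions 3).
PROCESS_DATA_TRIES = 3


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dataset (did INTEGER PRIMARY KEY);  -- hypothetical table
        CREATE TABLE data_processed (                    -- reduced schema
            did INTEGER NOT NULL,
            evaluation_engine_id INTEGER NOT NULL,
            num_tries INTEGER NOT NULL DEFAULT 1,
            PRIMARY KEY (did, evaluation_engine_id)
        );
        INSERT INTO dataset (did) VALUES (1), (2), (3);
        -- Dataset 1: never attempted. Dataset 2: one try. Dataset 3: three tries.
        INSERT INTO data_processed (did, evaluation_engine_id, num_tries)
        VALUES (2, 1, 1), (3, 1, 3);
        """
    )
    return conn


def unprocessed_current(conn: sqlite3.Connection, engine_id: int) -> list[int]:
    """Simplified current rule: unprocessed until the try limit is reached."""
    rows = conn.execute(
        """
        SELECT d.did FROM dataset d
        LEFT JOIN data_processed p
          ON p.did = d.did AND p.evaluation_engine_id = ?
        WHERE p.did IS NULL OR p.num_tries < ?
        ORDER BY d.did
        """,
        (engine_id, PROCESS_DATA_TRIES),
    )
    return [did for (did,) in rows]


def unprocessed_proposed(conn: sqlite3.Connection, engine_id: int) -> list[int]:
    """Proposed rule: any recorded attempt counts as processed."""
    rows = conn.execute(
        """
        SELECT d.did FROM dataset d
        LEFT JOIN data_processed p
          ON p.did = d.did AND p.evaluation_engine_id = ?
        WHERE p.did IS NULL
        ORDER BY d.did
        """,
        (engine_id,),
    )
    return [did for (did,) in rows]


if __name__ == "__main__":
    conn = make_db()
    print("current: ", unprocessed_current(conn, engine_id=1))
    print("proposed:", unprocessed_proposed(conn, engine_id=1))
```

Under the proposed rule the retry policy moves out of the query: an engine that wants to retry would read `num_tries` (and `error`) itself and decide what to do.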