openml / server-api

Python-based server
https://openml.github.io/server-api/
BSD 3-Clause "New" or "Revised" License

Proposal: change `data_processed` table to record every processing attempt. #123

Open PGijsbers opened 10 months ago

PGijsbers commented 10 months ago

The current design of data_processed has a composite primary key on (did, evaluation_engine_id), so the same row is updated every time processing of a dataset is attempted. Only the num_tries column records that earlier attempts happened; their errors and timestamps are overwritten.

mysql> DESCRIBE data_processed;
+----------------------+--------------+------+-----+---------+-------+
| Field                | Type         | Null | Key | Default | Extra |
+----------------------+--------------+------+-----+---------+-------+
| did                  | int unsigned | NO   | PRI | NULL    |       |
| evaluation_engine_id | int          | NO   | PRI | NULL    |       |
| user_id              | int          | NO   |     | NULL    |       |
| processing_date      | datetime     | NO   |     | NULL    |       |
| error                | text         | YES  |     | NULL    |       |
| warning              | text         | YES  |     | NULL    |       |
| num_tries            | int          | NO   |     | 1       |       |
+----------------------+--------------+------+-----+---------+-------+

I would rather record each processing attempt. The table would need a new primary key (a sequential surrogate identifier is fine), and num_tries then becomes derived data: count the matching (did, evaluation_engine_id) rows. This would preserve a history of the errors and of when attempts were made; a sketch of what that could look like is below.
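
As a rough illustration only (not a committed design), the revised table could keep the existing columns from the DESCRIBE output above, drop num_tries, and add an auto-increment surrogate key. The key and index names here are assumptions.

-- Hypothetical sketch: one row per processing attempt.
-- Columns follow the current DESCRIBE output; `id` and the index name are illustrative.
CREATE TABLE data_processed (
    id                   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    did                  INT UNSIGNED NOT NULL,
    evaluation_engine_id INT          NOT NULL,
    user_id              INT          NOT NULL,
    processing_date      DATETIME     NOT NULL,
    error                TEXT         NULL,
    warning              TEXT         NULL,
    PRIMARY KEY (id),
    KEY idx_did_engine (did, evaluation_engine_id)
);

-- num_tries is then derived rather than stored:
SELECT COUNT(*) AS num_tries
FROM data_processed
WHERE did = ? AND evaluation_engine_id = ?;

The secondary index on (did, evaluation_engine_id) keeps the derived num_tries query and "latest attempt" lookups cheap.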