PGijsbers opened 11 months ago
I opted for a direct reimplementation. While it is useful that tasks are modular and flexible, it is wildly inconvenient that the task_type_inout table has to define templates to indicate what is relevant and what is not. I would need to take a closer look, but intuitively, we can separate the logic out:
- task_type_inout defines what is needed, which is useful for task creation and for input checks on run uploads.
- task_inputs can refer to the separate inputs specific to the task. Which fields are relevant (e.g., those of the estimation procedure) should be clear from the estimation_procedure table.
- Finally, which fields get forwarded to the user should then be encoded in the API itself (possibly through configuration files).
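A minimal Python sketch of that last point, assuming the forwarded fields are kept in a per-task-type configuration rather than in task_type_inout templates. FORWARDED_INPUTS, forwarded_inputs, and the example field list are hypothetical and not part of the current API or database; in practice the mapping could be loaded from a configuration file.

```python
from typing import Any

# Hypothetical configuration: for each task type, the task_inputs fields the API
# forwards to the user. Anything not listed here is never returned.
FORWARDED_INPUTS: dict[str, list[str]] = {
    "Supervised Classification": [
        "source_data",
        "target_feature",
        "estimation_procedure",
        "cost_matrix",
        "evaluation_measures",
    ],
}


def forwarded_inputs(task_type: str, task_inputs: dict[str, Any]) -> dict[str, Any]:
    """Keep only the inputs the configuration says to forward for this task type."""
    allowed = FORWARDED_INPUTS.get(task_type, [])
    return {name: task_inputs[name] for name in allowed if name in task_inputs}
```

With this split, an input that is stored for a task but not configured for forwarding (for example a "custom_testset" entry) is simply dropped from the response, and an unknown task type forwards nothing.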
There may be reasons why that doesn't work, or there may be better ways, possibly involving database changes. However, due to time constraints and the need to verify against the old PHP API anyway, I have chosen to re-implement the logic for now and revisit it once I am more familiar with the entire database/server flow. Hopefully I can at least make the code itself easier to parse.
❓ How do we determine how to serialize empty values? E.g.,
{"name": "cost_matrix", "cost_matrix": []},
{"name": "evaluation_measures", "evaluation_measures": {"evaluation_measure": []}},
Why are these [] by default when they are missing, and not, e.g., None?
❓ What is the point of "hidden" task_type_inout entries? E.g., "custom_testset", "number_samples".
A file can be private or public, so maybe it should not be on the entity.
Regarding the empty values: the cost_matrix and evaluation_measures fields are forced to be present through the API, so they are serialized as empty when missing. The other input types are optional, so they are simply omitted if not present.
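The rule in that answer can be sketched in Python as follows. This is a minimal sketch, not the actual API code: FORCED_EMPTY_DEFAULTS and serialize_inputs are hypothetical names, the set of forced inputs is assumed from the two examples in the question, and emitting forced inputs after the optional ones is an arbitrary choice.

```python
import copy
from typing import Any

# Assumed: inputs the API always emits, with the empty value used when missing.
# The shapes mirror the cost_matrix / evaluation_measures examples above.
FORCED_EMPTY_DEFAULTS: dict[str, Any] = {
    "cost_matrix": [],
    "evaluation_measures": {"evaluation_measure": []},
}


def serialize_inputs(values: dict[str, Any]) -> list[dict[str, Any]]:
    """Serialize task inputs: forced inputs always appear, optional ones only if set."""
    serialized = []
    # Optional inputs: omitted entirely when they have no value.
    for name, value in values.items():
        if name not in FORCED_EMPTY_DEFAULTS and value is not None:
            serialized.append({"name": name, name: value})
    # Forced inputs: always present, falling back to a fresh copy of the empty default
    # so callers cannot mutate the shared default.
    for name, default in FORCED_EMPTY_DEFAULTS.items():
        value = values.get(name)
        serialized.append({"name": name, name: value if value is not None else copy.deepcopy(default)})
    return serialized
```

Here a missing optional input such as custom_testset disappears from the output, while a missing cost_matrix still shows up as [], which is the behaviour the question observed.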
The task_type_inout template search can be deleted. Not sure what the "hidden" requirements are for.
🪲 It looks like data associated with private datasets (tasks and, through them, runs) is not considered private. I assume this is an oversight. To properly support private datasets, we would need to build in this kind of filtering, because these associated entities leak a lot of information about the private dataset itself (potentially class names, dataset size, dataset name), and activity on that dataset would likely have been assumed to be private as well.
Example: run 68 on task 103 is associated with private dataset 45.
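A minimal sketch of the kind of filtering meant here: a run's visibility is derived from its task's dataset instead of being checked on the run alone. The Dataset/Task/Run classes, the uploader field, and the rule that only the uploader may see a private dataset are all assumptions for illustration, not the actual schema or access policy.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified entities; only the fields needed for the check.
@dataclass
class Dataset:
    id: int
    is_public: bool
    uploader: int


@dataclass
class Task:
    id: int
    dataset_id: int


@dataclass
class Run:
    id: int
    task_id: int


def dataset_visible(dataset: Dataset, user_id: Optional[int]) -> bool:
    # Assumed rule: public datasets are visible to everyone, private ones only to the uploader.
    return dataset.is_public or (user_id is not None and dataset.uploader == user_id)


def run_visible(run: Run, tasks: dict[int, Task], datasets: dict[int, Dataset], user_id: Optional[int]) -> bool:
    # Visibility propagates down the chain dataset -> task -> run.
    task = tasks[run.task_id]
    return dataset_visible(datasets[task.dataset_id], user_id)
```

Applied to the example above (with a made-up uploader id), run 68 on task 103 would be hidden from anonymous users and from everyone other than the owner of dataset 45.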