openml / server-api

Python-based server
https://openml.github.io/server-api/
BSD 3-Clause "New" or "Revised" License

GET /task/{id} #22

Open PGijsbers opened 11 months ago

PGijsbers commented 9 months ago

🪲 It looks like data associated with private datasets (tasks -> runs) is not treated as private. I assume this is an oversight. If we want to properly support private datasets, I think we would need to build in this kind of filtering, because the associated data leaks a lot of information about the private dataset itself (potentially class names, dataset size, dataset name), and users have likely assumed that activity on that dataset is private as well.

Example: run 68 on task 103 is associated with private dataset 45.
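A minimal sketch of the kind of filtering meant here, assuming visibility is stored only on the dataset and has to be inherited by tasks and runs. The classes, field names, helper functions, and the uploader id are hypothetical and do not reflect the actual database schema or server code; only the ids 68, 103, and 45 come from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dataset:
    id: int
    visibility: str  # assumed to be "public" or "private"
    uploader: int    # hypothetical owner id


@dataclass(frozen=True)
class Task:
    id: int
    dataset_id: int


@dataclass(frozen=True)
class Run:
    id: int
    task_id: int


def can_view_dataset(dataset: Dataset, user_id: Optional[int]) -> bool:
    # Public datasets are visible to everyone; private ones only to the uploader.
    return dataset.visibility == "public" or dataset.uploader == user_id


def can_view_task(task: Task, datasets: dict, user_id: Optional[int]) -> bool:
    # A task inherits the visibility of the dataset it is defined on.
    return can_view_dataset(datasets[task.dataset_id], user_id)


def can_view_run(run: Run, tasks: dict, datasets: dict, user_id: Optional[int]) -> bool:
    # A run inherits the visibility of its task, and so of the dataset.
    return can_view_task(tasks[run.task_id], datasets, user_id)


# The example from this comment; uploader id 1 is made up for illustration.
datasets = {45: Dataset(id=45, visibility="private", uploader=1)}
tasks = {103: Task(id=103, dataset_id=45)}
run = Run(id=68, task_id=103)
```

With a check like this, `can_view_run(run, tasks, datasets, user_id=None)` would be false for an anonymous user, so run 68 would not be served even though the run record itself carries no visibility flag.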

PGijsbers commented 9 months ago

I opted for a direct reimplementation. While it is useful that tasks are modular and flexible, it is wildly inconvenient that the task_type_inout table has to define templates to indicate what is relevant and what is not. I would need to take a closer look, but intuitively, we can separate the logic out:

There may be reasons why that doesn't work, or there may be better ways, possibly with database changes. However, due to the time constraints and the need to verify against the old PHP API anyway, I have chosen to re-implement the logic for now and revisit it once I am more familiar with the entire database / server flow. Hopefully I can at least make the code itself easier to parse.

PGijsbers commented 9 months ago

❓ How do we determine how to serialize empty values? E.g.,

            {"name": "cost_matrix", "cost_matrix": []},
            {"name": "evaluation_measures", "evaluation_measures": {"evaluation_measure": []}},

Why are these [] by default when they are missing? And not e.g., None.

PGijsbers commented 9 months ago

❓ What is the point of "hidden" task_type_inout entries? E.g., "custom_testset", "number_samples".

PGijsbers commented 8 months ago

File can be private/public, maybe should not be on the entity.

PGijsbers commented 8 months ago

❓ How do we determine how to serialize empty values? E.g.,

            {"name": "cost_matrix", "cost_matrix": []},
            {"name": "evaluation_measures", "evaluation_measures": {"evaluation_measure": []}},

Why are these [] by default when they are missing? And not e.g., None.

The API forces those fields to be present. The other input types are optional, so they are simply omitted when they are not set.
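As a rough sketch of that behaviour, assuming the two fields quoted above are the only forced ones: forced inputs are always emitted and fall back to their empty default, while any other input is emitted only if it is stored. `FORCED_DEFAULTS`, `serialize_inputs`, and the `source_data` input in the usage note are illustrative names, not the actual server code.

```python
import copy

# Inputs that must always appear in the response, with the empty value
# used when nothing is stored. Assumed from the two entries quoted above.
FORCED_DEFAULTS = {
    "cost_matrix": [],
    "evaluation_measures": {"evaluation_measure": []},
}


def serialize_inputs(stored: dict) -> list:
    """Build the list of input entries for a task response."""
    entries = []
    # Forced inputs first: use the stored value, else a copy of the default.
    for name, default in FORCED_DEFAULTS.items():
        value = stored[name] if name in stored else copy.deepcopy(default)
        entries.append({"name": name, name: value})
    # Optional inputs are only included when they are actually stored.
    for name, value in stored.items():
        if name not in FORCED_DEFAULTS:
            entries.append({"name": name, name: value})
    return entries
```

For example, `serialize_inputs({"source_data": 45})` yields the two forced entries with their empty defaults followed by the single stored input, and no entry at all for optional inputs that are absent.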

PGijsbers commented 8 months ago

task_type_inout

Template search can be deleted.

Not sure what "hidden" requirements are for.