We also discussed that it might be possible to generate this table as a report using the source evaluation data from the runs. Then we wouldn't need to duplicate metric publishing. We should explore this option too.
I played a bit with the report feature in W&B; it seems designed to build a "snapshot" from the web interface (for clients or collaborators). Export formats are PDF and LaTeX (or direct publishing to W&B).
I don't think reports will fit our needs, so I still recommend using the tables as the main source of data.
I found no way to do this directly from the W&B UI; maybe it is worth asking them directly? In my opinion, the easiest way to handle this would be a Python script that lists and aggregates all those tables for a specified project (with filtering by group and run name/id), using the API.
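A minimal sketch of such a script, assuming the tables were logged with `wandb.log({"evals": wandb.Table(...)})`; the entity, project, and table key below are placeholders:

```python
import pandas as pd
import wandb

api = wandb.Api()


def collect_tables(entity, project, table_key="evals", group=None, run_name=None):
    """List all runs of a project (optionally filtered by group or run name)
    and concatenate every table logged under `table_key`."""
    filters = {}
    if group:
        filters["group"] = group
    if run_name:
        filters["display_name"] = run_name
    frames = []
    for run in api.runs(f"{entity}/{project}", filters=filters or None):
        # Tables logged via wandb.log() are stored as run artifacts of type "run_table".
        for artifact in run.logged_artifacts():
            if artifact.type != "run_table":
                continue
            try:
                table = artifact.get(table_key)
            except KeyError:
                continue  # this artifact holds a different table
            if table is None:  # older SDK versions return None instead of raising
                continue
            df = pd.DataFrame(table.data, columns=table.columns)
            df["run"] = run.name
            df["group"] = run.group
            frames.append(df)
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


if __name__ == "__main__":
    print(collect_tables("my-entity", "my-project", group="baseline"))
```

The filters use the Mongo-style queries the public API accepts (e.g. `{"group": ...}` or `{"display_name": {"$regex": ...}}`), so the group and run name/id filtering should be straightforward; the aggregated dataframe can then be rendered however we need.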
Related to #575. We discussed that it should be easy to populate the group_logs evals table row by row from different evaluation tasks.
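If we go that route, here is a possible sketch of the row-by-row population, assuming each evaluation task resumes the shared run and re-logs the table with its row appended (logged tables are immutable, so the current rows have to be fetched and rebuilt; the run id, table key, and columns below are placeholders, not the repo's actual schema):

```python
import wandb

COLUMNS = ["task", "dataset", "metric", "score"]


def append_eval_row(entity, project, run_id, row, key="evals"):
    api = wandb.Api()
    rows = []
    # Fetch rows already logged by earlier evaluation tasks, if any.
    # Run-table artifacts are conventionally named "run-<run_id>-<key>";
    # verify this naming against the SDK version in use.
    try:
        artifact = api.artifact(f"{entity}/{project}/run-{run_id}-{key}:latest")
        previous = artifact.get(key)
        if previous is not None:
            rows = list(previous.data)
    except wandb.errors.CommError:
        pass  # no table logged yet
    rows.append(row)
    # Tables can't be mutated after logging, so rebuild and re-log the full table.
    run = wandb.init(entity=entity, project=project, id=run_id, resume="allow")
    run.log({key: wandb.Table(columns=COLUMNS, data=rows)})
    run.finish()
```

Note that concurrent evaluation tasks would race on the fetch-append-relog cycle, so they would either need to run sequentially or log per-task tables that the aggregation script above merges.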