Closed: anana10c closed this issue 6 months ago
Hi, thanks again for flagging this. Working on fixing the trials-related bug so that it returns the best trial.
For the `summary_df`, the time to best validation metric is just meant to give submitters some raw information (regardless of whether the target was achieved). I can add the columns you drafted to the summary if that would be more useful.
I see; I had assumed that the `summary_df` was intended to provide the "official" score for each trial. It would be great if you could add a comment to mention that!
Ah ok, I think I should probably rename the 'score' column to something clearer, like 'submission time to target'.
+1 on this, it's extremely misleading rn!
Hello,
As we discussed in today's call, our understanding is that an external tuning submission will be scored by taking the median across five studies, each of which takes the best time across its five trials. Specifically, a study should always qualify as long as at least one of its trials meets the target.
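For concreteness, a minimal sketch of that scoring rule as we understand it (the function name and the use of `np.inf` for trials that never reach the target are assumptions, not the repository's actual API):

```python
import numpy as np

def score_external_tuning_workload(study_times):
    """Sketch of the scoring rule described above.

    study_times: five studies, each a list of five per-trial
    times-to-target (np.inf when a trial never reaches the target).
    A study qualifies as long as at least one trial's time is finite.
    """
    # Best (minimum) time across the five trials of each study.
    best_per_study = [min(trials) for trials in study_times]
    # Median across the five studies.
    return float(np.median(best_per_study))
```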
However, it seems that the method `get_index_that_reaches_target` in `scoring/performance_profile.py` performs an additional check to ensure that at least three of the five trials in each study meet the target (see lines 147-151, sketched below). This should be an easy fix - I think the lines can just be removed :)
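A hypothetical reconstruction of the check in question (names are assumed for illustration, not copied from `scoring/performance_profile.py`):

```python
# The extra condition being questioned: the study only counts if at
# least three of its five trials reach the target.
def study_qualifies(trial_reached_target):
    return sum(trial_reached_target) >= 3

# Under the rule agreed on in the call, one qualifying trial suffices:
def study_qualifies_fixed(trial_reached_target):
    return any(trial_reached_target)
```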
We also noticed that `get_summary_df` in `scoring/score_submissions.py` seems to take the time to the best validation accuracy (or relevant metric) rather than the time to the validation target, though this function is not used in any of the computation for the performance profiles. Here's the fix I've been using (starting at line 50) - see the sketch below. Let me know if this seems correct!
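A minimal sketch of this kind of fix, assuming each trial's eval results are a DataFrame with an `accumulated_submission_time` column and a validation-metric column (these names and the metric/target arguments are assumptions, not necessarily the repository's actual schema):

```python
import numpy as np
import pandas as pd

def time_to_target(eval_df: pd.DataFrame,
                   metric: str,
                   target: float,
                   maximize: bool = True) -> float:
    """Return the accumulated time of the FIRST evaluation that reaches
    the validation target, rather than the time of the best metric."""
    reached = (eval_df[metric] >= target if maximize
               else eval_df[metric] <= target)
    if not reached.any():
        return np.inf  # this trial never reaches the target
    first_idx = reached.idxmax()  # label of the first True entry
    return float(eval_df.loc[first_idx, 'accumulated_submission_time'])
```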