Closed by thisdotmatt 3 months ago
Hi @thisdotmatt, thanks for the feedback. Yeah, this is something we're thinking about. I can definitely add a footnote to the website that clarifies what "verified" and "open source" mean.
Initially, we've been wanting to encourage submissions to the leaderboard. We're very happy with the activity, but it does potentially raise the question of whether the solutions are authentic.
So far, I'm fairly confident there hasn't been any instance of a faked submission. To back this up, here is a graphic showing submissions × the instances they resolved. If a submission had simply taken the gold patches and randomly dropped some of them to simulate resolved instances, I'd expect its row to look pretty scattered in terms of which instances it resolves, but none of the rows look like that.
https://x.com/jyangballin/status/1804212474121433375
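As a rough illustration of the "scattered row" intuition above, here is a minimal sketch. It is not the analysis behind the graphic; the function name `scatter_score`, the input layout (a dict mapping each submission name to the set of instance IDs it resolved), and the scoring rule are all assumptions made for this example. The idea is that a genuine system tends to resolve instances that other systems also resolve, while a row produced by randomly sampling gold patches would show no such concentration.

```python
from typing import Dict, Set


def scatter_score(
    resolved: Dict[str, Set[str]],
    all_instances: Set[str],
    name: str,
) -> float:
    """Hypothetical heuristic: how much `name` concentrates on 'popular' instances.

    Returns the mean peer resolve rate over the instances `name` resolved,
    minus the mean peer resolve rate over all instances. A score near zero
    or below suggests the row does not track what peers find solvable.
    """
    peers = [s for s in resolved if s != name]
    if not peers or not resolved[name] or not all_instances:
        raise ValueError("need at least one peer, one resolve, and one instance")

    def peer_rate(instance: str) -> float:
        # Fraction of the other submissions that resolved this instance.
        return sum(instance in resolved[p] for p in peers) / len(peers)

    on_resolved = sum(peer_rate(i) for i in resolved[name]) / len(resolved[name])
    overall = sum(peer_rate(i) for i in all_instances) / len(all_instances)
    return on_resolved - overall
```

A low score is only a prompt for a closer look, not evidence of a faked submission: a system that is strong on instances nobody else solves would also score low under this assumed rule.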
We're currently thinking about whether there need to be any changes to the leaderboard submission criteria going forward (e.g., if we get enough submissions, separating them by open source / verified). At this time, we're going to maintain our practice so far, but there may be some updates within the next 2 months.
Thanks for raising this issue, it's an important one. If/when there are updates, I'll be sure to follow up here.
Just added a couple of notes to the website that provide some clarification about what "verified" and "open" mean.
Thanks John, I appreciate the transparency and the update to the website. The graph you shared is very interesting - I look forward to seeing it filled up as the field develops.
Hello,
I expect that many people, including myself, are beginning to look to the leaderboard website as a resource. I noticed that a number of the highest-scoring solutions on the leaderboard are unverified (and just as many closed source). While I am encouraged that corporations and research groups alike are interested in this field, I wonder if this lack of distinction could lead to misinformation.
I believe the leaderboards should prioritize verified solutions, either by rank or by separating the two groups entirely. In addition, it would be helpful to outline the distinction on the front page of the website.