materialsproject / matbench

Matbench: Benchmarks for materials science property prediction
https://matbench.materialsproject.org
MIT License

Potential for a kaggle competition #195

Open sgbaird opened 1 year ago

sgbaird commented 1 year ago

I think a major obstacle to broader participation in Matbench is that people create their own splits and work in isolation from Matbench, and that performant models tend to be trained on the most recent, most comprehensive data snapshots (i.e., when the model is going into real-world use). This can make it difficult to persuade people to spend time learning Matbench, even though it is very easy to use, and to commit potentially large amounts of compute time for expensive models.

There are two approaches to addressing this.

One is lowering the barrier to entry, for example by accepting disparate benchmarks, writing up the benchmark notebooks for people upon request, or running the benchmarks for them. The first waters down the benchmark, and the latter two put a lot of burden on the Matbench developers.

A second approach is increasing the incentive. One way to do this is a Kaggle competition, with prizes, that uses Matbench 2.0 and covers property prediction, adaptive design, and generative modeling. This involves upfront work in designing and hosting the competition, but it also distributes the work across the community and incentivizes people to use the best models, even if they weren't the original authors. Authorship can also be offered to participants with top-scoring models, assuming no disqualification.

We could base it on, or learn from, the NOMAD 2018 Kaggle competition: https://www.nature.com/articles/s41524-019-0239-3

Prize funding, or the prizes themselves, would also need to be sourced. Maybe materials informatics companies, the Acceleration Consortium, Apple, Meta, etc. would be willing to sponsor.

sgbaird commented 1 year ago

Another example: https://accelerationconsortium.substack.com/p/hackathon

sgbaird commented 1 year ago

Something that came to mind for a hackathon with adaptive design tasks is to have two winning categories:

  1. participants who develop a model that reaches a predetermined optimum in the fewest iterations
  2. participants who use the fewest cumulative objective function calls during the development of the model

The latter would of course require that access to the underlying objective function be gated and monitored.
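
To make the two categories concrete, here is a minimal sketch of how the bookkeeping might look. Everything in it is an assumption for illustration: `GatedObjective`, `iterations_to_target`, the per-participant call budget, the tolerance, and the toy quadratic objective are hypothetical names and choices, not part of Matbench or any existing competition platform.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class GatedObjective:
    """Hypothetical gatekeeper: wraps a hidden objective and logs every call per participant."""

    func: Callable[[float], float]
    budget: Optional[int] = None  # max calls per participant; None means unlimited
    calls: Dict[str, List[float]] = field(default_factory=dict)

    def evaluate(self, participant: str, x: float) -> float:
        history = self.calls.setdefault(participant, [])
        if self.budget is not None and len(history) >= self.budget:
            raise RuntimeError(f"{participant} exhausted the call budget")
        y = self.func(x)
        history.append(y)  # record the observed value for later scoring
        return y

    def cumulative_calls(self, participant: str) -> int:
        """Category 2 metric: total objective calls made by a participant."""
        return len(self.calls.get(participant, []))


def iterations_to_target(
    values: List[float], target: float, tol: float = 1e-3
) -> Optional[int]:
    """Category 1 metric (minimization): 1-based index of the first value
    within `tol` of `target`, or None if the target was never reached."""
    for i, y in enumerate(values, start=1):
        if y <= target + tol:
            return i
    return None


# Toy example: the hidden objective is (x - 2)^2 with a known optimum of 0.
gate = GatedObjective(func=lambda x: (x - 2.0) ** 2, budget=10)
observed = [gate.evaluate("alice", x) for x in [0.0, 1.0, 2.0, 3.0]]
print(iterations_to_target(observed, target=0.0))
print(gate.cumulative_calls("alice"))
```

In a real competition the objective would sit behind a server so that participants cannot bypass the counter, and the two metrics would be ranked separately.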