materialsproject / matbench

Matbench: Benchmarks for materials science property prediction
https://matbench.materialsproject.org
MIT License

Can the Matbench submission process be handled entirely within a Google Colab notebook? #196

Open sgbaird opened 1 year ago

sgbaird commented 1 year ago

I don't necessarily mean changing how the Matbench submission system works (I like the thoroughness/provenance); rather, I'm asking whether Google Colab has what's necessary to programmatically follow the three submission steps:

  1. "Create 3 required files"
  2. "Put files in appropriate folder"
  3. "Create a PR to the Matbench repository"

sgbaird commented 1 year ago

Save notebook programmatically to disk:

Write info.json file to temporary Colab storage:
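A minimal sketch of this step. The key names in `info` are an assumption written from memory of the Matbench submission guide and should be checked against the current docs; every value is a placeholder, and the scratch directory stands in for Colab's temporary storage.

```python
import json
import tempfile
from pathlib import Path

# Placeholder metadata. The key names are an assumption and should be checked
# against the current Matbench submission instructions.
info = {
    "authors": "Jane Doe",
    "algorithm": "dummy",
    "algorithm_long": "Dummy regressor that predicts the training mean.",
    "bibtex_refs": [],
    "notes": "Generated from a Colab notebook.",
    "requirements": {"python": ["matbench", "scikit-learn"]},
}


def write_info_json(info: dict, folder: Path) -> Path:
    """Write info.json into folder and return its path."""
    folder.mkdir(parents=True, exist_ok=True)
    info_path = folder / "info.json"
    with info_path.open("w", encoding="utf-8") as f:
        json.dump(info, f, indent=2)
    return info_path


# Colab's local disk is temporary, so a scratch directory is enough here.
scratch_dir = Path(tempfile.mkdtemp())
info_path = write_info_json(info, scratch_dir)
print(info_path)
```

Anything written this way disappears when the Colab runtime is recycled, so the file has to be committed or downloaded in the same session.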

Put this in a directory structure using os or similar
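One way this could look with `pathlib` and `shutil`. The `benchmarks/matbench_v0.1_<algorithm>/` folder name and the three file names are assumptions about the Matbench repository layout, and `stage_submission` is a hypothetical helper, not part of Matbench.

```python
import shutil
import tempfile
from pathlib import Path


def stage_submission(repo_root: Path, algorithm: str, files: list) -> Path:
    """Copy submission files into benchmarks/matbench_v0.1_<algorithm>/ under repo_root.

    The folder naming is an assumption about the Matbench repository layout.
    """
    target = repo_root / "benchmarks" / f"matbench_v0.1_{algorithm}"
    target.mkdir(parents=True, exist_ok=True)
    for src in files:
        shutil.copy2(src, target / Path(src).name)
    return target


# Demo with empty placeholder files in a scratch directory.
scratch = Path(tempfile.mkdtemp())
names = ["results.json.gz", "info.json", "notebook.ipynb"]
sources = []
for name in names:
    p = scratch / name
    p.touch()
    sources.append(p)

target = stage_submission(scratch / "matbench", "dummy", sources)
print(sorted(p.name for p in target.iterdir()))
```

In a real run, `repo_root` would point at a clone of the user's fork rather than a scratch folder.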

Make pull request via API or CLI:
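A hedged sketch of the API route, using the GitHub REST endpoint for creating pull requests. It only builds the request and does not send it. The token, fork name, and branch name are placeholders, `build_pr_request` is a hypothetical helper, and `main` as the base branch is an assumption. The branch would already need to be pushed to a fork.

```python
import json
import urllib.request


def build_pr_request(token, head, title, body, base="main"):
    """Build (but do not send) a GitHub REST API request that opens a pull request."""
    payload = {"title": title, "head": head, "base": base, "body": body}
    return urllib.request.Request(
        "https://api.github.com/repos/materialsproject/matbench/pulls",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_pr_request(
    token="<personal-access-token>",  # placeholder, never hard-code a real token
    head="my-username:my-matbench-branch",  # hypothetical fork owner and branch
    title="Add dummy benchmark submission",
    body="Submission generated from a Colab notebook.",
)
print(request.get_method(), request.full_url)

# To actually open the PR (needs a valid token and a pushed branch):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["html_url"])
```

Keeping the token out of the notebook source (for example, by prompting with `getpass.getpass()`) avoids committing it along with the submission.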

Short answer seems to be yes.

This could reduce the barrier to getting an initial "win" (e.g. verifying it works for a dummy model), and it keeps the barrier low even for bigger models if someone uses Colab Pro or Pro+. This would also make it easier for someone to mock up a notebook with some dummy data and then ask someone with a Colab Pro account to click run. Anyway, just some thoughts based on some recent internal discussion with a colleague about Matbench.

sgbaird commented 1 year ago

The tough part is programmatically downloading a snapshot of the current notebook with outputs. This is even harder if this requires the Colab notebook to be the active window on the user's machine.

In the end, I think the user will have to "Save a copy to Drive" after clicking an "Open in Colab" link. From there, it should be possible to programmatically extract the file ID and download the (hopefully recently autosaved) notebook. It's probably good to also save an extra notebook with the input history.
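As a rough illustration of the file-ID part only: if the user pastes the URL of their saved copy, the ID can be pulled out with a regular expression. This assumes the URL has the form `colab.research.google.com/drive/<id>`; `colab_file_id` is a hypothetical helper, and neither discovering the ID from inside a running notebook nor downloading the file is covered here.

```python
import re


def colab_file_id(url):
    """Return the Drive file ID from a Colab URL of the form .../drive/<id>, else None."""
    match = re.search(r"colab\.research\.google\.com/drive/([A-Za-z0-9_-]+)", url)
    return match.group(1) if match else None


url = "https://colab.research.google.com/drive/15YLOWHB_NkIIqKLO0ik784fsK2xJD08l?usp=sharing"
print(colab_file_id(url))
```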

Some of the relevant code:

from IPython.display import display, Javascript
display(Javascript('IPython.notebook.save_checkpoint();'))  # save, but probably only if window is active
# export the input history (no output cells) to notebook.ipynb
%notebook -e notebook.ipynb

based on some modifications I'm doing for xtal2png + imagen-pytorch:

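import os
import time
from IPython.display import display, Javascript

display(Javascript('IPython.notebook.save_checkpoint();'))  # save, but probably only if window is active
timestr = time.strftime("%Y%m%d-%H%M%S")
# Assumption: results_folder was defined elsewhere in the original notebook;
# a timestamped folder is used here only as a placeholder.
results_folder = os.path.join("results", timestr)
os.makedirs(results_folder, exist_ok=True)
notebook_savepath = os.path.join(results_folder, "notebook.ipynb")
print(notebook_savepath)
# export the input history only (no output cells), then move it into place
%notebook -e notebook-input-history.ipynb
!mv notebook-input-history.ipynb $notebook_savepath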

WIP at https://colab.research.google.com/drive/15YLOWHB_NkIIqKLO0ik784fsK2xJD08l?usp=sharing

Adding an "Open in Colab" badge could be done ad hoc in the Matbench actions. Still, maybe it's better to ask the user to download the notebook and manually add a version with outputs, markdown cells, and the "Open in Colab" badge (which the Colab UI makes very straightforward).

Haven't fleshed out the details of creating the GitHub PR. Not sure if authentication will cause more problems than just following normal instructions.

Once it's ready, I'm planning to share the notebook with @jae3goals https://github.com/materialsproject/matbench/pull/141