materialsproject / matbench

Matbench: Benchmarks for materials science property prediction
https://matbench.materialsproject.org
MIT License

DeeperGATGNN results added #268

Closed: sadmanomee closed this 10 months ago

sadmanomee commented 1 year ago

Matbench Pull Request Template

Thanks for making a PR to Matbench! We appreciate your contribution (like, a lot). To make things run smoothly, check out the following templates, depending on what kind of PR you are making.

If you are making a benchmark submission (i.e., you have tried an algorithm on Matbench and want to appear on the leaderboard), please use the template under Benchmark submissions.

If you are making changes to the core matbench code, data, or docs, please use the template under Core code/data/docs changes.

Benchmark submissions

Matbench v0.1_DeeperGATGNN for the following tasks: matbench_dielectric, matbench_jdft2d, matbench_perovskites, matbench_phonons, matbench_log_gvrh, matbench_log_kvrh, matbench_mp_e_form, matbench_mp_gap.

Brief description of your algorithm

Scalable deeper graph neural networks for high-performance materials property prediction (https://www.cell.com/patterns/pdfExtended/S2666-3899(22)00076-9). We propose DeeperGATGNN, a scalable global graph attention neural network with differentiable group normalization (DGN) and skip connections for high-performance materials property prediction. Our model not only achieves state-of-the-art results on benchmark datasets, but is also the most scalable in terms of graph convolution layers, allowing us to train very deep networks (e.g., >30 layers) without significant performance degradation. Source code link: https://github.com/usccolumbia/deeperGATGNN
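For illustration, here is a minimal sketch of the core idea (attention-based graph convolutions wrapped in skip connections so very deep stacks still train), using PyTorch Geometric's GATConv. This is not the DeeperGATGNN source: BatchNorm1d stands in for the paper's DGN, and all class/variable names here are illustrative.

import torch
import torch.nn as nn
from torch_geometric.nn import GATConv

class DeepGATBlock(nn.Module):
    """Hypothetical residual block: GAT convolution + normalization + skip."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        # concat=False keeps the output at `dim`, so the skip connection lines up
        self.conv = GATConv(dim, dim, heads=heads, concat=False)
        self.norm = nn.BatchNorm1d(dim)  # stand-in for the paper's DGN
        self.act = nn.ReLU()

    def forward(self, x, edge_index):
        # The residual path is what lets >30-layer stacks train without
        # significant performance degradation.
        return x + self.act(self.norm(self.conv(x, edge_index)))

# Stack many blocks; the skip connections keep gradients well-behaved.
blocks = nn.ModuleList(DeepGATBlock(64) for _ in range(30))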

Included files

If you are making a benchmark submission, please only include the submission as a folder in the /benchmarks directory with the format <benchmark_name>_<algorithm_name>. Your PR should have no other changes to the core code. The submission should have these three required files, as indicated in the docs:

Example

-- benchmarks
---- matbench_v0.1_DeeperGATGNN
------ results.json.gz    # required filename (results)
------ info.json          # required filename (info)
------ training.py        # main changes for matbench
------ main.py            # minor changes for matbench
------ deep_gatgnn.py     # DeeperGATGNN architecture code
------ config.yml         # configuration for DeeperGATGNN

Please make sure each of these files has the information specified in the docs.

If you have other short/small files required for the notebook, please give a brief overview of what each one is used for and how to use it.

Label the pull request

Label the pull request with the new_benchmark label.

Core code/data/docs changes

Brief description of changes

Please include a brief description of the changes you are making, in bullet point format.

Tests

Indicate if your code requires new tests and whether they are included with your PR. ALL core code/data/docs changes adding new features must have new tests for them.

Closed issues or PRs

Indicate if your PR closes any currently open issues or supersedes any other currently open PRs.

Label the pull request

Label the pull request with the code or docs labels, depending on which one (or both) applies.

ardunn commented 1 year ago

Hi @sadmanomee ! Thanks for submitting!

I think the tests are failing because the data was modified after saving. This can happen when a results file is saved and then edited afterwards: matbench creates a unique hash from the object's data when it writes the file, so any later edits make the recomputed hash mismatch the stored one.
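For intuition, a conceptual sketch (not matbench's actual implementation) of how a save-time hash ends up mismatching after a manual edit:

import hashlib
import json

def data_hash(data: dict) -> str:
    # Hash a canonical serialization of the data
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()

saved = {"scores": [0.1, 0.2]}
stored_hash = data_hash(saved)  # computed at save time

# Editing the file afterwards changes the data but not the stored hash...
saved["scores"].append(0.3)

# ...so validation on load recomputes the hash and fails.
assert data_hash(saved) != stored_hash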

Do you have an unedited version where matbench has saved it directly to file?

sadmanomee commented 1 year ago

Hi Alex, thanks for the reply. I ran separate experiments for the 8 datasets, so there were 8 different results.json.gz files. I later combined all of them and created a single results file. I can push all 8 separate results.json.gz files if needed. Please let me know.

ardunn commented 1 year ago

> Hi Alex, thanks for the reply. I ran separate experiments for the 8 datasets, so there were 8 different results.json.gz files. I later combined all of them and created a single results file. I can push all 8 separate results.json.gz files if needed. Please let me know.

The easiest fix to try first, depending on how you have the files saved, is to merge them into one MatbenchBenchmark.

If you have them saved using MatbenchTask.to_file:

  1. Load with MatbenchTask.from_file, x8 (one for each task)
  2. Create a new MatbenchBenchmark with subset equal to the names of the 8 tasks you have
  3. Try the following:
from matbench.task import MatbenchTask
from matbench.bench import MatbenchBenchmark

my_tasks_loaded_from_file = {
    "matbench_steels": MatbenchTask.from_file("your_path_to_steels_result.json.gz"),
    "matbench_dielectric": MatbenchTask.from_file("your_path_to_dielectric_result.json.gz"),
    ...
}

mb = MatbenchBenchmark(subset=list(my_tasks_loaded_from_file.keys()))

# insert each loaded task into the new benchmark's task map
for task_name, task in my_tasks_loaded_from_file.items():
    mb.tasks_map[task_name] = task

mb.to_file("results.json.gz")

If you have them saved as 8 individual benchmarks, you can do the same thing but just take each task out of the benchmark before putting it into a new benchmark:

from matbench.task import MatbenchTask
from matbench.bench import MatbenchBenchmark

my_benchmarks_loaded_from_file = {
    "matbench_steels": MatbenchBenchmark.from_file("your_path_to_steels_result.json.gz"),
    "matbench_dielectric": MatbenchBenchmark.from_file("your_path_to_dielectric_result.json.gz"),
    ...
}

mb = MatbenchBenchmark(subset=list(my_benchmarks_loaded_from_file.keys()))

for task_name, benchmark in my_benchmarks_loaded_from_file.items():
    mb.tasks_map[task_name] = benchmark.tasks_map[task_name]

mb.to_file("results.json.gz")
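Either way, it may be worth re-loading the merged file before pushing, to confirm it round-trips cleanly. A quick sanity check using only the calls shown above:

from matbench.bench import MatbenchBenchmark

# Re-load the merged file and confirm all 8 tasks survived the round trip
mb_check = MatbenchBenchmark.from_file("results.json.gz")
print(sorted(mb_check.tasks_map.keys()))  # should list all 8 task names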

Let me know if this works for you!

sadmanomee commented 1 year ago

Hi Alex, thanks for the reply again. I tried the method you suggested, but it still gives a "bad gzip" error.
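One quick diagnostic to narrow this down, assuming the error comes from the standard library's gzip module: gzip streams always start with the magic bytes 0x1f 0x8b, so a plain JSON file that was merely renamed to .json.gz will raise exactly this error. A minimal check (the path is a placeholder):

import gzip

path = "results.json.gz"  # path to the file that fails to load

# gzip files always begin with the two magic bytes 0x1f 0x8b
with open(path, "rb") as f:
    magic = f.read(2)
print("looks gzipped:", magic == b"\x1f\x8b")

# Attempting to decompress surfaces the same error matbench would hit
try:
    with gzip.open(path, "rb") as f:
        f.read(1)
    print("decompresses OK")
except gzip.BadGzipFile as e:
    print("bad gzip:", e)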