Closed · sadmanomee closed this 10 months ago
Hi @sadmanomee! Thanks for submitting!
I think the tests are failing because the data was modified after saving. When matbench writes a results file, it creates a unique hash based on the object's data; if the file is edited afterwards, the stored hash no longer matches the data, so validation fails.
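Conceptually, the check looks something like the sketch below. This is an illustration of the idea only, not matbench's actual code; data_hash and the record layout are hypothetical.

import hashlib
import json

def data_hash(data: dict) -> str:
    # Hash a canonical (sorted-key) serialization of the results data.
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()

# On save: a hash of the data is stored alongside it in the file.
record = {"data": {"matbench_steels": {"fold_0": [0.1, 0.2]}}}
record["hash"] = data_hash(record["data"])

# On load: the hash is recomputed and compared. Any hand-edit to the
# saved data changes the recomputed hash, so the file fails validation.
record["data"]["matbench_steels"]["fold_0"][0] = 999.0
assert data_hash(record["data"]) != record["hash"]  # mismatch -> invalid file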
Do you have an unedited version where matbench has saved it directly to file?
Hi Alex, thanks for the reply. I ran separate experiments for 8 datasets, so there were 8 different results.json.gz files. I later combined all of them and created a single results file. I can push all 8 separate results.json.gz files if needed. Please let me know.
The easiest fix to try first, depending on how you have the files saved, is to merge them into one MatbenchBenchmark.
If you have them saved using MatbenchTask.to_file, load each task and create a new MatbenchBenchmark with subset equal to the names of the 8 tasks you have:

from matbench.task import MatbenchTask
from matbench.bench import MatbenchBenchmark

my_tasks_loaded_from_file = {
    "matbench_steels": MatbenchTask.from_file("your_path_to_steels_result.json.gz"),
    "matbench_dielectric": MatbenchTask.from_file("your_path_to_dielectric_result.json.gz"),
    ...
}

mb = MatbenchBenchmark(subset=list(my_tasks_loaded_from_file.keys()))

# put each loaded task into the new benchmark's task map
for task_name, task in my_tasks_loaded_from_file.items():
    mb.tasks_map[task_name] = task

mb.to_file("results.json.gz")
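As an optional sanity check, you can load the merged file straight back to confirm it parses cleanly before pushing it:

mb_check = MatbenchBenchmark.from_file("results.json.gz")
print(list(mb_check.tasks_map.keys()))  # should list all 8 task names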
If you have them saved as 8 individual benchmarks, you can do the same thing, but take each task out of its benchmark before putting it into the new one:
from matbench.bench import MatbenchBenchmark

my_benchmarks_loaded_from_file = {
    "matbench_steels": MatbenchBenchmark.from_file("your_path_to_steels_result.json.gz"),
    "matbench_dielectric": MatbenchBenchmark.from_file("your_path_to_dielectric_result.json.gz"),
    ...
}

mb = MatbenchBenchmark(subset=list(my_benchmarks_loaded_from_file.keys()))

# copy each task out of its single-task benchmark into the new benchmark
for task_name, benchmark in my_benchmarks_loaded_from_file.items():
    mb.tasks_map[task_name] = benchmark.tasks_map[task_name]

mb.to_file("results.json.gz")
Let me know if this works for you!
Hi Alex, thanks for the reply again. I tried the method you suggested, but it still gives a "bad gzip" error.
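A generic way to narrow down a "bad gzip" error (a diagnostic suggestion independent of matbench) is to check whether the file on disk is actually gzip-compressed by inspecting its magic bytes:

import gzip

path = "results.json.gz"
with open(path, "rb") as f:
    magic = f.read(2)
print("looks like gzip:", magic == b"\x1f\x8b")  # all gzip files start with 0x1f 0x8b

if magic == b"\x1f\x8b":
    # confirm the stream decompresses end to end (raises if truncated or corrupt)
    with gzip.open(path, "rb") as f:
        f.read()
    print("decompressed OK")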
Matbench Pull Request Template
Thanks for making a PR to Matbench! We appreciate your contribution (like, a lot). To make things run smoothly, check out the following templates, depending on what kind of PR you are making.
If you are making a benchmark submission (i.e., you have tried an algorithm on Matbench and want to appear on the leaderboard), please use the template under Benchmark submissions.
If you are making changes to the core matbench code, data, or docs, please use the template under Core code/data/docs changes.
Benchmark submissions
Matbench v0.1_DeeperGATGNN for the following tasks: matbench_dielectric, matbench_jdft2d, matbench_perovskites, matbench_phonons, matbench_log_kvrh, matbench_log_gvrh, matbench_mp_e_form, matbench_mp_gap.
Brief description of your algorithm
Scalable deeper graph neural networks for high-performance materials property prediction (https://www.cell.com/patterns/pdfExtended/S2666-3899(22)00076-9). We propose DeeperGATGNN, a scalable global graph attention neural network with differentiable group normalization (DGN) and skip connections for high-performance materials property prediction. Our model not only achieves state-of-the-art results on benchmark datasets, but is also the most scalable in terms of the number of graph convolution layers, which allows us to train very deep networks (e.g., >30 layers) without significant performance degradation. Source code: https://github.com/usccolumbia/deeperGATGNN
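For intuition only, here is a minimal self-contained sketch of the skip-connection pattern described above, which is what lets very deep GNN stacks train without degradation. This is not the DeeperGATGNN implementation: it uses a toy dense-adjacency message-passing step in place of the paper's global graph attention, and PyTorch's standard nn.GroupNorm as a stand-in for differentiable group normalization.

import torch
from torch import nn

class ResidualGNNLayer(nn.Module):
    """Toy message-passing layer: aggregate neighbors, normalize, add skip."""
    def __init__(self, dim: int, num_groups: int = 4):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.norm = nn.GroupNorm(num_groups, dim)  # stand-in for the paper's DGN
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, dim) node features; adj: (num_nodes, num_nodes) adjacency
        h = self.linear(adj @ x)   # aggregate neighbor features
        h = self.act(self.norm(h))
        return x + h               # skip connection keeps deep stacks trainable

# a deliberately deep stack (>30 layers), echoing the scalability claim
num_nodes, dim = 5, 16
x = torch.randn(num_nodes, dim)
adj = torch.eye(num_nodes)  # toy graph: self-loops only
for layer in nn.ModuleList([ResidualGNNLayer(dim) for _ in range(32)]):
    x = layer(x, adj)
print(x.shape)  # torch.Size([5, 16])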
Included files
If you are making a benchmark submission, please only include the submission as a folder in the /benchmarks directory with the format <benchmark_name>_<algorithm_name>. Your PR should have no other changes to the core code. The submission should have these three required files, as indicated in the docs (an Example submission is linked there): info.json, notebook.ipynb, and results.json.gz.
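For this submission, that layout would look something like the following (the folder name is inferred from the benchmark and algorithm names above, shown here for illustration):

benchmarks/matbench_v0.1_DeeperGATGNN/
    info.json
    notebook.ipynb
    results.json.gz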
Please make sure each of these files has the information specified in the docs.
If you have other short/small files required for the notebook, please give a brief overview of what each one is used for and how to use it.
Label the pull request
Label the pull request with the new_benchmark label.

Core code/data/docs changes
Brief description of changes
Please include a brief description of the changes you are making, in bullet point format.
Tests
Indicate if your code requires new tests and whether they are included with your PR. ALL core code/data/docs changes adding new features must have new tests for them.
Closed issues or PRs
Indicate if your PR closes any currently open issues or supersedes any other currently open PRs.
Label the pull request
Label the pull request with the code or docs labels, depending on which one (or both) applies.