materialsproject / matbench

Matbench: Benchmarks for materials science property prediction
https://matbench.materialsproject.org
MIT License

Lattice xgboost, a baseline using lattice parameters, space group number, and unit cell volume #152

Closed sgbaird closed 2 years ago

sgbaird commented 2 years ago

eXtreme Gradient Boosting trees (XGBoost) is applied to basic tabular data describing the crystal lattice of each compound: lattice parameter lengths and angles, space group number, and unit cell volume. Fixed XGBoost hyperparameters were used. This serves as part of a baseline to answer the question: how much predictive signal is contained in the basic details of a crystal lattice alone (i.e., no composition, no site information)?
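
As a rough illustration of the feature set described above, here is a minimal sketch that builds one row of the table from raw lattice parameters. The helper name `lattice_features` and the column order are assumptions made for illustration, not the code from this PR; the volume uses the standard formula for a general (triclinic) unit cell.

```python
import math


def lattice_features(a, b, c, alpha, beta, gamma, space_group_number):
    """Return [a, b, c, alpha, beta, gamma, space_group_number, volume].

    Hypothetical helper for illustration only (not the PR's implementation).
    Lengths are in angstroms, angles in degrees.
    """
    # Cosines of the three cell angles
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    # Volume of a general triclinic cell from lengths and angles
    volume = a * b * c * math.sqrt(
        1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg
    )
    return [a, b, c, alpha, beta, gamma, space_group_number, volume]


# Example: a cubic cell with a = 4 angstroms in space group 225
features = lattice_features(4.0, 4.0, 4.0, 90.0, 90.0, 90.0, 225)
```

Rows like this, stacked into a table, are the kind of input a gradient-boosted regressor such as `xgboost.XGBRegressor` would be fit on. The fixed hyperparameters used in the PR are not listed in this thread, so none are shown here.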

This is designed for the matbench_mp_e_form task and offers a perspective from a more established domain (i.e. model accuracy) to complement generative model benchmarking. It is specifically part of a series of baselines and tests related to the xtal2png representation.

https://github.com/sparks-baird/xtal2png/issues/51

Authored primarily by @cseeg

sgbaird commented 2 years ago

@ardunn the tests are failing, which seems related to matminer, e.g.:

======================================================================
ERROR: test_has_polymorphs (matbench.tests.test_task.TestMatbenchTask)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/matbench/matbench/matbench/tests/test_task.py", line 464, in test_has_polymorphs
    mbt = MatbenchTask("matbench_steels", autoload=True)
  File "/home/runner/work/matbench/matbench/matbench/task.py", line 89, in __init__
    self.df = load(self.dataset_name) if autoload else None
  File "/home/runner/work/matbench/matbench/matbench/data_ops.py", line 66, in load
    df = load_dataset(dataset_name)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/dataset_retrieval.py", line 66, in load_dataset
    _validate_dataset(
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/utils.py", line 89, in _validate_dataset
    raise UserWarning(
UserWarning: Error, hash of downloaded file does not match that included in metadata, the data may be corrupt or altered
======================================================================
ERROR: test_instantiation (matbench.tests.test_task.TestMatbenchTask)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/matbench/matbench/matbench/tests/test_task.py", line 35, in test_instantiation
    MatbenchTask(ds, autoload=True)
  File "/home/runner/work/matbench/matbench/matbench/task.py", line 89, in __init__
    self.df = load(self.dataset_name) if autoload else None
  File "/home/runner/work/matbench/matbench/matbench/data_ops.py", line 66, in load
    df = load_dataset(dataset_name)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/dataset_retrieval.py", line 66, in load_dataset
    _validate_dataset(
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/utils.py", line 89, in _validate_dataset
    raise UserWarning(
UserWarning: Error, hash of downloaded file does not match that included in metadata, the data may be corrupt or altered
======================================================================
ERROR: test_record (matbench.tests.test_task.TestMatbenchTask)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/matbench/matbench/matbench/tests/test_task.py", line 211, in test_record
    mbt.load()
  File "/home/runner/work/matbench/matbench/matbench/task.py", line 235, in load
    self.df = load(self.dataset_name)
  File "/home/runner/work/matbench/matbench/matbench/data_ops.py", line 66, in load
    df = load_dataset(dataset_name)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/dataset_retrieval.py", line 66, in load_dataset
    _validate_dataset(
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/utils.py", line 89, in _validate_dataset
    raise UserWarning(
UserWarning: Error, hash of downloaded file does not match that included in metadata, the data may be corrupt or altered
----------------------------------------------------------------------
Ran 30 tests in 73.767s

ardunn commented 2 years ago

@sgbaird Thanks for the PR! Let me see if I can fix this and I'll merge this in.

sgbaird commented 2 years ago

Thanks!

ardunn commented 2 years ago

Merged! Not sure what was going on with the tests, maybe some sort of version issue. I was able to pass all the tests of your branch on my machine so it's probably just some CI problem which I'll debug.
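
For anyone debugging the hash-mismatch error from the traceback above, a minimal sketch of a generic checksum comparison follows. The thread does not show which hash algorithm or metadata layout matminer uses, so SHA-256 and the helper names `file_sha256` and `matches_expected` are assumptions for illustration only.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large dataset files need not fit in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest to an expected hex string (assumed SHA-256)."""
    return file_sha256(path) == expected_hex.strip().lower()
```

Running a check like this on the cached dataset file, both locally and on the CI runner, would show whether the two environments are actually seeing different file contents.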

cseeg commented 2 years ago

Sweet