materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org

Preprocess Structure Reduction Before Bulk Match #4137

Closed lan496 closed 1 week ago

lan496 commented 3 weeks ago

Summary

The AflowPrototypeMatcher matches a structure against predefined AFLOW prototype structures using StructureMatcher. StructureMatcher requires preprocessing via lattice reduction and primitive-cell conversion. This commit optimizes performance by preprocessing AFLOW prototype structures during the initialization of AflowPrototypeMatcher. This change eliminates redundant processing when AflowPrototypeMatcher.get_prototypes is called multiple times.
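
As an illustration of the idea only (this is not the pymatgen implementation), the pattern of reducing the fixed reference set once at construction time can be sketched as follows. The class `PrecomputedMatcher`, the function `toy_reduce`, and the string "structures" are hypothetical stand-ins chosen so the example runs without pymatgen; `toy_reduce` plays the role of the expensive lattice reduction and primitive-cell conversion.

```python
from __future__ import annotations

from typing import Callable


class PrecomputedMatcher:
    """Hypothetical matcher that compares a query against fixed references."""

    def __init__(self, references: list[str], reduce: Callable[[str], str]) -> None:
        self._reduce = reduce
        # Pay the reduction cost for every reference exactly once, up front.
        self._reduced_refs = [(ref, reduce(ref)) for ref in references]

    def get_matches(self, query: str) -> list[str]:
        # Only the query still needs to be reduced on each call.
        reduced_query = self._reduce(query)
        return [ref for ref, reduced in self._reduced_refs if reduced == reduced_query]


call_count = 0


def toy_reduce(s: str) -> str:
    """Toy 'reduction': a canonical form that ignores case and character order."""
    global call_count
    call_count += 1
    return "".join(sorted(s.lower()))


matcher = PrecomputedMatcher(["abc", "xyz", "bca"], toy_reduce)
for _ in range(10):
    matches = matcher.get_matches("CAB")

print(matches, call_count)
```

With three references and ten queries, the reduction runs 3 + 10 times in this sketch. Reducing the references inside every call would instead run it 10 × (3 + 1) times, which is the kind of redundancy the PR removes.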

Benchmark

I measured the performance change with the following script, which calls AflowPrototypeMatcher.get_prototypes 10 times on a diamond-structure Sn cell.

# debug.py
from __future__ import annotations

from pymatgen.analysis.prototypes import AflowPrototypeMatcher
from pymatgen.util.testing import PymatgenTest

class TestAflowPrototypeMatcher(PymatgenTest):
    def test_prototype_matching(self):
        af = AflowPrototypeMatcher()

        struct = self.get_structure("Sn")
        for _ in range(10):
            prototype = af.get_prototypes(struct)[0]
            assert prototype["tags"] == {
                "aflow": "A_cF8_227_a",
                "mineral": "diamond",
                "pearson": "cF8",
                "strukturbericht": "A4",
            }

if __name__ == '__main__':
    TestAflowPrototypeMatcher().test_prototype_matching()

The current master branch takes 20.4 s in total, and StructureMatcher._get_reduced_structure accounts for most of that time. [image: SnakeViz profile]

This PR takes 2.48 s for the same workload, because the heavy primitive-cell conversion of the prototypes is now done only once. [image: SnakeViz profile]

These profiles were generated with cProfile and SnakeViz as follows:

python -m cProfile -s cumulative -o debug.prof debug.py 
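
For readers without SnakeViz, a profile file like `debug.prof` can also be inspected from Python with the standard-library `pstats` module. The sketch below is an assumption-laden example, not part of the PR: `workload` is a placeholder for the benchmark script, and the profile is written to a temporary directory rather than the working directory.

```python
import cProfile
import io
import os
import pstats
import tempfile


def workload() -> int:
    """Placeholder workload; in the PR the profiled script is debug.py."""
    return sum(i * i for i in range(100_000))


# Collect a profile in-process instead of via `python -m cProfile`.
profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

buffer = io.StringIO()
with tempfile.TemporaryDirectory() as tmp:
    prof_path = os.path.join(tmp, "debug.prof")
    profiler.dump_stats(prof_path)
    # Load the saved profile and list the top entries by cumulative time.
    stats = pstats.Stats(prof_path, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)

report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts functions that dominate the run, including the time spent in their callees, at the top of the listing.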

lan496 commented 2 weeks ago

Hi @mkhorton, this PR aims to improve the performance of AflowPrototypeMatcher using an approach similar to the one taken for StructureMatcher.group_structures (see https://github.com/materialsproject/pymatgen/pull/2490). It might be of interest to you. Could you please review it?

shyuep commented 1 week ago

Merged. Thanks!