materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org

[Dev] Tests for `Bader` not running properly #3652

Closed DanielYang59 closed 8 months ago

DanielYang59 commented 8 months ago

Summary

Noticed two Bader related issues:

Test workflow not running properly

While trying to clean up the VASP test files, I noticed that the Bader tests in tests/command_line/test_bader_caller.py (which apparently depend on VASP output files) don't run properly in the GitHub workflow (the tests seem to pass, until one checks the details):

Run wget http://theory.cm.utexas.edu/henkelman/code/bader/download/bader_lnx_64.tar.gz
--2024-02-23 16:46:06--  http://theory.cm.utexas.edu/henkelman/code/bader/download/bader_lnx_64.tar.gz
Resolving theory.cm.utexas.edu (theory.cm.utexas.edu)... 146.6.145.114
Connecting to theory.cm.utexas.edu (theory.cm.utexas.edu)|146.6.145.114|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2024-02-23 16:46:06 ERROR 403: Forbidden.

Error: Process completed with exit code 8.

See the screenshot: test_fail

Test for Bader fails

When running on a local machine with Bader properly installed, I get:

__________________ TestBaderAnalysis.test_atom_parsing ___________________

self = <tests.command_line.test_bader_caller.TestBaderAnalysis testMethod=test_atom_parsing>

    def test_atom_parsing(self):
        # test with reference file
        analysis = BaderAnalysis(
            chgcar_filename=f"{TEST_FILES_DIR}/CHGCAR.Fe3O4",
            potcar_filename=f"{TEST_FILES_DIR}/POTCAR.Fe3O4",
            chgref_filename=f"{TEST_FILES_DIR}/CHGCAR.Fe3O4_ref",
            parse_atomic_densities=True,
        )

        assert len(analysis.atomic_densities) == len(analysis.chgcar.structure)

>       assert np.sum(analysis.chgcar.data["total"]) == approx(
            np.sum([dct["data"] for dct in analysis.atomic_densities])
        )

/home/yang/Developer/pymatgen/tests/command_line/test_bader_caller.py:129: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/home/yang/Developer/pymatgen/venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py:2313: in sum
    return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

obj = [array([[[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
...   [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]]]), ...]
ufunc = <ufunc 'add'>, method = 'sum', axis = None, dtype = None
out = None
kwargs = {'initial': <no value>, 'keepdims': <no value>, 'where': <no value>}
passkwargs = {}

    def _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs):
        passkwargs = {k: v for k, v in kwargs.items()
                      if v is not np._NoValue}

        if type(obj) is not mu.ndarray:
            try:
                reduction = getattr(obj, method)
            except AttributeError:
                pass
            else:
                # This branch is needed for reductions like any which don't
                # support a dtype.
                if dtype is not None:
                    return reduction(axis=axis, dtype=dtype, out=out, **passkwargs)
                else:
                    return reduction(axis=axis, out=out, **passkwargs)

>       return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
E       ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (14,) + inhomogeneous part.

/home/yang/Developer/pymatgen/venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py:88: ValueError
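
For reference, this ValueError can be reproduced without pymatgen at all. With a recent NumPy (1.24 or later), summing a list of arrays with mismatched shapes fails in exactly this way (the shapes below are made up for illustration):

import numpy as np

# Stand-ins for analysis.atomic_densities[i]["data"]: per-atom charge density
# grids that end up with different shapes (hypothetical sizes).
densities = [np.zeros((10, 10, 10)), np.zeros((12, 12, 12))]

# np.sum first tries to build a single ndarray from the list; with
# inhomogeneous shapes this raises the same ValueError as in the test.
try:
    np.sum(densities)
except ValueError as exc:
    print(exc)  # "setting an array element with a sequence. ..."
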
janosh commented 8 months ago

thanks for reporting! the whole bader setup is very brittle atm. i was hoping we could transition to pybader at some point in the future to have less install and subprocess shenanigans, but development there seems to have stalled.

if you could fix the current implementation, that would be great! 👍

DanielYang59 commented 8 months ago

thanks for reporting! the whole bader setup is very brittle atm. i was hoping we could transition to pybader at some point in the future to have less install and subprocess shenanigans, but development there seems to have stalled.

Yes, it seems pybader development has stalled for some reason...

if you could fix the current implementation, that would be great!

Well, I have made too many promises (in pymatviz and pymatgen, and my own codebase) but have too limited time 😭. I would certainly give this a try when I have time, but I cannot promise anything at this moment, sorry.

janosh commented 8 months ago

Well, I have made too many promises (in pymatviz and pymatgen, and my own codebase) but have too limited time 😭.

i know the problem 😄

don't worry, nobody's going to hold you to those promises. any help is appreciated and none is expected

DanielYang59 commented 8 months ago

I just had a look at why the test for bader_caller is failing.

It looks like BaderAnalysis slices a central part out of the complete charge density for each atom: https://github.com/materialsproject/pymatgen/blob/dc60d35a21fd7b55d82c65bb078afc4a2c0e9bbf/pymatgen/command_line/bader_caller.py#L194-L224

The "encompassing volume" is determined on the fly: https://github.com/materialsproject/pymatgen/blob/dc60d35a21fd7b55d82c65bb078afc4a2c0e9bbf/pymatgen/command_line/bader_caller.py#L209-L216

As a result, the shapes of the charge density arrays differ between atoms, hence the error in the test when trying to compute the sum of arrays with different shapes: https://github.com/materialsproject/pymatgen/blob/dc60d35a21fd7b55d82c65bb078afc4a2c0e9bbf/tests/command_line/test_bader_caller.py#L129-L131

So the fix should be pretty straightforward: zero-pad the charge density arrays to a consistent shape (see the sketch below).
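
For illustration only, a minimal sketch of that padding idea (the helper name and grid shapes are hypothetical, not the actual pymatgen code):

import numpy as np

def pad_to_common_shape(arrays: list[np.ndarray]) -> list[np.ndarray]:
    """Zero-pad each 3D grid (at the high end of every axis) to the largest shape."""
    target = np.max([arr.shape for arr in arrays], axis=0)
    return [
        np.pad(arr, [(0, t - s) for s, t in zip(arr.shape, target)], mode="constant")
        for arr in arrays
    ]

# e.g. grids of shape (10, 10, 10) and (12, 12, 12) both become (12, 12, 12);
# zero padding leaves each array's sum unchanged, so np.sum(padded) then works.

Note this only makes the shapes (and hence the summed totals) consistent; it does not put each slice back at its original position in the full CHGCAR grid.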

But I'm not sure about the underlying reasoning behind this, @janosh @shyuep. Since this seems to be a breaking change, I guess we should avoid it (if I pad the charge arrays in place in BaderAnalysis). Or should I keep the current behaviour and just pad the arrays in the tests?

Thanks a lot for any suggestion 😃 .

janosh commented 8 months ago

So the fix should be pretty straightforward: zero-pad the charge density arrays to a consistent shape.

But I'm not sure about the underlying reasoning behind this, @janosh @shyuep.

this is actually news to me. i hadn't looked closely at the implementation before. if the tests produce arrays of different shapes then i don't understand how they could have passed in the past? maybe @Andrew-S-Rosen knows more?

potential for breakage from zero padding seems small but i'm still leaning towards just padding the arrays in tests to fix CI rather than changing the implementation

DanielYang59 commented 8 months ago

Thanks for the quick response and for sharing your thoughts.

if the tests produce arrays of different shapes then i don't understand how they could have passed in the past?

Not sure... I just checked an earlier workflow log (from Dec 6, 2023; it has since expired after 90 days) and it seems that at least back then the tests passed just fine (pytest split 3) 😕:

6.70s call     tests/command_line/test_bader_caller.py::TestBaderAnalysis::test_atom_parsing

Maybe I didn't understand the implementation correctly?

potential for breakage from zero padding seems small but i'm still leaning towards just padding the arrays in tests to fix CI rather than changing the implementation

If we change the implementation to zero-pad the atomic_densities, then the behaviour of this function would change to produce consistent shapes for atomic_densities (which could break downstream processing). So I guess it's better not to change this for now. Or we could make zero-padding optional during __init__ if necessary.

Andrew-S-Rosen commented 8 months ago

I have never used parse_atomic_densities=True so can't help much here (the default is False). I must be honest: I'm not entirely sure of the purpose of this feature anyway. The Bader analysis already breaks down the charge density on a per-atom basis. Why would I want pymatgen to "Enable atomic partition of the charge density" separately from this in a more rudimentary way?

janosh commented 8 months ago

that's funny, i was wondering the same thing but figured i'm missing something obvious.

So maybe the right path forward is to put a deprecation warning on this functionality with a removal date and see if anyone objects before then?
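
For illustration, such a warning could be as simple as the following sketch (the helper name, message, and placement are hypothetical; the real change would live in BaderAnalysis.__init__):

import warnings

def _warn_atomic_densities_deprecated(parse_atomic_densities: bool) -> None:
    """Hypothetical helper mirroring what a deprecation in BaderAnalysis.__init__ could look like."""
    if parse_atomic_densities:
        warnings.warn(
            "parse_atomic_densities is deprecated and will be removed in a "
            "future release; please comment on this issue if you rely on it.",
            DeprecationWarning,
            stacklevel=2,
        )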

Andrew-S-Rosen commented 8 months ago

So maybe the right path forward is to put a deprecation warning on this functionality with a removal date and see if anyone objects before then?

I would lean towards yes (with a somewhat generous deprecation period), unless it's used in the MP builders for some reason. CC @munrojm

janosh commented 8 months ago

sure, let's do 1 year deprecation. emmet only uses bader_analysis_from_path(dir_name, suffix=suffix) (i.e. parse_atomic_densities=False).

@DanielYang59 would you like to submit a PR to close this issue?

DanielYang59 commented 8 months ago

Why would I want pymatgen to "Enable atomic partition of the charge density" separately from this in a more rudimentary way?

I was wondering too (I thought there was some reasoning I didn't know about...), thanks a lot for pointing that out.

sure, let's do 1 year deprecation. emmet only uses bader_analysis_from_path(dir_name, suffix=suffix) (i.e. parse_atomic_densities=False).

@DanielYang59 would you like to submit a PR to close this issue?

Glad to help, thanks for the helpful discussion.

DanielYang59 commented 8 months ago

Fixed in #3656