materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org
Other
1.52k stars 867 forks source link

return `bool` instead of `np.bool_` #4074

Closed jmmshn closed 1 month ago

jmmshn commented 1 month ago

Summary

The return signature of this function is currently np.bool_ which can cause serialization problems with older atomate1 workflows.

Changed to bool instead.

DanielYang59 commented 1 month ago

Beautiful! I believe I have seen issues related to this at some point.

I just did a global search (re pattern return np\..* [><=]) and there are a few more cases that perhaps you could fix as well? Thanks!

https://github.com/materialsproject/pymatgen/blob/ea7c3390e2db99a35f3eee242073ffc2830bbd4c/src/pymatgen/analysis/chemenv/utils/coordination_geometry_utils.py#L716-L727

https://github.com/materialsproject/pymatgen/blob/ea7c3390e2db99a35f3eee242073ffc2830bbd4c/src/pymatgen/core/surface.py#L308-L321

The following might return [True] or [False] as a single element array if I understand correctly?

https://github.com/materialsproject/pymatgen/blob/ea7c3390e2db99a35f3eee242073ffc2830bbd4c/src/pymatgen/analysis/interfaces/zsl.py#L317-L330

jmmshn commented 1 month ago

I did some messy inspect voodoo and got this list:

Function '_normalization_factor' in 'pymatgen.analysis.phase_diagram' returned a NumPy generic object
Function 'matches' in 'pymatgen.core.structure' returned a NumPy generic object
Function 'get' in 'pymatgen.core.composition' returned a NumPy generic object
Function 'get_n_moment' in 'pymatgen.electronic_structure.dos' returned a NumPy generic object
Function 'is_symmetric' in 'pymatgen.core.tensors' returned a NumPy generic object
Function 'get_coeff' in 'pymatgen.analysis.reaction_calculator' returned a NumPy generic object
Function 'r2_score' in 'pymatgen.phonon.dos' returned a NumPy generic object
Function 'mae' in 'pymatgen.phonon.dos' returned a NumPy generic object
Function 'best_case' in 'pymatgen.analysis.ewald' returned a NumPy generic object
Function '__getitem__' in 'pymatgen.core.structure' returned a NumPy generic object
Function 'get_d' in 'pymatgen.core.surface' returned a NumPy generic object
Function 'get_specific_energy' in 'pymatgen.apps.battery.conversion_battery' returned a NumPy generic object
Function '__getitem__' in 'pymatgen.core.composition' returned a NumPy generic object
Function 'get_interpolated_value' in 'pymatgen.core.spectrum' returned a NumPy generic object
Function 'get_linear_interpolated_value' in 'pymatgen.util.coord' returned a NumPy generic object
Function 'get_band_width' in 'pymatgen.electronic_structure.dos' returned a NumPy generic object
Function 'get_average_voltage' in 'pymatgen.apps.battery.conversion_battery' returned a NumPy generic object
Function 'cv' in 'pymatgen.phonon.dos' returned a NumPy generic object
Function 'project' in 'pymatgen.core.tensors' returned a NumPy generic object
Function 'get_el_amount' in 'pymatgen.analysis.reaction_calculator' returned a NumPy generic object
Function 'is_polar' in 'pymatgen.core.surface' returned a NumPy generic object
Function 'get_e_above_hull' in 'pymatgen.analysis.phase_diagram' returned a NumPy generic object
Function 'einsum_sequence' in 'pymatgen.core.tensors' returned a NumPy generic object
Function 'get_band_skewness' in 'pymatgen.electronic_structure.dos' returned a NumPy generic object
Function 'get_capacity_vol' in 'pymatgen.apps.battery.conversion_battery' returned a NumPy generic object
Function 'helmholtz_free_energy' in 'pymatgen.phonon.dos' returned a NumPy generic object
Function '_get_item_index' in 'pymatgen.core.tensors' returned a NumPy generic object
Function 'get_band_kurtosis' in 'pymatgen.electronic_structure.dos' returned a NumPy generic object
Function 'd_hkl' in 'pymatgen.core.lattice' returned a NumPy generic object
Function 'distance' in 'pymatgen.core.sites' returned a NumPy generic object
Function 'value_at' in 'pymatgen.io.vasp.outputs' returned a NumPy generic object
Function 'get_reference_energy' in 'pymatgen.analysis.phase_diagram' returned a NumPy generic object
Function 'internal_energy' in 'pymatgen.phonon.dos' returned a NumPy generic object
Function 'get_fermi_interextrapolated' in 'pymatgen.electronic_structure.dos' returned a NumPy generic object
Function '__array_wrap__' in 'pymatgen.core.tensors' returned a NumPy generic object
Function 'get_gap' in 'pymatgen.electronic_structure.dos' returned a NumPy generic object
Function '__abs__' in 'pymatgen.electronic_structure.core' returned a NumPy generic object
Function 'get_fermi' in 'pymatgen.electronic_structure.dos' returned a NumPy generic object
Function 'average_over_unit_sphere' in 'pymatgen.core.tensors' returned a NumPy generic object
Function 'get_band_filling' in 'pymatgen.electronic_structure.dos' returned a NumPy generic object
Function 'entropy' in 'pymatgen.phonon.dos' returned a NumPy generic object
Function 'width' in 'pymatgen.phonon.bandstructure' returned a NumPy generic object
Function 'get_last_peak' in 'pymatgen.phonon.dos' returned a NumPy generic object
Function 'get_doping' in 'pymatgen.electronic_structure.dos' returned a NumPy generic object
Function 'get_equilibrium_reaction_energy' in 'pymatgen.analysis.phase_diagram' returned a NumPy generic object
Function 'selling_dist' in 'pymatgen.core.lattice' returned a NumPy generic object
Function 'get_distance' in 'pymatgen.core.structure' returned a NumPy generic object
Function 'debye_temp_phonopy' in 'pymatgen.phonon.gruneisen' returned a NumPy generic object
Function 'in_simplex' in 'pymatgen.util.coord' returned a NumPy generic object
Function 'get_site_energy' in 'pymatgen.analysis.ewald' returned a NumPy generic object
Function 'get_hull_energy_per_atom' in 'pymatgen.analysis.phase_diagram' returned a NumPy generic object
Function 'is_fit_to_structure' in 'pymatgen.core.tensors' returned a NumPy generic object
Function 'get_interpolated_value' in 'pymatgen.phonon.dos' returned a NumPy generic object
Function 'compute_partial_energy' in 'pymatgen.analysis.ewald' returned a NumPy generic object
Function 'get_energy_density' in 'pymatgen.apps.battery.conversion_battery' returned a NumPy generic object
Function 'zero_point_energy' in 'pymatgen.phonon.dos' returned a NumPy generic object
Function 'is_rotation' in 'pymatgen.core.tensors' returned a NumPy generic object
Function 'get_direct_band_gap' in 'pymatgen.electronic_structure.bandstructure' returned a NumPy generic object
Function 'thermal_conductivity_slack' in 'pymatgen.phonon.gruneisen' returned a NumPy generic object
Function 'get_hull_energy' in 'pymatgen.analysis.phase_diagram' returned a NumPy generic object
Function 'get_form_energy' in 'pymatgen.analysis.phase_diagram' returned a NumPy generic object
Function 'get_phase_separation_energy' in 'pymatgen.analysis.phase_diagram' returned a NumPy generic object
Function 'has_imaginary_freq' in 'pymatgen.phonon.bandstructure' returned a NumPy generic object
Function 'calculate_energy' in 'pymatgen.analysis.reaction_calculator' returned a NumPy generic object
Function 'average_gruneisen' in 'pymatgen.phonon.gruneisen' returned a NumPy generic object
Function 'get_capacity_grav' in 'pymatgen.apps.battery.conversion_battery' returned a NumPy generic object
Function 'fit' in 'pymatgen.analysis.structure_matcher' returned a NumPy generic object
Function 'get_band_center' in 'pymatgen.electronic_structure.dos' returned a NumPy generic object
Function '__getitem__' in 'pymatgen.electronic_structure.core' returned a NumPy generic object
Function '__lt__' in 'pymatgen.electronic_structure.core' returned a NumPy generic object
Function '__array_wrap__' in 'pymatgen.analysis.elasticity.strain' returned a NumPy generic object
Function 'get_upper_band_edge' in 'pymatgen.electronic_structure.dos' returned a NumPy generic object
Function 'get_atomic_fraction' in 'pymatgen.core.composition' returned a NumPy generic object
Function 'formula_double_format' in 'pymatgen.util.string' returned a NumPy generic object

The @njit wrapper might have messed with the inspect voodoo. I'll patch these next week.

Slightly updated list:

Function 'best_case' in 'pymatgen.analysis.ewald.EwaldMinimizer' returned a NumPy generic object
Function 'get_coeff' in 'pymatgen.analysis.reaction_calculator.BalancedReaction' returned a NumPy generic object
Function 'get_n_moment' in 'pymatgen.electronic_structure.dos.CompleteDos' returned a NumPy generic object
Function 'get_upper_band_edge' in 'pymatgen.electronic_structure.dos.CompleteDos' returned a NumPy generic object
Function 'get_capacity_grav' in 'pymatgen.apps.battery.conversion_battery.ConversionElectrode' returned a NumPy generic object
Function 'is_rotation' in 'pymatgen.core.tensors.SquareTensor' returned a NumPy generic object
Function 'get_energy_density' in 'pymatgen.apps.battery.conversion_battery.ConversionElectrode' returned a NumPy generic object
Function 'formula_double_format' in 'pymatgen.util.string' returned a NumPy generic object
Function 'get_site_energy' in 'pymatgen.analysis.ewald.EwaldSummation' returned a NumPy generic object
Function 'get_e_above_hull' in 'pymatgen.analysis.phase_diagram.PhaseDiagram' returned a NumPy generic object
Function 'has_imaginary_freq' in 'pymatgen.phonon.bandstructure.PhononBandStructureSymmLine' returned a NumPy generic object
Function 'thermal_conductivity_slack' in 'pymatgen.phonon.gruneisen.GruneisenParameter' returned a NumPy generic object
Function 'get_gap' in 'pymatgen.electronic_structure.dos.Dos' returned a NumPy generic object
Function 'get_band_center' in 'pymatgen.electronic_structure.dos.CompleteDos' returned a NumPy generic object
Function 'entropy' in 'pymatgen.phonon.dos.CompletePhononDos' returned a NumPy generic object
Function '_normalization_factor' in 'pymatgen.analysis.phase_diagram.PDEntry' returned a NumPy generic object
Function 'calculate_energy' in 'pymatgen.analysis.reaction_calculator.ComputedReaction' returned a NumPy generic object
Function '__array_wrap__' in 'pymatgen.core.tensors.SquareTensor' returned a NumPy generic object
Function 'get_band_filling' in 'pymatgen.electronic_structure.dos.CompleteDos' returned a NumPy generic object
Function 'get_interpolated_value' in 'pymatgen.phonon.dos.PhononDos' returned a NumPy generic object
Function 'get_band_skewness' in 'pymatgen.electronic_structure.dos.CompleteDos' returned a NumPy generic object
Function 'cv' in 'pymatgen.phonon.dos.PhononDos' returned a NumPy generic object
Function 'is_symmetric' in 'pymatgen.core.tensors.SquareTensor' returned a NumPy generic object
Function 'get_gap' in 'pymatgen.electronic_structure.dos.DOS' returned a NumPy generic object
Function 'get_gap' in 'pymatgen.electronic_structure.dos.FermiDos' returned a NumPy generic object
Function 'width' in 'pymatgen.phonon.bandstructure.PhononBandStructureSymmLine' returned a NumPy generic object
Function 'selling_dist' in 'pymatgen.core.lattice.Lattice' returned a NumPy generic object
Function 'cv' in 'pymatgen.phonon.dos.CompletePhononDos' returned a NumPy generic object
Function '__getitem__' in 'pymatgen.core.composition.Composition' returned a NumPy generic object
Function 'get_fermi' in 'pymatgen.electronic_structure.dos.FermiDos' returned a NumPy generic object
Function '__array_wrap__' in 'pymatgen.analysis.elasticity.strain.Deformation' returned a NumPy generic object
Function 'get_capacity_vol' in 'pymatgen.apps.battery.conversion_battery.ConversionElectrode' returned a NumPy generic object
Function 'einsum_sequence' in 'pymatgen.core.tensors.Tensor' returned a NumPy generic object
Function 'get_hull_energy' in 'pymatgen.analysis.phase_diagram.PhaseDiagram' returned a NumPy generic object
Function 'get_interpolated_value' in 'pymatgen.core.spectrum.Spectrum' returned a NumPy generic object
Function 'get_atomic_fraction' in 'pymatgen.core.composition.Composition' returned a NumPy generic object
Function 'compute_partial_energy' in 'pymatgen.analysis.ewald.EwaldSummation' returned a NumPy generic object
Function 'get_average_voltage' in 'pymatgen.apps.battery.conversion_battery.ConversionElectrode' returned a NumPy generic object
Function 'average_over_unit_sphere' in 'pymatgen.core.tensors.Tensor' returned a NumPy generic object
Function 'in_simplex' in 'pymatgen.util.coord.Simplex' returned a NumPy generic object
Function 'get_equilibrium_reaction_energy' in 'pymatgen.analysis.phase_diagram.PhaseDiagram' returned a NumPy generic object
Function 'matches' in 'pymatgen.core.structure.Structure' returned a NumPy generic object
Function 'get_el_amount' in 'pymatgen.analysis.reaction_calculator.BalancedReaction' returned a NumPy generic object
Function '__getitem__' in 'pymatgen.electronic_structure.core.Magmom' returned a NumPy generic object
Function 'helmholtz_free_energy' in 'pymatgen.phonon.dos.CompletePhononDos' returned a NumPy generic object
Function 'get_phase_separation_energy' in 'pymatgen.analysis.phase_diagram.PhaseDiagram' returned a NumPy generic object
Function 'is_polar' in 'pymatgen.core.surface.Slab' returned a NumPy generic object
Function 'd_hkl' in 'pymatgen.core.lattice.Lattice' returned a NumPy generic object
Function 'internal_energy' in 'pymatgen.phonon.dos.PhononDos' returned a NumPy generic object
Function '__abs__' in 'pymatgen.electronic_structure.core.Magmom' returned a NumPy generic object
Function 'get_direct_band_gap' in 'pymatgen.electronic_structure.bandstructure.LobsterBandStructureSymmLine' returned a NumPy generic object
Function 'get_band_width' in 'pymatgen.electronic_structure.dos.CompleteDos' returned a NumPy generic object
Function 'helmholtz_free_energy' in 'pymatgen.phonon.dos.PhononDos' returned a NumPy generic object
Function '__array_wrap__' in 'pymatgen.core.tensors.Tensor' returned a NumPy generic object
Function 'get_reference_energy' in 'pymatgen.analysis.phase_diagram.PhaseDiagram' returned a NumPy generic object
Function 'get_last_peak' in 'pymatgen.phonon.dos.PhononDos' returned a NumPy generic object
Function 'get_band_kurtosis' in 'pymatgen.electronic_structure.dos.CompleteDos' returned a NumPy generic object
Function 'zero_point_energy' in 'pymatgen.phonon.dos.PhononDos' returned a NumPy generic object
Function 'r2_score' in 'pymatgen.phonon.dos.PhononDos' returned a NumPy generic object
Function 'debye_temp_phonopy' in 'pymatgen.phonon.gruneisen.GruneisenParameter' returned a NumPy generic object
Function '__lt__' in 'pymatgen.electronic_structure.core.Magmom' returned a NumPy generic object
Function 'fit' in 'pymatgen.analysis.structure_matcher.StructureMatcher' returned a NumPy generic object
Function 'mae' in 'pymatgen.phonon.dos.PhononDos' returned a NumPy generic object
Function 'get_linear_interpolated_value' in 'pymatgen.util.coord' returned a NumPy generic object
Function 'internal_energy' in 'pymatgen.phonon.dos.CompletePhononDos' returned a NumPy generic object
Function 'get_doping' in 'pymatgen.electronic_structure.dos.FermiDos' returned a NumPy generic object
Function 'average_gruneisen' in 'pymatgen.phonon.gruneisen.GruneisenParameter' returned a NumPy generic object
Function 'get_specific_energy' in 'pymatgen.apps.battery.conversion_battery.ConversionElectrode' returned a NumPy generic object
Function 'get_d' in 'pymatgen.core.surface' returned a NumPy generic object
Function 'get_distance' in 'pymatgen.core.structure.Structure' returned a NumPy generic object
Function 'get_direct_band_gap' in 'pymatgen.electronic_structure.bandstructure.BandStructureSymmLine' returned a NumPy generic object
Function 'is_symmetric' in 'pymatgen.core.tensors.Tensor' returned a NumPy generic object
Function 'value_at' in 'pymatgen.io.vasp.outputs.VolumetricData' returned a NumPy generic object
Function 'distance' in 'pymatgen.core.sites.PeriodicSite' returned a NumPy generic object
Function 'get_gap' in 'pymatgen.electronic_structure.dos.CompleteDos' returned a NumPy generic object
Function 'get_hull_energy_per_atom' in 'pymatgen.analysis.phase_diagram.PhaseDiagram' returned a NumPy generic object
Function 'get' in 'pymatgen.core.composition.Composition' returned a NumPy generic object
Function 'project' in 'pymatgen.core.tensors.Tensor' returned a NumPy generic object
Function 'get_fermi_interextrapolated' in 'pymatgen.electronic_structure.dos.FermiDos' returned a NumPy generic object
Function 'is_fit_to_structure' in 'pymatgen.core.tensors.Tensor' returned a NumPy generic object
Function 'calculate_energy' in 'pymatgen.analysis.reaction_calculator.BalancedReaction' returned a NumPy generic object
Function '__getitem__' in 'pymatgen.core.structure.PeriodicNeighbor' returned a NumPy generic object
Function 'get_form_energy' in 'pymatgen.analysis.phase_diagram.PhaseDiagram' returned a NumPy generic object
Function '_get_item_index' in 'pymatgen.core.tensors.TensorMapping' returned a NumPy generic object
Function 'entropy' in 'pymatgen.phonon.dos.PhononDos' returned a NumPy generic object
jmmshn commented 1 month ago

I think this script can be adapted later to handle a more thorough check of serialization bugs caused by custom as_dicts.

import pytest
import inspect
import pkgutil
import importlib
import pymatgen
import numpy as np

triggered = set()

def log_numpy_return(func, path_name, name):
    def wrapper(*args_, **kwargs_):
        result = func(*args_, **kwargs_)
        if isinstance(result, np.generic):
            # print(f"Function '{name}' in '{path_name}' returned a NumPy generic object")
            triggered.add(f"Function '{name}' in '{path_name}' returned a NumPy generic object")
        return result
    return wrapper

def wrap_func(path, name, func, path_name):
    setattr(path, name, log_numpy_return(func, path_name, name))

def get_all_functions(package):
    """
    Get all functions and methods from a package and its submodules.
    """
    functions = dict()
    for module_info in pkgutil.walk_packages(package.__path__, prefix=f"{package.__name__}."):
        try:
            # Dynamically import the module
            module = importlib.import_module(module_info.name)
            # Get all functions in the module
            for name, func_ in inspect.getmembers(module, inspect.isfunction):
                source_module = func_.__module__
                if package.__name__ not in source_module: continue
                functions[f"{module.__name__}.{name}"] = (module, name, func_, source_module)
            # Get all the methods in the module
            for class_name, obj in inspect.getmembers(module, inspect.isclass):
                source_module = obj.__module__
                if package.__name__ not in source_module: continue
                for name2, method_ in inspect.getmembers(obj, inspect.isfunction):
                    functions[f"{module.__name__}.{class_name}.{name2}"] = (obj, name2, method_, f"{source_module}.{class_name}")
        except Exception as e:
            # Handle modules that cannot be imported or have issues
            print(f"Failed to inspect {module_info.name}: {e}")
    return functions

funcs = get_all_functions(pymatgen)
for v in funcs.values():
    wrap_func(*v)

# Run pytest
pytest.main(['-v', '-s'])
print("Triggered:")
print(triggered)
# write triggerd to file
with open("triggered.txt", "w") as f:
    f.write("\n".join(triggered))
jmmshn commented 1 month ago

Ok I went through the list and found ones where a user will plausibly serialize into a DB. For most of them the return type was already annotated with bool so this is just something that the linters should have caught https://github.com/python/mypy/issues/10385

DanielYang59 commented 1 month ago

Wow, that's way more thorough than mine :)

For most of them the return type was already annotated with bool so this is just something that the linters should have caught

I thought mypy should be able to catch such mismatch but for some reason it didn't. Do you have any idea?

import numpy as np

def foo() -> bool:
    return np.bool_(True)  >>> Incompatible return value type (got "numpy.bool", expected "builtins.bool")  [return-value]

I'm not seeing return-value being suppressed or something: https://github.com/materialsproject/pymatgen/blob/ea7c3390e2db99a35f3eee242073ffc2830bbd4c/pyproject.toml#L282-L287

jmmshn commented 1 month ago

Yeah mypy should work for this. So not sure why this these got ignored.

mkhorton commented 1 month ago

Will go ahead and merge this, thanks both