lmmentel / mendeleev

A python package for accessing various properties of elements, ions and isotopes in the periodic table of elements.
https://mendeleev.readthedocs.io
MIT License

Improvement in performance #135

Closed: KylinGuo closed this issue 3 months ago.

KylinGuo commented 6 months ago

Dear developers, thank you for your deep and brilliant work on this package! I'm looking for help or suggestions on improving the performance of my code. I defined an Element class in order to build features for compositions. However, my Element is slow compared with the pymatgen Element when dealing with a lot of compositions. How can I improve the performance? I suspect I am not going about this the right way. Here is my MWE. Any suggestions will be greatly appreciated!

from mendeleev import element as mendeleev_element

# mendeleev
element_properties_map_mendeleev = {
    'atomic_number': 'atomic_number',
    'atomic_radius': 'atomic_radius',
    'atomic_mass': 'atomic_weight',
    'period': 'period',
    'group': 'group_id',
    'number_valence': 'nvalence',
    'molar_volume': 'atomic_volume',
    'melting_temperature': 'melting_point',
    'electronegativity': 'electronegativity',
}

def print_dict(my_dict: dict):
    key_length = [len(key) for key in my_dict.keys()]
    max_width = max(key_length)
    text = "\n".join([f"{key.rjust(max_width)}: {value}" for key, value in my_dict.items()])
    print(text)

class Element(object):
    def __init__(self, element_str):
        element = mendeleev_element(element_str)  # This works slowly
        element_properties_map = element_properties_map_mendeleev
        for key, name2 in element_properties_map.items():
            is_value = getattr(element, name2)
            # Some mendeleev properties (e.g. nvalence, electronegativity) are methods, so call them
            value = is_value() if callable(is_value) else is_value
            setattr(self, key, value)

        # Unit conversion: mendeleev gives atomic_radius in pm; divide by 100 to get angstrom
        if self.atomic_radius is not None:
            self.atomic_radius = self.atomic_radius / 100

def main():
    element_properties_map = element_properties_map_mendeleev
    prop = element_properties_map.keys()
    elem_str = 'Ni'
    element = Element(elem_str)
    my_dict = {i: getattr(element, i) for i in prop}
    print_dict(my_dict)

if __name__ == "__main__":
    main()

Here is the sample output.

      atomic_number: 28
      atomic_radius: 1.35
        atomic_mass: 58.6934
             period: 4
              group: 10
     number_valence: 10
       molar_volume: 6.6
melting_temperature: 1728.15
  electronegativity: 1.91
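
Because the same few dozen elements recur across many compositions, one option (a sketch, not mendeleev's own API) is to look each element up once and reuse the result. The hypothetical helper `element_features` below wraps the MWE's per-element lookup in `functools.lru_cache`, so each distinct symbol triggers a single `mendeleev.element` call. It reuses the property names from `element_properties_map_mendeleev` and assumes, as the MWE does, that some of them (e.g. `nvalence`) are methods. The mendeleev import sits inside the function so it only happens on first use.

```python
from functools import lru_cache
from types import MappingProxyType
from typing import Mapping

# Same mapping as element_properties_map_mendeleev in the MWE:
# feature name -> mendeleev attribute or method name.
PROPERTY_MAP = {
    'atomic_number': 'atomic_number',
    'atomic_radius': 'atomic_radius',
    'atomic_mass': 'atomic_weight',
    'period': 'period',
    'group': 'group_id',
    'number_valence': 'nvalence',
    'molar_volume': 'atomic_volume',
    'melting_temperature': 'melting_point',
    'electronegativity': 'electronegativity',
}


@lru_cache(maxsize=None)
def element_features(symbol: str) -> Mapping[str, object]:
    """Return the features of one element, calling mendeleev only once per symbol."""
    # Deferred import: the first mendeleev import is slow, so pay for it on first use.
    from mendeleev import element as mendeleev_element

    el = mendeleev_element(symbol)
    features = {}
    for key, name in PROPERTY_MAP.items():
        value = getattr(el, name)
        # Some entries (e.g. nvalence, electronegativity) are methods, others attributes.
        features[key] = value() if callable(value) else value
    # Same unit conversion as the MWE: pm -> angstrom.
    if features['atomic_radius'] is not None:
        features['atomic_radius'] = features['atomic_radius'] / 100
    # Read-only view, so callers cannot mutate the cached entry.
    return MappingProxyType(features)
```

With this, featurizing many compositions costs one mendeleev lookup per distinct element rather than one per element occurrence. The returned mapping is read-only, so a caller cannot accidentally modify the shared cached entry.
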
lmmentel commented 3 months ago

Hey, thanks for reporting this. I'm not quite certain what you are trying to achieve here; would you mind explaining your use case? It seems that you are duplicating the mendeleev.models.Element class, so maybe have a look at it and see if it satisfies your use case.

A longer answer is that currently mendeleev suffers from a long import time (only first import) since it's fetching data for 118 elements and instantiating objects to allow for importing elements directly, i.e.

from mendeleev import Fe, Ni
print(Fe.name)

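There are a few ways of fetching data from mendeleev as explained in this section of the docs. In essence you have several options:

  • using the mendeleev python api, with methods like elements or importing element objects directly as in the example above
  • fetching entire tables from the db with pandas or sqlalchemy
  • querying the db from python with sqlalchemy or sqlite
  • getting data straight from the elements.db sqlite database shipped with the package

The order roughly corresponds to slowest to fastest, with certain caveats, but without more info on your use case it's hard to say anything more.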
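
For the fastest route mentioned here, reading straight from the elements.db sqlite database, a single query can fetch the needed columns for all elements at once. Below is a minimal sketch with Python's built-in sqlite3. The table name `elements`, the `symbol` column and the other column names are assumptions, chosen to mirror what `fetch_table('elements')` is believed to return, so verify them against your mendeleev version. Locating elements.db on disk is left open, and `load_element_table` is a hypothetical helper.

```python
import sqlite3
from typing import Dict, Sequence

# Assumed column names in the `elements` table; verify against your mendeleev version.
DEFAULT_COLUMNS = ("atomic_number", "atomic_radius", "atomic_weight", "period",
                   "group_id", "atomic_volume", "en_pauling")


def load_element_table(conn: sqlite3.Connection,
                       columns: Sequence[str] = DEFAULT_COLUMNS) -> Dict[str, dict]:
    """Read the given columns for every element in one query, keyed by symbol."""
    # Identifiers cannot be bound as SQL parameters, so validate them before
    # interpolating them into the statement.
    for name in columns:
        if not name.isidentifier():
            raise ValueError(f"invalid column name: {name!r}")
    sql = f"SELECT symbol, {', '.join(columns)} FROM elements"
    return {row[0]: dict(zip(columns, row[1:])) for row in conn.execute(sql)}
```

Open the database with `sqlite3.connect(db_path)` (where `db_path` is wherever elements.db lives in your installation), call the helper once, and close the connection. The resulting dict can then be indexed by symbol for every composition without touching the database again.
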

Does that help?

KylinGuo commented 3 months ago

> Hey, thanks for reporting this. I'm not quite sure what you are trying to achieve here; would you mind explaining your use case? It seems that you are duplicating the mendeleev.models.Element class, so maybe have a look at it and see if it satisfies your use case.
>
> A longer answer is that currently mendeleev suffers from a long import time (only first import) since it's fetching data for 118 elements and instantiating objects to allow for importing elements directly, i.e.
>
> from mendeleev import Fe, Ni
> print(Fe.name)
>
> There are a few ways of fetching data from mendeleev as explained in this section of the docs. In essence you have several options:
>
>   • using the mendeleev python api, with methods like elements or importing element objects directly as in the example above
>   • fetching entire tables from the db with pandas or sqlalchemy
>   • querying the db from python with sqlalchemy or sqlite
>   • getting data straight from the elements.db sqlite database shipped with the package
>
> The order roughly corresponds to slowest to fastest, with certain caveats, but without more info on your use case it's hard to say anything more.
>
> Does that help?

Dear lmmentel, thank you for your assistance! I now realize I was misusing mendeleev's 'element' in my 'Element' class, and 'fetch_table' seems to be a more suitable approach for me. Thank you again. Let me give some context on what I was doing: I am featurizing a set of compositions. One approach uses composition-weighted elemental properties, which requires the properties of every chemical element in each composition. Best regards, Qilin.
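
Composition weighting as described here can be written as a weighted mean: for a composition with amounts n_i and per-element property values p_i, the feature is sum(n_i * p_i) / sum(n_i). A minimal sketch, assuming the per-element properties were already loaded once into a dict keyed by element symbol (for instance from `fetch_table`). `weighted_features` is a hypothetical name, and a property missing (None) for any element in the composition is reported as None rather than averaged.

```python
from typing import Dict, Mapping, Optional, Sequence


def weighted_features(composition: Mapping[str, float],
                      table: Mapping[str, Mapping[str, Optional[float]]],
                      properties: Sequence[str]) -> Dict[str, Optional[float]]:
    """Composition-weighted mean of each property: sum(n_i * p_i) / sum(n_i)."""
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition amounts must sum to a positive number")
    result: Dict[str, Optional[float]] = {}
    for prop in properties:
        pairs = [(amount, table[symbol][prop]) for symbol, amount in composition.items()]
        if any(value is None for _, value in pairs):
            # At least one element lacks this property: no meaningful average.
            result[prop] = None
        else:
            result[prop] = sum(amount * value for amount, value in pairs) / total
    return result
```

One way to build `table` from mendeleev is `fetch_table('elements').set_index('symbol')[cols].to_dict(orient='index')` after `from mendeleev.fetch import fetch_table`, where `cols` lists the columns you need. Note that pandas stores missing numbers as NaN rather than None, so this sketch would propagate them as NaN instead of flagging them.
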

lmmentel commented 3 months ago

I'm happy it helped. Please note that while the bulk of the element properties are stored in the db and available with fetch_table, some properties are computed on the fly and therefore not stored. Examples of computed properties: electrophilicity, hardness, softness, etc.
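
If a few computed properties are needed alongside the bulk columns, one approach is to keep fetch_table for the stored data and create element objects only for the distinct symbols that actually occur, filling in the computed values from them. This sketch assumes that mendeleev's Element exposes `hardness()`, `softness()` and `electrophilicity()` as methods callable without arguments; check the signatures in your version. `add_computed_properties` is a hypothetical helper operating on rows keyed by symbol.

```python
from typing import Dict, Iterable, MutableMapping

# Assumed zero-argument-callable methods on mendeleev's Element; check your version.
COMPUTED = ("hardness", "softness", "electrophilicity")


def add_computed_properties(rows: MutableMapping[str, Dict[str, object]],
                            names: Iterable[str] = COMPUTED) -> None:
    """Add computed (not stored) properties, in place, to rows keyed by element symbol."""
    # Deferred import so code that only needs stored columns never pays for it.
    from mendeleev import element as mendeleev_element

    names = tuple(names)
    for symbol, row in rows.items():
        el = mendeleev_element(symbol)  # one element object per distinct symbol
        for name in names:
            row[name] = getattr(el, name)()
```

Passing in only the symbols that appear in your compositions keeps the number of element objects small, while the stored columns still come from a single bulk fetch.
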

I'm closing this issue, but feel free to open a new one if you have further questions.