yoshida-lab / XenonPy

XenonPy is a Python Software for Materials Informatics
http://xenonpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License
131 stars 57 forks

Compounds not found in Material Project #224

Open alejagd opened 3 years ago

alejagd commented 3 years ago

Hi, I'm trying to extract features for some compounds that are not on Materials Project, and I get an error. The XenonPy tutorial mentions that it is possible to use your own data. I was wondering if I could extract structure and composition features of a compound that is not on Materials Project, using, for example, a CIF file.

stewu5 commented 3 years ago

To my understanding, you should be able to do so. Can you show us the error message to help us pinpoint the potential problem?

TsumiNa commented 3 years ago

@alejagd Hi! Please give us some sample code and the error messages so we can understand what you did; then we can track down the errors.

alejagd commented 3 years ago

Thanks for the quick answer : ). For example, I want to extract the features of this compound: Gd0.9Sc0.1Ni2. It doesn't appear in MP, so I don't know the mp_id, but I figured out how to adapt your code ("sample_data_building.ipynb") to search by "pretty formula". Using the integer formula Gd9ScNi20 with the adapted "sample_data_building.ipynb", I get this error:

  File "XX.py", line 84, in <module>
    df = data_fetcher(api_key, a)
  File "XX.py", line 58, in data_fetcher
    df = df.drop('material_id', axis=1)
  File "xenonpy/lib/python3.7/site-packages/pandas/core/frame.py", line 4312, in drop
    errors=errors,
  File "xenonpy/lib/python3.7/site-packages/pandas/core/generic.py", line 4150, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "xenonpy/lib/python3.7/site-packages/pandas/core/generic.py", line 4185, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "xenonpy/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 5591, in drop
    raise KeyError(f"{labels[mask]} not found in axis")
KeyError: "['material_id'] not found in axis"

When I tested with other compounds that are on MP, my code worked.

stewu5 commented 3 years ago

@alejagd Based on the error message, I suspect that your code is implemented in a way that the class still looks for the material_id column as the reference column for the information used in descriptor conversion. Are you aware of this? For a more detailed recommendation, I will pass this to our expert on this module, @TsumiNa.
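One possible reading of the traceback, offered as an assumption rather than a diagnosis: if the query returns no entries, a DataFrame built from the empty result has no columns at all, so dropping 'material_id' raises the KeyError shown above. A minimal sketch of checking the query result before building the DataFrame follows. The helper name `records_or_raise` and the sample records are hypothetical and are not part of XenonPy or the notebook.

```python
def records_or_raise(records, formula):
    """Return the query records, or fail with a readable message if none came back.

    `records` stands for the list of dicts a database query returned
    (hypothetical shape, assumed here for illustration).
    """
    if not records:
        raise LookupError(f"no database entry matches {formula!r}")
    return records


# Placeholder records: one hit for a compound that exists, none for the new one.
hits = [{"material_id": "mp-0000", "pretty_formula": "GdNi2"}]  # made-up ID
misses = []

print(len(records_or_raise(hits, "GdNi2")), "record(s) found")

try:
    records_or_raise(misses, "Gd9ScNi20")
except LookupError as err:
    # Stopping here keeps an empty result from reaching df.drop(...) later.
    print("lookup failed:", err)
```

With a check like this, a compound missing from the database is reported as missing instead of surfacing later as a column error.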

alejagd commented 3 years ago

Thanks @stewu5. Well, I didn't change the name of the variable "material_id", but I'm sure the program is saving the chemical formula in that variable and searching Materials Project by "pretty formula" accordingly. I tested with another compound and it worked. My problem is how to obtain features if the compound is not on Materials Project. Could I integrate XenonPy with the Crystallography Open Database, for example?

monozukuri-ism commented 3 years ago

@alejagd XenonPy descriptor calculation classes (specifically the compositional and structural descriptor classes) can calculate descriptors for any material, provided the input is in the required format (e.g., structural descriptors need a pymatgen.Structure object and compositional descriptors need the composition as a Python dict).

The "sample_data_building.ipynb" notebook is a specific example showing how to automatically extract the required inputs from the Materials Project database using material IDs. The output, saved as a pandas.DataFrame object, is used in the other samples. If you have a different data source, you need to write your own code to gather the data and convert it to the input type required by the specific XenonPy descriptor class. For example, you can download the Gd0.9Sc0.1Ni2 data from COD and express the composition as {'Gd': 0.9, 'Sc': 0.1, 'Ni': 2} for xenonpy.descriptor.Compositions. See https://github.com/yoshida-lab/XenonPy/blob/master/samples/calculate_descriptors.ipynb for details of how to calculate descriptors in XenonPy.
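A minimal sketch of that conversion, under stated assumptions: `formula_to_dict` is a hypothetical helper (not part of XenonPy) that turns a flat formula string into the dict form mentioned above. The two wrapper functions import XenonPy and pymatgen lazily and assume that `Compositions().transform` accepts a pandas Series of composition dicts, as in the linked notebook, and that the CIF file is readable by pymatgen.

```python
import re

# One element symbol followed by an optional integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")
_FORMULA = re.compile(r"(?:[A-Z][a-z]?(?:\d+(?:\.\d+)?)?)+")


def formula_to_dict(formula):
    """Parse a flat formula such as 'Gd0.9Sc0.1Ni2' into {'Gd': 0.9, ...}.

    Hypothetical helper: parentheses and charges are not supported.
    """
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    comp = {}
    for symbol, amount in _TOKEN.findall(formula):
        # A missing amount means one atom of that element.
        comp[symbol] = comp.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return comp


def compositional_descriptors(compositions):
    """Assumed usage: pass composition dicts to xenonpy.descriptor.Compositions."""
    import pandas as pd
    from xenonpy.descriptor import Compositions  # needs XenonPy and its preset data

    return Compositions().transform(pd.Series(compositions, name="composition"))


def structure_and_composition_from_cif(path):
    """Read a CIF with pymatgen; return the Structure and its composition dict."""
    from pymatgen.core import Structure

    structure = Structure.from_file(path)
    return structure, structure.composition.as_dict()


comp = formula_to_dict("Gd0.9Sc0.1Ni2")
print(comp)
```

Calling `compositional_descriptors([comp])` would then compute the compositional features for this one compound without any Materials Project lookup.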

If you plan to connect XenonPy with open databases, we would welcome you as one of our contributors and invite you to share your code. That way, we may be able to provide more specific support based on the code you write.

alejagd commented 3 years ago

@monozukuri-ism thank you for your advice, it works!! I was wondering if the compositional features are calculated using only machine learning algorithms, or did you use other computations such as DFT? I would like to contribute in the future, but for now I need to learn a little more Python :).

stewu5 commented 3 years ago

@alejagd I would like to refer you to this page for details of our compositional features calculation: https://xenonpy.readthedocs.io/en/latest/features.html#compositional-descriptors
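As I read that page, the compositional descriptors come from tabulated element-level properties combined by simple statistics over the composition (weighted average, weighted variance, max, min and similar), so neither a DFT run nor a trained model is needed per compound. A minimal sketch of that idea, assuming this reading is right, follows. The function `weighted_stats` is illustrative only and is not XenonPy's implementation. Atomic number is used as the example property.

```python
def weighted_stats(composition, element_property):
    """Combine one element-level property over a composition.

    Weights are the amounts normalized to sum to 1.
    """
    total = sum(composition.values())
    weights = {el: amount / total for el, amount in composition.items()}
    values = {el: element_property[el] for el in composition}

    ave = sum(weights[el] * values[el] for el in composition)
    var = sum(weights[el] * (values[el] - ave) ** 2 for el in composition)
    return {
        "ave": ave,
        "var": var,
        "max": max(values.values()),
        "min": min(values.values()),
    }


# Atomic numbers of the three elements in Gd0.9Sc0.1Ni2.
atomic_number = {"Gd": 64, "Sc": 21, "Ni": 28}
stats = weighted_stats({"Gd": 0.9, "Sc": 0.1, "Ni": 2}, atomic_number)
print(stats)
```

Repeating this for each tabulated property and each statistic gives a fixed-length feature vector for any composition, whether or not the compound is in a database.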

We are considering including more sophisticated descriptors, but they are still under development. The key point for us is always to make large-scale screening of unseen materials possible. Hence, we want the calculations to be generally applicable and still finish within a reasonable time frame (ideally without needing a supercomputer).

We always welcome comments and advice! Thank you for your interest.