txie-93 / cgcnn

Crystal graph convolutional neural networks for predicting material properties.
MIT License
628 stars, 304 forks

The problem of using the Materials Project database and the Perovskite database #31

Open yuyouyu32 opened 2 years ago

yuyouyu32 commented 2 years ago

Hello, and thanks for your great work! I am a computer science student, so I am not familiar with materials databases. After reading your data.py code, I see that you read CIF files and extract everything you need from them. However, I don't understand how you get that information out of a CIF file, and I would be very grateful for a brief explanation. Also, your CIF files come from the COD database. How can I get CIF files from the Materials Project database and the Perovskite database in the same format as the COD ones?

yuyouyu32 commented 2 years ago

In addition, I use the MPRester API (from pymatgen.ext.matproj import MPRester) to get information from the Materials Project database, but I cannot get CIF files this way. As for the Perovskite database, I open it with ase.db.connect('cubic_perovskites.db'), but I don't know how to get the same information that the CIF files from the COD database provide. I would be very grateful if you could answer my questions!

txie-93 commented 2 years ago

@yuyouyu32 The definition of cif files can be found in https://en.wikipedia.org/wiki/Crystallographic_Information_File

This function from pymatgen can convert the cif string to a Structure object: https://pymatgen.org/pymatgen.core.structure.html?highlight=structure#pymatgen.core.structure.IStructure.from_str
From the object you can get the coordinates and atom types as numpy arrays.

You should be able to find similar functions in the ASE documentation to read the Perovskite database: https://wiki.fysik.dtu.dk/ase/

Hope that this is helpful.

yuyouyu32 commented 2 years ago

Thank you very much for your detailed answer. I have a few more questions; sorry to bother you again.

To be direct, I can't find any code in your data.py that reads the Materials Project database or the Perovskite database. I can only find the function that reads CIF files, so I assume you read CIF files obtained from those two databases. But I don't know how to download CIF files with MPRester (from pymatgen.ext.matproj import MPRester), and I don't know how to get CIF files out of the cubic_perovskites.db file with ase.

After reading your answer, I realized that you may be converting string data directly into cgcnn's input format. Could you share example code that reads data from the two databases (the Materials Project database and the Perovskite database) in a form that can be used directly for cgcnn training?