materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org
Other
1.43k stars 840 forks source link

The hkl index outputted by XRDCalculator is incorrect. #2826

Closed Higomon closed 1 year ago

Higomon commented 1 year ago

Describe the bug

An XRD expert, as pointed out on Twitter (https://bit.ly/3X4RFfa), the hkl index is obviously incorrect as shown in the attached image. We hope for improvement.

Screenshots

fig1

fig2

Environment:

shyuep commented 1 year ago

I am not able to reproduce this error. Here is the XRD pattern that I get from the latest pymatgen. As far as I can tell, the labelling of the peaks corresponds to the VESTA ones.

image

For future reference, I would prefer if you provide a structure file and code example to reproduce any reported errors. Since we don't have access to other third party apps, we can't reproduce what happens there.

Higomon commented 1 year ago

Thank you for your quick response.

The CIF file and Python code used are as follows.


CIF file

alpha-Fe2O3_fiz15840.zip

My Python code

from os.path import dirname, basename
import pandas as pd
import numpy as np
from scipy.special import wofz

from pymatgen.io.cif import CifParser
from pymatgen.analysis.diffraction.xrd import XRDCalculator

# setting --------------------------------
x_str = 10  # /deg (start of X range)
x_end = 120  # /deg (end of X range)

# Debye-Waller factor
DWF = {"Fe": 0.15, "O": 0.6}
# ----------------------------------------

def voigt(x, amp, pos, fwhm, shape):
    """
    x: x data
    amp: amplitude(peak height)
    pos: peak center
    fwhm: Full Width at Half Maximum
    shape:
    """
    tmp = 1 / wofz(np.zeros((len(x))) + 1j * np.sqrt(np.log(2.0)) * shape).real
    tmp = (
        tmp
        * amp
        * wofz(
            2 * np.sqrt(np.log(2.0)) * (x - pos) / fwhm
            + 1j * np.sqrt(np.log(2.0)) * shape
        ).real
    )
    return tmp

def sum_voigt_peaks(pat, x_str, x_end):
    x = np.arange(x_str, x_end, 0.01)
    calc_y = np.zeros(len(x))

    N = len(pat.x)
    fwhm = 0.06
    shape = 0.5
    for i in range(N):
        pos = pat.x[i]
        amp = pat.y[i]
        calc_y += voigt(x, amp, pos, fwhm, shape)

    return x, calc_y

def output_hklTable(pat, f):
    d_space = pat.d_hkls
    _hkl = pat.hkls

    hkl_index = [i[0]["hkl"] for i in _hkl]
    m = [i[0]["multiplicity"] for i in _hkl]
    dict = {
        "hkl": hkl_index,
        "multiplicity": m,
        "d": d_space,
        "2theta": pat.x,
        "Int": pat.y,
    }
    df = pd.DataFrame(dict)
    df.index = df.index + 1
    d_name = dirname(f)
    f_name = basename(f).replace(".cif", ".csv")
    df.to_csv(
        d_name + "\\hkl_" + f_name,
        encoding="utf-8",
        index=True,
    )

def output_XRDpattern(pat, f):
    x, y = sum_voigt_peaks(pat, x_str, x_end)
    dict = {"2theta": x, "Int": y}
    df = pd.DataFrame(dict)
    d_name = dirname(f)
    f_name = basename(f).replace(".cif", ".csv")
    df.to_csv(
        d_name + "\\prof_" + f_name,
        encoding="utf-8",
        index=False,
    )

def main():
    f = r"D:\alpha-Fe2O3_fiz15840.cif"
    parser = CifParser(f)
    structure = parser.get_structures()[0]
    lat = XRDCalculator(
        wavelength=1.54056,
        debye_waller_factors=DWF,
    )
    pat = lat.get_pattern(structure, scaled=True, two_theta_range=(x_str, x_end))

    output_hklTable(pat, f)

    output_XRDpattern(pat, f)

if __name__ == "__main__":
    main()
Higomon commented 1 year ago

I'd like to point out a clear mistake@pymatgen: 2022.0.17.

The below information was learned from an expert in crystallography. This mistake is a commonly seen even in well-known journals like JACS.


The term 'pat.hkls' in the code above corresponds to:

shyuep commented 1 year ago

@Higomon I cannot reproduce this error. The following code with the latest pymatgen:

from pymatgen.core import Structure
from pymatgen.analysis.diffraction.xrd import XRDCalculator
fe2o3 = Structure.from_file("gamma-Fe2O3_fiz15840.cif")
xrd = XRDCalculator()
pat = xrd.get_pattern(fe2o3)
for hkl, d_hkl in zip(pat.hkls, pat.d_hkls):
    print(f"{hkl[0]['hkl']}: {d_hkl}")

produces

(1, 0, -1, 2): 3.685515645479806
(1, 0, -1, 4): 2.702803854252989
(2, -1, -1, 0): 2.519
(0, 0, 0, 6): 2.2953333333333332
(2, -1, -1, 3): 2.2083785301414456
(2, 0, -2, 2): 2.0796505109689116
(2, 0, -2, 4): 1.842757822739903
(2, -1, -1, 6): 1.6966200256087358
(3, -1, -2, 1): 1.6373761063233816
(3, -1, -2, 2): 1.6037256011439684
(1, 0, -1, 8): 1.6013563379925253
(3, -1, -2, 4): 1.4872776183689425
(3, 0, -3, 0): 1.4543453280886673
(3, -1, -2, 5): 1.414875898650733
(2, 0, -2, 8): 1.3514019271264945
(1, 0, -1, 10): 1.3133259704931688
(2, -1, -1, 9): 1.3078237953427998
(3, -1, -2, 7): 1.2638308057577687
(4, -2, -2, 0): 1.2595
(3, 0, -3, 6): 1.2285052151599352
(4, -2, -2, 3): 1.2146149027377362
(4, -1, -3, 1): 1.2054441452139546
(4, -1, -3, 2): 1.191825624400629
(3, -1, -2, 8): 1.190852222477705
(2, 0, -2, 10): 1.164552105283258
(0, 0, 0, 12): 1.1476666666666666
(4, -1, -3, 4): 1.1416302174754414
(4, -1, -3, 5): 1.1078859219073478
(4, -2, -2, 6): 1.1041892650707228

This is exactly what it should be. The code you have is too complicated for this purpose.

For future reference, pls do not quote "an expert in crystallography". If the expert wishes to report the error, they should file an issue report with a detailed explanation as to why they think the output is wrong.

Higomon commented 1 year ago

Sure, with the code you presented, I can reproduce it the same way.

However, with the following code, it will be as I said.

Environment:

I tried to install the latest version through pip and conda, but it couldn't be installed due to errors.

My Code

from pymatgen.io.cif import CifParser
from pymatgen.analysis.diffraction.xrd import XRDCalculator

parser = CifParser("alpha-Fe2O3_fiz15840.cif") # <-- just changing the file name, it's the same file.
fe2o3 = parser.get_structures()[0]
xrd = XRDCalculator()
pat = xrd.get_pattern(fe2o3)
for hkl, d_hkl in zip(pat.hkls, pat.d_hkls):
    print(f"{hkl[0]['hkl']}: {d_hkl}")

print

(1, 1, 0): 3.6855156454798053
(1, 1, 2): 2.7028038542529886
(2, 1, 1): 2.5190000000000006
(0, 0, 2): 2.295333333333333
(2, 1, 2): 2.208378530141446
(2, 2, 2): 2.079650510968912
(2, 2, 0): 1.8427578227399026
(2, 1, 3): 1.6966200256087358
(3, 2, 2): 1.6373761063233818
(3, 2, 1): 1.6037256011439682
(1, 0, 3): 1.6013563379925249
(3, 2, 3): 1.4872776183689427
(3, 3, 2): 1.4543453280886673
(3, 1, 3): 1.414875898650733
(2, 2, 4): 1.3514019271264943
(1, 1, 4): 1.3133259704931686
(2, 1, 4): 1.3078237953427996
(3, 2, 4): 1.2638308057577687
(4, 2, 2): 1.2595000000000003
(3, 3, 4): 1.2285052151599352
(4, 2, 3): 1.2146149027377364
(4, 3, 2): 1.2054441452139546
(4, 3, 3): 1.191825624400629
(3, 1, 4): 1.1908522224777047
(2, 0, 4): 1.1645521052832577
(0, 0, 4): 1.1476666666666664
(4, 3, 1): 1.1416302174754414
(4, 3, 4): 1.1078859219073478
(4, 2, 4): 1.104189265070723
shyuep commented 1 year ago

Your python version is probably old. Pls upgrade your python version to at least 3.9 or 3.10 and reinstall pymatgen. 2022.0.x is too long ago and no longer supported.

Higomon commented 1 year ago

Even with the latest pymatgen, the incorrect index is still being produced.

In the reliable ICDD, the index is indicated by three elements of hkl. Therefore, if possible, it is desired that pymatgen also align with it.

If either the trigonal or the rhombohedral system can represent the crystal, it is desired to indicate the hkl index of the trigonal system.

Environment:

My Code

import sys
sys.path.append('/home/Higomen/py_envs/py39/venv/lib/python3.9/site-packages')
from pymatgen.io.cif import CifParser
from pymatgen.analysis.diffraction.xrd import XRDCalculator

parser = CifParser("alpha-Fe2O3_fiz15840.cif")
fe2o3 = parser.get_structures()[0]
xrd = XRDCalculator()
pat = xrd.get_pattern(fe2o3)
for hkl, d_hkl in zip(pat.hkls, pat.d_hkls):
    print(f"{hkl[0]['hkl']}: {d_hkl}")

print

(1, 1, 0): 3.685515645479806
(1, 1, 2): 2.7028038542529895
(2, 1, 1): 2.5190000000000006
(0, 0, 2): 2.295333333333333
(2, 1, 2): 2.208378530141446
(2, 0, 0): 2.079650510968912
(2, 2, 0): 1.842757822739903
(2, 1, 3): 1.6966200256087358
(3, 2, 2): 1.6373761063233818
(3, 2, 1): 1.6037256011439687
(1, 0, 3): 1.6013563379925249
(3, 2, 3): 1.4872776183689427
(3, 0, 1): 1.4543453280886676
(3, 1, 3): 1.4148758986507333
(2, 2, 4): 1.3514019271264948
(1, 1, 4): 1.3133259704931683
(2, 1, 4): 1.3078237953427998
(3, 2, 4): 1.263830805757769
(4, 2, 2): 1.2595000000000003
(3, 3, 4): 1.2285052151599354
(4, 2, 3): 1.2146149027377364
(4, 1, 2): 1.2054441452139548
(4, 3, 3): 1.191825624400629
(3, 1, 4): 1.190852222477705
(2, 0, 4): 1.1645521052832577
(0, 0, 4): 1.1476666666666664
(4, 3, 1): 1.1416302174754416
(4, 3, 4): 1.1078859219073478
(4, 2, 4): 1.104189265070723
shyuep commented 1 year ago

Pymatgen calculates the XRD pattern based on the structure provided. If you want the tribunal structure, you have to provide it with that structure and set the primitive =False option. This has nothing to do with the XRDCalculator itself.

Higomon commented 1 year ago

This bug is serious and should be made publicly available.

Despite its existence, I still view pymatgen as an excellent open-source software.


Reference

ICDD 00-033-0664_alpha Fe2O3(Hematite)

Environment:

pymatgen.core.structure module

primitive option True | from_file()

from pymatgen.core import Structure
from pymatgen.analysis.diffraction.xrd import XRDCalculator

fe2o3 = Structure.from_file("alpha-Fe2O3_fiz15840.cif", primitive=True)
xrd = XRDCalculator()
pat = xrd.get_pattern(fe2o3)
for hkl, d_hkl in zip(pat.hkls, pat.d_hkls):
    print(f"{hkl[0]['hkl']}: {d_hkl}")

print

(1, 1, 0): 3.685515645479806
(1, 1, 2): 2.7028038542529895
(2, 1, 1): 2.5190000000000006
(0, 0, 2): 2.295333333333333
(2, 1, 2): 2.208378530141446
(2, 0, 0): 2.079650510968912
(2, 2, 0): 1.842757822739903
(2, 1, 3): 1.6966200256087358
(3, 2, 2): 1.6373761063233818
(3, 2, 1): 1.6037256011439687
(1, 0, 3): 1.6013563379925249
(3, 2, 3): 1.4872776183689427
(3, 0, 1): 1.4543453280886676
(3, 1, 3): 1.4148758986507333
(2, 2, 4): 1.3514019271264948
(1, 1, 4): 1.3133259704931683
(2, 1, 4): 1.3078237953427998
(3, 2, 4): 1.263830805757769
(4, 2, 2): 1.2595000000000003
(3, 3, 4): 1.2285052151599354
(4, 2, 3): 1.2146149027377364
(4, 1, 2): 1.2054441452139548
(4, 3, 3): 1.191825624400629
(3, 1, 4): 1.190852222477705
(2, 0, 4): 1.1645521052832577
(0, 0, 4): 1.1476666666666664
(4, 3, 1): 1.1416302174754416
(4, 3, 4): 1.1078859219073478
(4, 2, 4): 1.104189265070723

primitive option False | from_file()

from pymatgen.core import Structure
from pymatgen.analysis.diffraction.xrd import XRDCalculator

fe2o3 = Structure.from_file("alpha-Fe2O3_fiz15840.cif", primitive=False)
xrd = XRDCalculator()
pat = xrd.get_pattern(fe2o3)
for hkl, d_hkl in zip(pat.hkls, pat.d_hkls):
    print(f"{hkl[0]['hkl']}: {d_hkl}")

print

(1, 0, -1, 2): 3.685515645479806
(1, 0, -1, 4): 2.7028038542529895
(2, -1, -1, 0): 2.5190000000000006
(0, 0, 0, 6): 2.2953333333333332
(2, -1, -1, 3): 2.208378530141446
(2, 0, -2, 2): 2.079650510968912
(2, 0, -2, 4): 1.842757822739903
(2, -1, -1, 6): 1.6966200256087358
(3, -1, -2, 1): 1.6373761063233816
(3, -1, -2, 2): 1.6037256011439684
(1, 0, -1, 8): 1.6013563379925253
(3, -1, -2, 4): 1.4872776183689427
(3, 0, -3, 0): 1.4543453280886676
(3, -1, -2, 5): 1.4148758986507333
(2, 0, -2, 8): 1.3514019271264948
(1, 0, -1, 10): 1.3133259704931688
(2, -1, -1, 9): 1.3078237953427998
(3, -1, -2, 7): 1.263830805757769
(4, -2, -2, 0): 1.2595000000000003
(3, 0, -3, 6): 1.2285052151599354
(4, -2, -2, 3): 1.2146149027377364
(4, -1, -3, 1): 1.205444145213955
(4, -1, -3, 2): 1.1918256244006291
(3, -1, -2, 8): 1.190852222477705
(2, 0, -2, 10): 1.164552105283258
(0, 0, 0, 12): 1.1476666666666666
(4, -1, -3, 4): 1.1416302174754416
(4, -1, -3, 5): 1.1078859219073478
(4, -2, -2, 6): 1.104189265070723
shyuep commented 1 year ago

I am not sure what you are referring to. The primitive = False output is the same as the reference output. Eg (102) and (012) are equivalent in the hexagonal setting. There is no bug here.

Higomon commented 1 year ago

VESTA imports the same CIF file and outputs hkl, which is shown in the topmost image. The output of VESTA is, of course, the same as ICDD.

Why is pymatgen different?

shyuep commented 1 year ago

Pls refer to a crystallography textbook on equivalent miller indices. For example, (100), (010) and (001) are all equivalent in a cubic crystal. There is no specific convention.

Higomon commented 1 year ago

In the widely recognized crystallography text by Professor Cullity, the index on top of X-ray diffraction peaks do NOT have brackets (refer to Cullity's book or page 366 at the URL).

It is important to differentiate between the X-ray diffraction refractive index (hkl index) and the mirror index.

Ref.

B. D. Cullity, Elements of X-ray diffraction, (1980). http://s1.iran-mavad.com/pdf96/X-RAY%20DIFFRACTION%20Cullity_iran-mavad.com.pdf

shyuep commented 1 year ago

Feel free to remove the brackets if you wish. I am not replying to the is thread further given that the bug that is claimed doesn't exist.

Higomon commented 1 year ago

It is important to differentiate between the X-ray diffraction refractive index (hkl index) and the mirror index.

This is not a claim, but simply an observation of an error made as a chemist who loves science.