materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org
Other
1.48k stars 855 forks source link

EnumerateStructureTransformation does not return structures with correct stoichiometries #1286

Closed ardunn closed 6 years ago

ardunn commented 6 years ago

Describe the bug Using EnumerateStructureTransformation with the max_disordered_sites parameter specified does not return structures with correct stoichometry. For example, with the attached cif file for Cu2.8Te2 the output structures should be Cu7Te5; however, the output structures are Cu3Te2.

To Reproduce

from pymatgen import Structure
from pymatgen.transformations.advanced_transformations import EnumerateStructureTransformation
from pymatgen.alchemy.transmuters import StandardTransmuter

s = Structure.from_file("cu7te5.cif")
print("Original Structure")
print(s)

orderer = EnumerateStructureTransformation(max_cell_size=None, max_disordered_sites=100)
stm = StandardTransmuter.from_structures([s], [orderer], extend_collection=10)
structures = stm.transformed_structures
for s in structures:
    print(s.final_structure)

The original structure is:

Original Structure
Full Formula (Cu2.8 Te2)
Reduced Formula: Cu2.8Te2
abc   :   3.980000   3.980000   6.120000
angles:  90.000000  90.000000  90.000000
Sites (6)
  #  SP           a     b      c
---  --------  ----  ----  -----
  0  Cu        0.75  0.25  0
  1  Cu        0.25  0.75  0
  2  Cu:0.400  0.25  0.25  0.27
  3  Cu:0.400  0.75  0.75  0.73
  4  Te        0.25  0.25  0.715
  5  Te        0.75  0.75  0.285

The output structures are:

Full Formula (Cu3 Te2)
Reduced Formula: Cu3Te2
abc   :   3.980000   3.980000   6.120000
angles:  90.000000  90.000000  90.000000
Sites (5)
  #  SP       a     b      c
---  ----  ----  ----  -----
  0  Cu    0.75  0.25  1
  1  Cu    0.25  0.75  0
  2  Cu    0.25  0.25  0.27
  3  Te    0.25  0.25  0.715
  4  Te    0.75  0.75  0.285

Full Formula (Cu6 Te4)
Reduced Formula: Cu3Te2
abc   :   5.628570   5.628570   7.300329
angles:  67.325145  67.325145  90.000000
Sites (10)
  #  SP         a       b      c
---  ----  ------  ------  -----
  0  Cu    0.25    0       0
  1  Cu    0.75    0.5     1
  2  Cu    0.75    0       0
  3  Cu    0.25    0.5     1
  4  Cu    0.865   0.615   0.27
  5  Cu    0.635   0.885   0.73
  6  Te    0.1425  0.8925  0.715
  7  Te    0.6425  0.3925  0.715
  8  Te    0.3575  0.6075  0.285
  9  Te    0.8575  0.1075  0.285
...

Expected behavior The final structures should be Cu7Te5, not Cu3Te2. Note that using EnumerateStructureTransformation(max_cell_size=5) returns correct stoichiometries but the code above does not.

Desktop (please complete the following information):

Additional context The cu7te5.cif file is:

##############################################################################
#                                                                            #
# Cu-Te            # Cu1.4Te ht1                                   # 2040845 #
#                                                                            #
##############################################################################
#                                                                            #
#                           Pearson's Crystal Data                           #
#      Crystal Structure Database for Inorganic Compounds (on DVD)           #
#                              Release 2017/18                               #
#                  Editors: Pierre Villars and Karin Cenzual                 #
#                                                                            #
#   Copyright (c) ASM International & Material Phases Data System (MPDS),    #
# Switzerland & National Institute for Materials Science (NIMS), Japan, 2017 #
#                   All rights reserved. Version 2017.08                     #
#                                                                            #
#   This copy of Pearson's Crystal Data is licensed to:                      #
#   University of Alberta, Chemistry Department, 1-5 Installations License         #
#                                                                            #
##############################################################################

data_2040845
_audit_creation_date                     2018-09-07
_audit_creation_method
;
Pearson's Crystal Data browser
;
#_database_code_PCD                      2040845

# Entry summary

_chemical_formula_structural             'Cu~1.4~ Te'
_chemical_formula_sum                    'Cu1.40 Te'
_chemical_name_mineral                   rickardite
_chemical_compound_source                synthetic
_chemical_name_structure_type            Cu~2~Sb,tP6,129
_chemical_formula_weight                 216.6

# Bibliographic data

_publ_section_title
'Crystal structure of rickardite, Cu~4-x~Te~2~'                               
_journal_coden_ASTM                      AMMIAY
_journal_name_full                       'Am. Mineral.'
_journal_year                            1949
_journal_volume                          34
_journal_page_first                      441
_journal_page_last                       451
_journal_language                        English
loop_
 _publ_author_name
 _publ_author_address
'Forman S.A.'
;
Toronto University
Toronto
Canada
;
'Peacock M.A.'
;
Toronto University
Toronto
Canada
;

# Standardized crystallographic data

_cell_length_a                           3.98
_cell_length_b                           3.98
_cell_length_c                           6.12
_cell_angle_alpha                        90
_cell_angle_beta                         90
_cell_angle_gamma                        90
_cell_volume                             96.94
_cell_formula_units_Z                    2
_space_group_IT_number                   129
_space_group_name_H-M_alt                'P 4/n m m (origin choice 2)'
loop_
 _space_group_symop_id
 _space_group_symop_operation_xyz
 1 'x, y, z'
 2 '1/2-x, 1/2-y, z'
 3 '1/2-x, y, z'
 4 '-x, -y, -z'
 5 '-x, 1/2+y, -z'
 6 '1/2-y, 1/2-x, z'
 7 '1/2-y, x, z'
 8 '-y, -x, -z'
 9 '-y, 1/2+x, -z'
 10 '1/2+x, -y, -z'
 11 '1/2+x, 1/2+y, -z'
 12 'x, 1/2-y, z'
 13 '1/2+y, -x, -z'
 14 '1/2+y, 1/2+x, -z'
 15 'y, 1/2-x, z'
 16 'y, x, z'
loop_
 _atom_type_symbol
 Cu
 Te
loop_
 _atom_site_label
 _atom_site_type_symbol
 _atom_site_symmetry_multiplicity
 _atom_site_Wyckoff_symbol
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
 Cu2 Cu 2 c 0.25 0.25 0.27 0.4
 Te Te 2 c 0.25 0.25 0.715 1
 Cu1 Cu 2 a 0.75 0.25 0 1

_exptl_crystal_colour                    'purple red'
_exptl_crystal_density_meas              ?
_exptl_crystal_density_diffrn            7.42
_cell_measurement_temperature            ?
_cell_measurement_radiation              'X-rays, Cu Ka1'
_cell_measurement_wavelength             1.5405
_pd_proc_wavelength                      1.5405
_cell_measurement_reflns_used            ?
_diffrn_ambient_temperature              ?
_diffrn_measurement_device               film
_diffrn_measurement_device_type          ?
_diffrn_radiation_type                   'X-rays, Cu Ka1'
_diffrn_radiation_wavelength             1.5405
_diffrn_reflns_number                    ?
_exptl_absorpt_coefficient_mu            ?
_exptl_absorpt_correction_type           ?
_computing_structure_solution            'starting values from the literature'
_refine_ls_number_parameters             ?
_refine_ls_number_reflns                 ?
_refine_ls_R_factor_gt                   ?
_refine_ls_wR_factor_gt                  ?
_pd_proc_ls_proof_R_factor               ?
_pd_proc_ls_proof_wR_factor              ?
_refine_ls_R_I_factor                    ?

# End of data set 2040845
ardunn commented 6 years ago

For reference, the topic on the pymatgen help group

mkhorton commented 6 years ago

I'm a bit confused by this too. It seems clear the first ordering sets one Cu:0.4 site to 0, and the other Cu:0.4 site to 1, which is not completely unreasonable but obviously the overall composition is wrong.

I see this is @computron's code/Anubhav has already commented on the Google Groups thread, do you have any ideas where the rounding from 0.4->0.5 occupancy is happening before I take a closer look at this? Seems like this must be happening in EnumlibAdaptor.

computron commented 6 years ago

The max_disordered_sites option is supposed to be a way to help you find a good max_cell_size parameter. Rather than a single max_cell_size, it keeps incrementing max_cell_size (starting small and getting higher) until EnumlibAdapter returns a solution. If EnumlibAdapter finds any solution, the iteration will quit. Thus, it tries to find the minimum max_cell_size that produces some solution from EnumlibAdapter, thus saving large computation time from having max_cell_size cranked up too high.

For this structure, you need a cell size of at least 5 in order to get the occupancies to work out. However, EnumlibAdapter already starts returning structures (with the wrong composition, Cu3Te2) at max_cell_size=2. e.g. running the code with EnumerateStructureTransformation(max_cell_size=2) will produce the same results.

Clearly, the assumption that the point at which EnumlibAdapter produces any solution is a good "stopping" point for the max_cell_size parameter is incorrect. Actually, I have no idea why EnumlibAdapter doesn't produce any solution for max_cell_size=1 but does so for max_cell_size=2.

Anyone with more idea of the inner workings of when EnumlibAdapter does and doesn't return a solution could probably help in designing a better solution to this problem ...

computron commented 6 years ago

Note also that the intention of max_disordered_sites was to help make EnumerateStructureTransformation a bit more automatic.

That said, I think this needs a rework. Maybe a separate wrapper for the automatic function. Also the automatic function can:

shyuep commented 6 years ago

I think the problem is in enumlib itself. I did some basic debugging and the structure supplied to enumlib is not changing between loops. So somehow, enumlib is assuming that a 0.4 occupancy can be satisified by a Cu3Te2.

mkhorton commented 6 years ago

We could apply some sanity checks on the output / enforce a 'strict' mode to filter these incorrect composition structs out.

Agreed the ultimate issue is enumlib. It's too much of a black box for my liking.

Would also be happy to adopt the LCM approach suggested, I've tried something similar previously, possibly in MagOrdering, would have to check. I know we have DiscretizeOccupanciesTransformation now too which would take care of part of that approach.

shyuep commented 6 years ago

We could apply sanity checks, but I think let's confirm that the problem is indeed in enumlib, and not in some weird issue in pymatgen's generation of enumlib input files. If there is indeed a problem with enumlib, we should post an issue on enumlib itself and let Gus Hart know. On our end, we should probably implement a tolerance check that the compositions returned matches the intended ones to within a certain tolerance.

shyuep commented 6 years ago

I found the issue. The problem in this case is that the base (which is multipled by conc to generate composition ranges) is too small a number. That resulted in rounding precision errors which causes an overlap between the species and vacancy concentration ranges when generating the enumlib input file. For example, with base 8 and a supercell size of 2, the input file generated has the range of 2-4/8 for Cu, and 3-5/8 for vacancies. That resulted in a 4/8 4/8 split satisfying the enumeration, which is a Cu3Te2.

The solution is to make sure the base is large enough. I basically just applied a 10x factor to the base. This is an arbitrary factor in any case, so it does not make all that much of a difference.

The issue is closed for now. But someone should implement the aforementioned sanity checks with composition tolerances for the generated structure as well.

PMG dictator #1 has now semi-retired....

mkhorton commented 6 years ago

Nice catch, and duly noted! Will add it to the list ...

shyuep commented 6 years ago

Note that the fixed EnumlibAdaptor caused the max_disordered test to fail. The problem is that the max disordered test was actually wrongly coded. There was a primitive cubic cell with Li 0.2, Na 0.2 and K 0.6 + 1O. If you order this with max_ordered = 5, you obviously must have a cell with 10 atoms. The test was checking that it has only 8 atoms!! The above fix actually solves this problem.

computron commented 5 years ago

A quick postscript: @utf will be looking into coding an alternate enumerator in pymatgen using the bsym (https://github.com/bjmorgan/bsym) library. According to Alex, this library might improve upon common problems with enumlib.