Open JiQi535 opened 2 years ago
@JiQi535 thanks for posting. Your understanding of what's happening is 100% correct, and I hadn't realized that this wasn't documented (my apologies for that). The oxidation state guessing was a feature that we added somewhat at the last minute and the documentation must have slipped through the cracks. That said, you should be seeing a warning along the lines of "Failed to guess oxidation states..." as per this line. Were you not seeing that in your work?
I remember we did try a default value of max_sites=-1
but that led to unacceptable performance hangs with modestly large structures. My concern with setting to -1
is that a user might find process_entries
hanging and not be able to figure out why.
Note that any materials you obtain from MPRester()
will have oxidation states already populated, and hence the corrections will be applied correctly even for larger structures.
I would suggest a few things:
max_sites
a kwarg in the __init__
, keeping the default value at -20Let me know what you think about the above. If this sounds good to you I'd be happy to review a PR with the changes. Also flagging @mkhorton for comments.
Is your feature request related to a problem? Please describe. Yes. The current get_adjustments() function in MP2020Compatibility fails to get the correct energy adjustments for compositions with over 20 atoms and specific mixed anions, e.g. Li20O9Cl2. For Li20O9Cl2, it can only get the energy correction for O2-, while correction for Cl- is missed. https://github.com/materialsproject/pymatgen/blob/1f1056cab8b9358f4d1ac2012f2a5732cd29bcf1/pymatgen/entries/compatibility.py#L1041-L1047 It is because of using a magic number of "max_sites=-20" to set 20 as the cutoff for the maximum number of sites in a reduced formula whose oxidation states will be guessed. Together with the below codes, energy corrections for anions that are not the most electronegative are missed. https://github.com/materialsproject/pymatgen/blob/1f1056cab8b9358f4d1ac2012f2a5732cd29bcf1/pymatgen/entries/compatibility.py#L1061-L1071
Without documentation, this current setting can lead to hidden error when analysing the Ehull of materials with over 20 atoms and mixed anions, like missing Cl- in Li20O9Cl2 and missing N3- in Li20Cl17N.
Describe the solution you'd like We can set "max_sites=-1" so that this error no longer exist, while reduced formula will be tried to find so that composition will be as small as possible to keep performance. I believe that being correct is more important than having good speed, and some compromise in performance should be acceptable.
@rkingsbury Hey Ryan, I saw your helpful post about MP2020Compatibility before, so I'm quoting you here. Thanks in advance! 😁