Closed mfkoerner closed 1 year ago
@dwinston Pls fix in the Database?
FYI @mfkoerner, you have your private API key in the example code. There's also a reciprocal_lattice
method on Lattice
, so structure.lattice.reciprocal_lattice
should give the reciprocal lattice directly -- this seems to agree with your reciprocal()
function, but mentioning in case it's useful in future.
Regarding the error itself, I'm just noting mp-764505
as an example to investigate further.
Responding to @mkhorton , I fixed up my report a bit to fix the API issue and include some results I found. Most notably, I have percentages of the type of error and example materials for each style of error.
So, as far as I can tell, the order of the reciprocal vectors in bandstructure.lattice_rec
are different than those from my reciprocal()
function and I don't know a consistent way to properly match the lattice vectors in bandstructure.lattice_rec
to the locations of the covalent band minimum (CBM) and valence band maximums (VMB) in the bandstructure itself.
It would be extremely useful to me if anyone has a way to get the cartesian reciprocal space coordinates of the CBM and VBM correctly while this error persists. The proper way to do this if the error is fixed is BandStructure.get_cbm()['kpoint'].cart_coords
, but this is unfortunately impacted by the error.
Thanks very much for the extra detail @mfkoerner, this is very useful. Unfortunately, I don't (personally) know a way to get the information you want via the API until we've tracked down the root cause, but we will investigate the issue asap.
I'd like to echo this issue. The information about the reciprocal lattice seems to quite frequently be materially different between the reciprocal lattice stored/calculated in the "structure" object and the reciprocal lattice stored/calculated in the "band structure" object of the electronic structure module. Like, @mfkoerner above, I've conducted a number of tests, including calculating the reciprocal lattice on my end from the real lattice stored in the "structure" object, the result of which always properly matches the reciprocal lattice from the "structure" object.
We are doing some analysis of the electronic structure which involves looking at absolute crystal momentum differences between various points in the BZ, so resolving these issues is essential for us. The issues are 2-fold:
There are additionally many examples where the reciprocal lattices from the two database modules ("structure" and "band structure" objects) are rotated in cartesian space relative to one another, though this presents no problem as it is essentially just a choice of coordinates, and only differences in crystal momentum should matter for most analyses.
It seems that were it not for issue (2), a workaround for issue (1) would be easy, by simply querying the "correct" reciprocal lattice from the "structure" object. However, if the lattice vectors are cycled, or for lower symmetry crystal systems, a different choice of angles is made, it's unclear how to correctly map points from the band structure k-point path to the correct absolute crystal momentum. If we are correct about these issues, I also suspect it would be better to fix the issues in the database, rather than make a bespoke workaround on our end.
As a quick illustration of issue (1), plots of the BZ volumes from the two different locations in the database are below. [Note that these plots are only for the ~20k semiconductors (E_g < 2.5 eV) for which the database has a calculated bandstructure.]
Any help you can offer is greatly appreciated.
Thanks, Doug
Just to update, we believe we've identified (or partly identified) the issue, but the fix is going to take some time to address. Once this has been resolved, we'll write up a full description here of what happened and what's been changed as a result to ensure consistency in future. Many thanks both for the detailed bug report and for your patience while we address this.
@mkhorton Thanks very much. Any information you can provide about the rough timeline for fixing this would be extremely helpful for our project planning purposes.
(Caveat: we're still investigating this issue, so the following is subject to change.)
Ok, to provide a little more detail... Essentially, the structure optimization calculation and the band structure calculation are essentially two separate (though related) calculations, and the 'structure' reported by the API is from the former and the band structure from the latter. While it should be the case that both structures should match each other, clearly in some instances this is not the case. This is due to a combination of issues, partly involving faulty logic, and partly involving some stale data. A robust fix needs to address this, and should also include new database validation logic to ensure any similar issues are caught in future. As such, it will take some time to address. We're actually in the process of moving to a new database build system in part to avoid issues such as this.
However, as a stop-gap, it may be possible to extract the correct band structure object from the API. This is slightly convoluted since the API wasn't designed for this use, but the following code should work:
from pymatgen import MPRester, Structure
mpid = 'mp-764505' # mpid of interest
run_type = 'GGA' # material-dependent, can also be 'GGA+U'
mpr = MPRester()
data = mpr.query({'task_id':mpid},['band_structure'], mp_decode=False)
task_id = data[0]['band_structure'][run_type]['task_id']
task_data = mpr.get_task_data(task_id)
band_structure_structure = Structure.from_str(task_data[0]['cif'], fmt='cif')
band_structure_lattice = band_structure_structure.lattice
I believe this should work, and hope it's useful as an interim solution.
Hi Matthew,
Francesco (here in cc) could help on this if needed.
Geoffroy
On 16 Feb 2018, at 20:38, Matthew Horton notifications@github.com wrote:
(Caveat: we're still investigating this issue, so the following is subject to change.)
Ok, to provide a little more detail... Essentially, the structure optimization calculation and the band structure calculation are essentially two separate (though related) calculations, and the 'structure' reported by the API is from the former and the band structure from the latter. While it should be the case that both structures should match each other, clearly in some instances this is not the case. This is due to a combination of issues, partly involving faulty logic, and partly involving some stale data. A robust fix needs to address this, and should also include new database validation logic to ensure any similar issues are caught in future. As such, it will take some time to address. We're actually in the process of moving to a new database build system in part to avoid issues such as this.
However, as a stop-gap, it may be possible to extract the correct band structure object from the API. This is slightly convoluted since the API wasn't designed for use, but the following code should work:
from pymatgen import MPRester, Structure
mpid = 'mp-764505' # mpid of interest run_type = 'GGA' # material-dependent
mpr = MPRester()
data = mpr.query({'task_id':mpid},['band_structure'], mp_decode=False) task_id = data[0]['band_structure'][run_type]['task_id'] task_data = mpr.get_task_data(task_id) band_structure_structure = Structure.from_str(task_data[0]['cif'], fmt='cif') band_structure_lattice = band_structure_structure.lattice I believe this should work, and hope it's useful as an interim solution.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/materialsproject/pymatgen/issues/1031#issuecomment-366337635, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3l9_lGHjq6nohYlA_OU3MKDoWITQVoks5tVdlDgaJpZM4R4VsI.
Thanks, Matthew, this is very helpful. Just to give you a bit of additional information in case it helps on your end, the workaround certainly fixes the issue for many compounds. However, some are off by small but meaningful amounts, very small amounts (perhaps just different relaxation parameters?), or by a factor of 2 or 4. I haven't dug into the last issue much yet, my first thought was perhaps these were magnetically ordered supercells for the electronic structure calculations, but that would make the BZ smaller rather than larger...do you have any insight on this? The entries with this issue number only in the hundreds, but it would be nice not to have exclude them on this basis from our analysis.
Here are two quick figures, again for ~19,000 semiconductors with E_g < 2.5 eV.
Hi again!
@dhfabini, @mfkoerner and I have made a good deal of progress with the workaround offered by @mkhorton, but discovered this month (07/18) that it no longer seems to work. We have four computers with separately installed versions of python + pymatgen and the line:
task_data = mpr.get_task_data(task_id)
no longer seems to work on any of them. For example, when called on Si (mp-149), I get:
pymatgen.ext.matproj.MPRestError: 'NoneType' object is not iterable. Content: b'{"valid_response": false, "error": "\'NoneType\' object is not iterable", "version": {"db": "3.0.0", "pymatgen": "2018.6.11", "rest": "2.0"}, "created_at": "2018-07-06T15:59:31.330706", "traceback": "Traceback (most recent call last):\\n File \\"/var/www/python/matgen_prod/materials_django/rest/rest.py\\", line 94, in wrapped\\n d = func(*args, **kwargs)\\n File \\"/var/www/python/matgen_prod/materials_django/tasks/rest.py\\", line 47, in get_property\\n entries = qe.get_entries(crit, False, supported_task_properties)\\n File \\"/var/www/miniconda3/envs/mpprod3/lib/python3.6/site-packages/matgendb/query_engine.py\\", line 305, in get_entries\\n symbols = [\\"{} {}\\".format(func, label) for label in labels]\\nTypeError: \'NoneType\' object is not iterable\\n"}'
Is there an easy way around this? Thanks again!
Closing as stale.
System
Summary
When using MPRester('API_key').get_bandstructure_by_material_id('mpid') to get BandStructureSymmLine objects from the database, many of them (~50%) have incorrect reciprocal lattices. Of those incorrect, about half of the incorrect ones are off by a scalar factor of 2pi (as in b1 = 4pi^2 (a2 x a3) / (a1 . a2 x a3) instead of the correct 2pi term). Some of them are off in angle but have the correct volume. Others are also off in volume.
Example code
No error message, just wrong results about half the time
Testing run on error
Ran a census on the database and found the following:
Good: 47.2% Volume good: 18.4% 2pi per vector: 21.8% other: 12.7%
These data are collected from a census on the materials project bandstructures There are a total of 52857 materials considered here The data were taken from the database on July 14 2017
"Good" means that all lattice vectors and angles are correct to within 0.1 inverse angstroms and 0.1 degrees "Volume good" means that the lattice volume is correct to within 0.01 inverse angstroms cubed but the material is not "Good" "2pi per vector" means that the vectors are off by a factor of 2pi but the angles are correct. "Other" is any material that doesn't fit into the three previously mentioned categories.
Materials with different types of errors:
no materials are off by 4pi^2, 8pi^3, or 8pi per lattice vector 1 material is off by 2 (
mp-20557
) 4 materials were off by 4pi (mp-10137
,mp-1576
,mp-583266
,mp-685055
)5 example materials for each category: good:
mp-13981
mp-1008280
mp-11022
mp-763833
mp-770501
volume good:mp-989622
mp-754192
mp-643903
mp-1008601
mp-556356
2pi per vector:mp-626542
mp-4308
mp-31412
mp-773060
mp-11271
other:mp-773764
mp-705574
mp-772309
mp-505584
mp-767321
(note that many of these are off by nearly 2pi per vector but not exactly)Files (if any)
no files