materialsproject / api

New API client for the Materials Project
https://materialsproject.github.io/api/

Problems retrieving TaskDocs for materials #761

Open keeganq opened 1 year ago

keeganq commented 1 year ago

I'm trying to retrieve charge density data, and the corresponding task information for the calculations that produced that data.

I'd like to be able to download the VASP input and output files associated with the volumetric charge density data for some materials.

Version Info

python==3.9.16
mp-api==0.30.10
pymatgen==2023.3.23
boto3==1.26.99
emmet-core==0.51.1

Reproduction

I'm trying to retrieve the charge density for a material with inc_task_doc=True:

from mp_api.client import MPRester

mpid = "mp-149"

with MPRester("<api_key>") as mpr:
    chgcar = mpr.get_charge_density_from_material_id(mpid, inc_task_doc=True) 

Produces output:


ValueError: No POTCAR for Si with functional PBE found. Please set the PMG_VASP_PSP_DIR environment in .pmgrc.yaml, or you may need to set PMG_DEFAULT_FUNCTIONAL to PBE_52 or PBE_54 if you are using newer psps from VASP.
Full Stack Trace

```python-traceback
Retrieving MaterialsDoc documents: 100%|██████████| 1/1 [00:00<00:00, 27413.75it/s]
Retrieving ChgcarDataDoc documents: 100%|██████████| 2/2 [00:00<00:00, 60787.01it/s]
Retrieving ChgcarDataDoc documents: 100%|██████████| 1/1 [00:00<00:00, 25575.02it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[8], line 2
      1 with MPRester("") as mpr:
----> 2     chgcar = mpr.get_charge_density_from_material_id(mpid, inc_task_doc=True) # task=True ?? Look at github
      3     # print(chgcar)

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/mp_api/client/mprester.py:1101, in MPRester.get_charge_density_from_material_id(self, material_id, inc_task_doc)
   1098     raise MPRestError(f"No charge density fetched for {material_id}.")
   1100 if inc_task_doc:
-> 1101     task_doc = self.tasks.get_data_by_id(latest_doc.task_id)
   1102     return chgcar, task_doc
   1104 return chgcar

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/mp_api/client/core/client.py:839, in BaseRester.get_data_by_id(self, document_id, fields)
    836 results = [] # type: List
    838 try:
--> 839     results = self._query_resource_data(criteria=criteria, fields=fields, suburl=document_id) # type: ignore
    840 except MPRestError:
    842     if self.primary_key == "material_id":
    843         # see if the material_id has changed, perhaps a task_id was supplied
    844         # this should likely be re-thought

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/mp_api/client/core/client.py:797, in BaseRester._query_resource_data(self, criteria, fields, suburl, use_document_model, timeout)
    774 def _query_resource_data(
    775     self,
    776     criteria: Optional[Dict] = None,
   (...)
    780     timeout: Optional[int] = None,
    781 ) -> Union[List[T], List[Dict]]:
    782     """
    783     Query the endpoint for a list of documents without associated meta information. Only
    784     returns a single page of results.
   (...)
    794         A list of documents
    795     """
--> 797     return self._query_resource( # type: ignore
    798         criteria=criteria,
    799         fields=fields,
    800         suburl=suburl,
    801         use_document_model=use_document_model,
    802         chunk_size=1000,
    803         num_chunks=1,
    804     ).get("data")

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/mp_api/client/core/client.py:295, in BaseRester._query_resource(self, criteria, fields, suburl, use_document_model, parallel_param, num_chunks, chunk_size, timeout)
    292         if not url.endswith("/"):
    293             url += "/"
--> 295         data = self._submit_requests(
    296             url=url,
    297             criteria=criteria,
    298             use_document_model=use_document_model,
    299             parallel_param=parallel_param,
    300             num_chunks=num_chunks,
    301             chunk_size=chunk_size,
    302             timeout=timeout,
    303         )
    305         return data
    307     except RequestException as ex:

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/mp_api/client/core/client.py:429, in BaseRester._submit_requests(self, url, criteria, use_document_model, parallel_param, num_chunks, chunk_size, timeout)
    425 remaining_docs_avail = {}
    427 initial_params_list = [{"url": url, "verify": True, "params": copy(crit)} for crit in new_criteria]
--> 429 initial_data_tuples = self._multi_thread(use_document_model, initial_params_list)
    431 for data, subtotal, crit_ind in initial_data_tuples:
    433     subtotals.append(subtotal)

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/mp_api/client/core/client.py:634, in BaseRester._multi_thread(self, use_document_model, params_list, progress_bar, timeout)
    630 finished, futures = wait(futures, return_when=FIRST_COMPLETED)
    632 for future in finished:
--> 634     data, subtotal = future.result()
    636     if progress_bar is not None:
    637         progress_bar.update(len(data["data"]))

File ~/.conda/envs/materials-project/lib/python3.9/concurrent/futures/_base.py:439, in Future.result(self, timeout)
    437     raise CancelledError()
    438 elif self._state == FINISHED:
--> 439     return self.__get_result()
    441 self._condition.wait(timeout)
    443 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File ~/.conda/envs/materials-project/lib/python3.9/concurrent/futures/_base.py:391, in Future.__get_result(self)
    389 if self._exception:
    390     try:
--> 391         raise self._exception
    392     finally:
    393         # Break a reference cycle with the exception in self._exception
    394         self = None

File ~/.conda/envs/materials-project/lib/python3.9/concurrent/futures/thread.py:58, in _WorkItem.run(self)
     55     return
     57 try:
---> 58     result = self.fn(*self.args, **self.kwargs)
     59 except BaseException as exc:
     60     self.future.set_exception(exc)

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/mp_api/client/core/client.py:685, in BaseRester._submit_request_and_process(self, url, verify, params, use_document_model, timeout)
    682 if response.status_code == 200:
    684     if self.monty_decode:
--> 685         data = json.loads(response.text, cls=MontyDecoder)
    686     else:
    687         data = json.loads(response.text)

File ~/.conda/envs/materials-project/lib/python3.9/json/__init__.py:359, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    357 if parse_constant is not None:
    358     kw['parse_constant'] = parse_constant
--> 359 return cls(**kw).decode(s)

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:475, in MontyDecoder.decode(self, s)
    473 else:
    474     d = json.loads(s)
--> 475 return self.process_decoded(d)

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:454, in MontyDecoder.process_decoded(self, d)
    451     elif (bson is not None) and modname == "bson.objectid" and classname == "ObjectId":
    452         return bson.objectid.ObjectId(d["oid"])
--> 454     return {self.process_decoded(k): self.process_decoded(v) for k, v in d.items()}
    456 if isinstance(d, list):
    457     return [self.process_decoded(x) for x in d]

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:454, in (.0)
    451     elif (bson is not None) and modname == "bson.objectid" and classname == "ObjectId":
    452         return bson.objectid.ObjectId(d["oid"])
--> 454     return {self.process_decoded(k): self.process_decoded(v) for k, v in d.items()}
    456 if isinstance(d, list):
    457     return [self.process_decoded(x) for x in d]

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:457, in MontyDecoder.process_decoded(self, d)
    454     return {self.process_decoded(k): self.process_decoded(v) for k, v in d.items()}
    456 if isinstance(d, list):
--> 457     return [self.process_decoded(x) for x in d]
    459 return d

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:457, in (.0)
    454     return {self.process_decoded(k): self.process_decoded(v) for k, v in d.items()}
    456 if isinstance(d, list):
--> 457     return [self.process_decoded(x) for x in d]
    459 return d

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:454, in MontyDecoder.process_decoded(self, d)
    451     elif (bson is not None) and modname == "bson.objectid" and classname == "ObjectId":
    452         return bson.objectid.ObjectId(d["oid"])
--> 454     return {self.process_decoded(k): self.process_decoded(v) for k, v in d.items()}
    456 if isinstance(d, list):
    457     return [self.process_decoded(x) for x in d]

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:454, in (.0)
    451     elif (bson is not None) and modname == "bson.objectid" and classname == "ObjectId":
    452         return bson.objectid.ObjectId(d["oid"])
--> 454     return {self.process_decoded(k): self.process_decoded(v) for k, v in d.items()}
    456 if isinstance(d, list):
    457     return [self.process_decoded(x) for x in d]

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:454, in MontyDecoder.process_decoded(self, d)
    451     elif (bson is not None) and modname == "bson.objectid" and classname == "ObjectId":
    452         return bson.objectid.ObjectId(d["oid"])
--> 454     return {self.process_decoded(k): self.process_decoded(v) for k, v in d.items()}
    456 if isinstance(d, list):
    457     return [self.process_decoded(x) for x in d]

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:454, in (.0)
    451     elif (bson is not None) and modname == "bson.objectid" and classname == "ObjectId":
    452         return bson.objectid.ObjectId(d["oid"])
--> 454     return {self.process_decoded(k): self.process_decoded(v) for k, v in d.items()}
    456 if isinstance(d, list):
    457     return [self.process_decoded(x) for x in d]

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:427, in MontyDecoder.process_decoded(self, d)
    425 data = {k: v for k, v in d.items() if not k.startswith("@")}
    426 if hasattr(cls_, "from_dict"):
--> 427     return cls_.from_dict(data)
    428 if pydantic is not None and issubclass(cls_, pydantic.BaseModel): # pylint: disable=E1101
    429     return cls_(**data)

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/pymatgen/io/vasp/inputs.py:2262, in Potcar.from_dict(cls, d)
   2256 @classmethod
   2257 def from_dict(cls, d):
   2258     """
   2259     :param d: Dict representation
   2260     :return: Potcar
   2261     """
-> 2262     return Potcar(symbols=d["symbols"], functional=d["functional"])

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/pymatgen/io/vasp/inputs.py:2243, in Potcar.__init__(self, symbols, functional, sym_potcar_map)
   2241 self.functional = functional
   2242 if symbols is not None:
-> 2243     self.set_symbols(symbols, functional, sym_potcar_map)

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/pymatgen/io/vasp/inputs.py:2339, in Potcar.set_symbols(self, symbols, functional, sym_potcar_map)
   2337 else:
   2338     for el in symbols:
-> 2339         p = PotcarSingle.from_symbol_and_functional(el, functional)
   2340         self.append(p)

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/pymatgen/io/vasp/inputs.py:1897, in PotcarSingle.from_symbol_and_functional(symbol, functional)
   1895 d = SETTINGS.get("PMG_VASP_PSP_DIR")
   1896 if d is None:
-> 1897     raise ValueError(
   1898         f"No POTCAR for {symbol} with functional {functional} found. Please set the PMG_VASP_PSP_DIR "
   1899         "environment in .pmgrc.yaml, or you may need to set PMG_DEFAULT_FUNCTIONAL to PBE_52 or "
   1900         "PBE_54 if you are using newer psps from VASP."
   1901     )
   1902 paths_to_try = [
   1903     os.path.join(d, funcdir, f"POTCAR.{symbol}"),
   1904     os.path.join(d, funcdir, symbol, "POTCAR"),
   1905 ]
   1906 for p in paths_to_try:

ValueError: No POTCAR for Si with functional PBE found. Please set the PMG_VASP_PSP_DIR environment in .pmgrc.yaml, or you may need to set PMG_DEFAULT_FUNCTIONAL to PBE_52 or PBE_54 if you are using newer psps from VASP.
```

From the stack trace, it looks like the API is trying to retrieve the docs associated with the latest task and is unable to locate some VASP files for that task.

To get around this issue, I also tried retrieving ALL of the download information for this material:

with MPRester("<api-key>") as mpr:
    data = mpr.get_download_info(material_ids=["mp-149"])

And received the following output (the task doc metadata, plus NOMAD URLs for the tasks that have them):

({MPID(mp-149): [{'task_id': 'mp-655585',
    'calc_type': <CalcType.GGA_Static: 'GGA Static'>},
   {'task_id': 'mp-656511',
    'calc_type': <CalcType.GGA_NSCF_Line: 'GGA NSCF Line'>},
   {'task_id': 'mp-655936',
    'calc_type': <CalcType.GGA_NSCF_Uniform: 'GGA NSCF Uniform'>},
   {'task_id': 'mp-11721',
    'calc_type': <CalcType.GGA_Structure_Optimization: 'GGA Structure Optimization'>},
   {'task_id': 'mp-149',
    'calc_type': <CalcType.GGA_Structure_Optimization: 'GGA Structure Optimization'>},
   {'task_id': 'mp-1057373', 'calc_type': <CalcType.GGA_Static: 'GGA Static'>},
   {'task_id': 'mp-1057366',
    'calc_type': <CalcType.GGA_Structure_Optimization: 'GGA Structure Optimization'>},
   {'task_id': 'mp-1057380',
    'calc_type': <CalcType.GGA_NSCF_Uniform: 'GGA NSCF Uniform'>},
   {'task_id': 'mp-1059585',
    'calc_type': <CalcType.GGA_Structure_Optimization: 'GGA Structure Optimization'>},
   {'task_id': 'mp-1059589', 'calc_type': <CalcType.GGA_Static: 'GGA Static'>},
   {'task_id': 'mp-1059603',
    'calc_type': <CalcType.GGA_NSCF_Uniform: 'GGA NSCF Uniform'>},
   {'task_id': 'mp-1120258',
    'calc_type': <CalcType.GGA_Structure_Optimization: 'GGA Structure Optimization'>},
   {'task_id': 'mp-1120259',
    'calc_type': <CalcType.GGA_Structure_Optimization: 'GGA Structure Optimization'>},
   {'task_id': 'mp-1141021',
    'calc_type': <CalcType.GGA_DFPT_Dielectric: 'GGA DFPT Dielectric'>},
   {'task_id': 'mp-1248038',
    'calc_type': <CalcType.GGA_Structure_Optimization: 'GGA Structure Optimization'>},
   {'task_id': 'mp-1249516',
    'calc_type': <CalcType.GGA_NMR_Electric_Field_Gradient: 'GGA NMR Electric Field Gradient'>},
   {'task_id': 'mp-1267607',
    'calc_type': <CalcType.GGA_NMR_Nuclear_Shielding: 'GGA NMR Nuclear Shielding'>},
   {'task_id': 'mp-1440634', 'calc_type': <CalcType.GGA_Static: 'GGA Static'>},
   {'task_id': 'mp-1686587',
    'calc_type': <CalcType.GGA_NSCF_Uniform: 'GGA NSCF Uniform'>},
   {'task_id': 'mp-1791788', 'calc_type': <CalcType.GGA_Static: 'GGA Static'>},
   {'task_id': 'mp-1594776',
    'calc_type': <CalcType.GGA_NSCF_Line: 'GGA NSCF Line'>},
   {'task_id': 'mp-1592727',
    'calc_type': <CalcType.GGA_NSCF_Line: 'GGA NSCF Line'>},
   {'task_id': 'mp-1947498',
    'calc_type': <CalcType.R2SCAN_Structure_Optimization: 'R2SCAN Structure Optimization'>},
   {'task_id': 'mp-1950734',
    'calc_type': <CalcType.PBESol_Structure_Optimization: 'PBESol Structure Optimization'>},
   {'task_id': 'mp-1059604',
    'calc_type': <CalcType.GGA_NSCF_Line: 'GGA NSCF Line'>},
   {'task_id': 'mp-1057384',
    'calc_type': <CalcType.GGA_NSCF_Line: 'GGA NSCF Line'>},
   {'task_id': 'mp-1536661',
    'calc_type': <CalcType.SCAN_Structure_Optimization: 'SCAN Structure Optimization'>},
   {'task_id': 'mp-2250750',
    'calc_type': <CalcType.GGA_NSCF_Uniform: 'GGA NSCF Uniform'>},
   {'task_id': 'mp-2299819',
    'calc_type': <CalcType.HSE06_Static: 'HSE06 Static'>},
   {'task_id': 'mp-2291052', 'calc_type': <CalcType.GGA_Static: 'GGA Static'>},
   {'task_id': 'mp-2683378',
    'calc_type': <CalcType.GGA_Structure_Optimization: 'GGA Structure Optimization'>}]},
 ['https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-11721',
  'https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-149',
  'https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-1057366',
  'https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-1057380',
  'https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-1059585',
  'https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-1059589',
  'https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-1059604',
  'https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-1057384'])

So while I can get the task info this way, it's not clear which of these calculations is associated with the material's charge density data.

The two questions I have:

  1. Is the ValueError seen with the get_charge_density_from_material_id method a bug?
  2. Is there a way to find the task_id that produced the charge density data for any one material? Then I could download the VASP files associated with that task.

Thanks for any help you can offer!

munrojm commented 1 year ago

@keeganq, thanks for reporting this issue. This is happening because the API client tries by default to deserialize data into the appropriate pymatgen objects. Since you have pymatgen installed but your POTCAR configuration is not fully functional, it is giving you problems. We are aware of this on our end and are planning a couple of different changes to fix it. For now, the easiest thing to do would be to pass monty_decode=False to MPRester alongside your API key. This should disable all deserialization by the client.
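Before decoding, MSONable objects in a response are plain dicts tagged with @module and @class keys. In the traceback above, MontyDecoder.process_decoded hands such a dict (minus its @ keys) to the class's from_dict, which is how the chain ends in Potcar.from_dict looking for local POTCARs; passing monty_decode=False skips that step. The hypothetical iter_monty_markers helper below is a sketch for inspecting raw JSON and listing which nested objects a decoder would try to rebuild. The sample payload is invented and trimmed for illustration, not a real API response.

```python
import json
from typing import Any, Iterator, Tuple


def iter_monty_markers(obj: Any, path: str = "") -> Iterator[Tuple[str, str, str]]:
    """Yield (path, module, class) for every nested dict tagged with MSONable markers.

    Hypothetical helper: it only inspects the raw JSON and never imports the
    referenced classes, so it runs without pymatgen or any POTCAR files.
    """
    if isinstance(obj, dict):
        if "@module" in obj and "@class" in obj:
            # A dict like this is what MontyDecoder would pass on to cls.from_dict().
            yield (path or "<root>", obj["@module"], obj["@class"])
        for key, value in obj.items():
            yield from iter_monty_markers(value, f"{path}.{key}" if path else key)
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from iter_monty_markers(item, f"{path}[{i}]")


# Invented, trimmed payload shaped like an undecoded document (illustration only).
raw = json.loads(
    '{"orig_inputs": {"potcar": {"@module": "pymatgen.io.vasp.inputs",'
    ' "@class": "Potcar", "symbols": ["Si"], "functional": "PBE"}}}'
)

for where, module, cls in iter_monty_markers(raw):
    print(f"{where}: {module}.{cls}")  # orig_inputs.potcar: pymatgen.io.vasp.inputs.Potcar
```

Because the walk never imports anything, it is safe to run on data fetched with decoding turned off, even on a machine without POTCARs.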

Additionally, I have just realized that the latest changes to the TaskDoc model in emmet-core have broken pulling task data through the API. I have just pinned emmet-core<=0.50.0 and released a patch as mp-api==0.30.11. Before pulling data, I would update your installation of both packages.
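Before pulling data, the installed pairing can be checked. Below is a hypothetical helper (not part of mp-api) that flags the combination reported in this issue: an mp-api older than the 0.30.11 patch together with an emmet-core newer than the 0.50.0 pin. It reads versions with the standard-library importlib.metadata and, as a simplifying assumption, compares only the numeric parts of each version string, so pre-release or post-release tags are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.30.11' into (0, 30, 11), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # e.g. the 'post1' in '0.30.11.post1'
        parts.append(int(match.group()))
    return tuple(parts)


def is_known_bad(mp_api_version: str, emmet_version: str) -> bool:
    """Flag mp-api before 0.30.11 combined with emmet-core above 0.50.0."""
    return (parse_version(mp_api_version) < (0, 30, 11)
            and parse_version(emmet_version) > (0, 50, 0))


def installed_version(dist: str) -> Optional[str]:
    """Installed version of a distribution, or None if it cannot be found."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


mp_api_ver, emmet_ver = installed_version("mp-api"), installed_version("emmet-core")
if mp_api_ver and emmet_ver and is_known_bad(mp_api_ver, emmet_ver):
    print(f"mp-api {mp_api_ver} with emmet-core {emmet_ver}: upgrade mp-api")
```

The check only targets this particular breakage, so it does not flag newer mp-api releases that may depend on newer emmet-core versions.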

keeganq commented 1 year ago

Thanks @munrojm! This is looking much better now. I am able to retrieve a TaskDoc with get_charge_density_from_material_id(<mpid>, inc_task_doc=True). Would it be safe to assume that this TaskDoc is the one that is associated with the calculations used for the volumetric charge density data?

munrojm commented 1 year ago

Yup! That is correct. The CHGCAR is taken from that specific calculation.
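Combining this answer with the get_download_info output earlier in the thread, the task_id on the returned TaskDoc can be matched to a NOMAD download link. The hypothetical helper below is a sketch that assumes the links keep the ?external_id=<task_id> form shown in that output. Not every task has a link (the URL list there is shorter than the task list), so it returns None when nothing matches.

```python
from typing import Iterable, Optional
from urllib.parse import parse_qs, urlparse


def nomad_url_for_task(urls: Iterable[str], task_id: object) -> Optional[str]:
    """Return the NOMAD URL whose external_id query value equals task_id, else None."""
    target = str(task_id)  # assumes str() of an MPID-like id gives a string such as "mp-149"
    for url in urls:
        # parse_qs maps each query key to a list of values, so this is an exact match.
        if target in parse_qs(urlparse(url).query).get("external_id", []):
            return url
    return None


# A subset of the NOMAD links returned by get_download_info for mp-149 in this thread.
urls = [
    "https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-11721",
    "https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-149",
    "https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-1059589",
]

# In practice task_id would be read from the TaskDoc returned with inc_task_doc=True.
print(nomad_url_for_task(urls, "mp-1059589"))
print(nomad_url_for_task(urls, "mp-2291052"))  # None: no NOMAD link listed for this task
```

Exact matching on the parsed query value avoids false hits such as "mp-14" matching a URL for "mp-149".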

keeganq commented 1 year ago

An update on this: I was able to configure pymatgen with a local set of POTCAR files, and was previously able to retrieve TaskDocs with monty_decode=True in MPRester, as you suggested. These TaskDocs would have decoded objects, specifically TaskDoc.orig_inputs.potcar would be a list of pymatgen.io.vasp.inputs.PotcarSingle objects.
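For anyone else setting up local POTCARs as described here, a quick pre-flight check of the directory can help. The traceback earlier in the thread shows the two layouts pymatgen tried under PMG_VASP_PSP_DIR in this version: <psp_dir>/<funcdir>/POTCAR.<symbol> and <psp_dir>/<funcdir>/<symbol>/POTCAR. The hypothetical find_potcar helper below mirrors only those two candidates (pymatgen may also accept compressed files, which this sketch ignores). The directory name POT_GGA_PAW_PBE is an assumed example of a functional directory, not something taken from this thread.

```python
import os
import tempfile
from typing import Optional


def find_potcar(psp_dir: str, funcdir: str, symbol: str) -> Optional[str]:
    """Return the first existing POTCAR among the two layouts seen in the traceback."""
    candidates = [
        os.path.join(psp_dir, funcdir, f"POTCAR.{symbol}"),  # flat layout
        os.path.join(psp_dir, funcdir, symbol, "POTCAR"),    # one folder per element
    ]
    for path in candidates:
        path = os.path.expanduser(path)  # allow "~" in psp_dir
        if os.path.isfile(path):
            return path
    return None


# Demo against a throwaway directory; "POT_GGA_PAW_PBE" is an assumed funcdir name.
with tempfile.TemporaryDirectory() as psp_dir:
    si_dir = os.path.join(psp_dir, "POT_GGA_PAW_PBE", "Si")
    os.makedirs(si_dir)
    open(os.path.join(si_dir, "POTCAR"), "w").close()  # empty placeholder, not a real POTCAR
    found_si = find_potcar(psp_dir, "POT_GGA_PAW_PBE", "Si")
    found_ge = find_potcar(psp_dir, "POT_GGA_PAW_PBE", "Ge")

print(found_si is not None, found_ge)  # True None
```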

Unfortunately, this isn't working after some recent changes to the API. The potcar is instead returned as an emmet Potcar object, i.e. it was not decoded. I think I've identified the problem, and it looks very intentional:

https://github.com/materialsproject/api/blob/3ffecd21a859d8a9314ce64faa0d76c15ad29c5c/mp_api/client/mprester.py#L216-L218

Assuming that this behavior was intended, is there a new recommended way to decode objects in the TaskDoc?

Thanks as always for your help!

munrojm commented 1 year ago

I've actually disabled Monty decoding by default for the task endpoint while we work on a better solution. You can instead pass the data to the process_decoded method of MontyDecoder to decode it manually. Instantiating the TaskDoc with the data as input arguments should also decode any data that isn't nested using monty.
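A minimal sketch of the manual route, assuming the task data is available as a plain nested dict: rather than decoding the whole document, run a decoder over just the subtree you need. In real use the decoder would be MontyDecoder().process_decoded from monty.json (the method seen in the traceback earlier in this thread). The helper name decode_at and its dotted-path convention are hypothetical. The demo uses a stand-in decoder so it runs without monty or pymatgen.

```python
import copy
from typing import Any, Callable, Dict


def decode_at(doc: Dict[str, Any], dotted_path: str,
              decoder: Callable[[Any], Any]) -> Dict[str, Any]:
    """Return a copy of doc where only the value at dotted_path is run through decoder."""
    result = copy.deepcopy(doc)  # keep the caller's raw data untouched
    *parents, leaf = dotted_path.split(".")
    node = result
    for key in parents:
        node = node[key]  # raises KeyError if the path does not exist
    node[leaf] = decoder(node[leaf])
    return result


def stand_in_decoder(d: Any) -> Any:
    """Demo-only replacement for MontyDecoder().process_decoded."""
    if isinstance(d, dict) and "@class" in d:
        return f"<{d['@class']} {d.get('symbols')}>"
    return d


# Invented, trimmed payload for illustration; not a real API response.
raw_task = {
    "orig_inputs": {
        "potcar": {"@module": "pymatgen.io.vasp.inputs", "@class": "Potcar",
                   "symbols": ["Si"], "functional": "PBE"},
    },
}

decoded = decode_at(raw_task, "orig_inputs.potcar", stand_in_decoder)
print(decoded["orig_inputs"]["potcar"])  # <Potcar ['Si']>
```

With real data you would pass MontyDecoder().process_decoded as the decoder. Decoding a Potcar that way still needs the local POTCAR configuration discussed earlier in the thread.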