materialsproject / api

New API client for the Materials Project
https://materialsproject.github.io/api/

task type missing from api call #871

Open bernstei opened 7 months ago

bernstei commented 7 months ago

When I retrieve task mp-1591459 via the API using

import os
from mp_api.client import MPRester

api_key = os.getenv("MP_API_KEY")
task_id = "mp-1591459"
with MPRester(api_key) as mpr:
    task_result = mpr.materials.tasks.get_data_by_id(task_id).dict()
    print(task_result)

and check the output for "NSCF", I get nothing (other task IDs do contain a top-level task_type field with apparently meaningful info):

tin 3562 : python3 t.py | grep NSCF
/home/cluster2/bernstei/.local/lib/python3.9/site-packages/mp_api/client/mprester.py:230: UserWarning: mpcontribs-client not installed. Install the package to query MPContribs data, or construct pourbaix diagrams: 'pip install mpcontribs-client'
  warnings.warn(
Retrieving TaskDoc documents: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 25115.59it/s]
tin 3563 : echo $?
1

However, when I use the web interface to the tasks at https://next-gen.materialsproject.org/materials/mp-12751/tasks/mp-1591459 I get the attached screenshot, which indicates that it's a GGA NSCF Uniform task type. Why is this information missing from the API call's results?

[Screenshot 2023-11-27 at 4 31 17 PM: web interface listing the task type as GGA NSCF Uniform]

bernstei commented 7 months ago

I looked a bit more carefully at the output of print(json.dumps(task_result, indent=2)), and I see that "task_type" is listed under "fields_not_requested" even if I explicitly pass it in fields to the get_data_by_id call.

Here's an example

tin 3903 : python3 t.py mp-1591459
Retrieving TaskDoc documents: 100%|███████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 41120.63it/s]
None
True
tin 3904 : cat t.py 
import sys
import os
import json

from mp_api.client import MPRester

api_key = os.getenv("MP_API_KEY")

task_id = sys.argv[1]
with MPRester(api_key) as mpr:
    task_result = mpr.materials.tasks.get_data_by_id(task_id, fields=["task_type", "input"])
    print(task_result.task_type)
    print("task_type" in task_result.fields_not_requested)

bernstei commented 7 months ago

@munrojm any ideas on this?

munrojm commented 7 months ago

@bernstei yes, sorry for the delayed response. The task schema is a little confusing, and doesn't always contain a granular description of the calculation since it is essentially just the parsed raw data. Additionally, the task collection is the only one where the schema is not fully consistent due to continuous additions from a decade of DFT calculations. For a detailed description of the type, you might want to pull the materials document that contains that task. You can search on task_ids and ask for task_types there. That should get you what you need.

bernstei commented 7 months ago

Thanks. I was able to extract the task type from the material record and match it to the corresponding task data.
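
For readers who land here with the same problem, the workaround described above might look roughly like the sketch below. It is not the exact code used in this thread. It assumes that `mpr.materials.search` accepts `task_ids` and `fields` arguments, and that the returned material document exposes a `task_types` mapping keyed by task ID; both follow from the suggestion above but may differ between mp_api versions. The helper names `task_type_from_material` and `fetch_task_type` are made up for illustration, and the offline example document is hand-written from the values quoted in this issue rather than fetched from the API.

```python
from typing import Any, Mapping, Optional


def task_type_from_material(material_doc: Mapping[str, Any], task_id: str) -> Optional[str]:
    """Return the task type recorded for task_id in a material document, if any.

    Hypothetical helper: expects a dict-like material document whose
    "task_types" entry maps task IDs to task types (strings or enum members).
    """
    task_types = material_doc.get("task_types") or {}
    task_type = task_types.get(task_id)
    if task_type is None:
        return None
    # Enum members carry the readable label in .value; plain strings pass through.
    return str(getattr(task_type, "value", task_type))


def fetch_task_type(api_key: str, task_id: str) -> Optional[str]:
    """Find the material containing task_id and read its task type.

    Assumption: materials.search supports task_ids=... and fields=...;
    check this against the installed mp_api version before relying on it.
    """
    from mp_api.client import MPRester  # imported lazily; needs mp_api and network access

    with MPRester(api_key) as mpr:
        docs = mpr.materials.search(
            task_ids=[task_id], fields=["material_id", "task_types"]
        )
    for doc in docs:
        found = task_type_from_material(doc.dict(), task_id)
        if found is not None:
            return found
    return None


# Offline illustration: a hand-written stand-in for a material document,
# using the material ID, task ID, and task type quoted in this issue.
example_doc = {
    "material_id": "mp-12751",
    "task_types": {"mp-1591459": "GGA NSCF Uniform"},
}
print(task_type_from_material(example_doc, "mp-1591459"))
```

With a valid key, something like `fetch_task_type(os.getenv("MP_API_KEY"), "mp-1591459")` would be the online equivalent; the lookup is kept in a separate pure function so it can be checked without touching the API.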