materialsproject / api

New API client for the Materials Project
https://materialsproject.github.io/api/

Not able to export the data #889

Closed ak983819 closed 4 months ago

ak983819 commented 4 months ago

I want to extract all the compositions available on the Materials Project with a band gap greater than 1.5 eV. I got output, but the list of formulas in the output file was empty. I got the task ID as well, but I have no idea how to get the formula from the task ID. I have been trying to fix this, but the output always comes back without the formula. Could you please help me extract all the compositions with a band gap of more than 1.5 eV?

import requests
import pandas as pd

# Set the base URL for the Materials Project API
base_url = "https://api.materialsproject.org"

# Define the endpoint for retrieving band structure data
endpoint = "/materials/electronic_structure/bandstructure/"

# Define your API key
api_key = ""  # Replace with your actual API key

# Define the parameters for the band gap query
params = {
    "band_gap_min": 1.5,  # Minimum value for the band gap
    "_fields": "pretty_formula,band_gap,material_id",  # Fields to retrieve
    "_limit": 100,  # Adjust based on how many results you want per request
    "_skip": 0  # For pagination
}

# Function to fetch data with parameters
def fetch_data(params):
    headers = {"X-API-KEY": api_key}
    all_data = []
    batch_number = 0  # Keep track of how many batches have been fetched
    while True:
        response = requests.get(base_url + endpoint, params=params, headers=headers)
        if response.status_code == 200:
            data = response.json()["data"]
            if not data:
                break  # Exit loop if no more data is returned
            all_data.extend(data)
            params["_skip"] += params["_limit"]  # Prepare for the next batch of data
            batch_number += 1
            print(f"Batch {batch_number} fetched. Total materials fetched: {len(all_data)}")
        else:
            print("Error fetching data:", response.status_code)
            break
    return all_data

# Fetch all compositions with a band gap above 1.5 eV
data = fetch_data(params)

# Convert the list of data to a pandas DataFrame
df = pd.DataFrame(data, columns=["pretty_formula", "task_id", "band_gap"])

# Rename columns for clarity
df.columns = ["Formula", "Task ID", "Band Gap"]

print(df)

[image]

Please help me.

munrojm commented 4 months ago

@ak983819 could I ask the reason for not using the python client?

ak983819 commented 4 months ago

I am using Jupyter Notebook. Let me know if that was not your question.

munrojm commented 4 months ago

Have you tried using the python client instead of making direct HTTP requests? https://docs.materialsproject.org/downloading-data/using-the-api/querying-data
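For reference, a minimal sketch of what the client-based query might look like. It assumes the `mp-api` package is installed and that `mpr` is an open `MPRester` instance; the helper name `fetch_wide_gap_materials` is hypothetical, and passing `None` as the upper bound of the `band_gap` range is an assumption about how the client treats open-ended ranges.

```python
def fetch_wide_gap_materials(mpr, min_gap=1.5):
    """Return material ID, formula and band gap for materials above min_gap (eV).

    `mpr` is expected to be an open mp_api.client.MPRester instance.
    """
    docs = mpr.materials.summary.search(
        # (min, max) range in eV; None as "no upper bound" is an assumption.
        band_gap=(min_gap, None),
        # Note the field name: formula_pretty, not pretty_formula.
        fields=["material_id", "formula_pretty", "band_gap"],
    )
    return [
        {
            "material_id": str(doc.material_id),
            "formula": doc.formula_pretty,
            "band_gap": doc.band_gap,
        }
        for doc in docs
    ]
```

It would be called inside `with MPRester("your_api_key") as mpr:`, and the returned list of dicts can be passed straight to `pd.DataFrame(...)`. The client handles pagination itself, so no manual `_skip` / `_limit` loop is needed.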

ak983819 commented 4 months ago

Got it! Yes, I have tried this one as well.

ak983819 commented 4 months ago

I don't understand why the formula column is empty in my output. Or is there any way to get the formula from the task ID?

munrojm commented 4 months ago

If you just send requests to /materials/electronic_structure/ you should be able to add formula_pretty to _fields while still querying the band gap.
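A rough sketch of that request with manual pagination, for anyone who prefers raw HTTP. It uses only the standard library so that it runs without extra packages; the names `http_get_json` and `fetch_all` are hypothetical, the `{"data": [...]}` response shape is taken from the code posted above, and the transport is injectable so the loop can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.materialsproject.org"
ENDPOINT = "/materials/electronic_structure/"


def http_get_json(url, params, api_key):
    """Default transport: one GET request, with the body decoded as JSON."""
    query = urllib.parse.urlencode(params)
    request = urllib.request.Request(f"{url}?{query}", headers={"X-API-KEY": api_key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all(api_key, min_gap=1.5, limit=100, get_json=http_get_json):
    """Page through the endpoint until an empty batch comes back."""
    params = {
        "band_gap_min": min_gap,
        # formula_pretty is the field name suggested in this thread.
        "_fields": "formula_pretty,band_gap,material_id",
        "_limit": limit,
        "_skip": 0,
    }
    rows = []
    while True:
        # Pass a copy so the transport cannot mutate the paging state.
        page = get_json(BASE_URL + ENDPOINT, dict(params), api_key)["data"]
        if not page:
            break
        rows.extend(page)
        params["_skip"] += limit
    return rows
```

Calling `fetch_all("your_api_key")` returns a list of dicts keyed by the requested field names, so a DataFrame built from it should use `formula_pretty` as the column name as well.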

ak983819 commented 4 months ago

Yes, it's working, but the formula is not there. This is the code I'm using right now:

import requests
import pandas as pd

base_url = "https://api.materialsproject.org"
endpoint = "/materials/electronic_structure/"
api_key = "****"

params = {
    "band_gap_min": 1.5,  # Minimum value for the band gap
    "_fields": "pretty_formula,band_gap,material_id",  # Fields to retrieve
    "_limit": 100,  # Adjust based on how many results you want per request
    "_skip": 0  # For pagination
}

def fetch_data(params):
    headers = {"X-API-KEY": api_key}
    all_data = []
    batch_number = 0  # Keep track of how many batches have been fetched
    while True:
        response = requests.get(base_url + endpoint, params=params, headers=headers)
        if response.status_code == 200:
            data = response.json()["data"]
            if not data:
                break  # Exit loop if no more data is returned
            all_data.extend(data)
            params["_skip"] += params["_limit"]  # Prepare for the next batch of data
            batch_number += 1
            print(f"Batch {batch_number} fetched. Total materials fetched: {len(all_data)}")
        else:
            print("Error fetching data:", response.status_code)
            break
    return all_data

data = fetch_data(params)

df = pd.DataFrame(data, columns=["pretty_formula", "material_id", "band_gap"])
df.columns = ["Formula", "material_id", "Band Gap"]
print(df)

This is a snapshot of my output file:

[image]

Alternatively, could you just tell me how to get the chemical formula from the material ID? That would also work for me. Thank you so much for the quick response. I really appreciate it!

munrojm commented 4 months ago

Try 'formula_pretty' not 'pretty_formula'. Else, query '/materials/core/' using 'material_ids'
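For the second option, here is a small sketch of how a list of material IDs could be turned into query parameters for `/materials/core/` and how the responses could be folded into an ID-to-formula lookup. The helper names are hypothetical, and both the comma-separated format for `material_ids` and the chunk size of 100 are assumptions rather than documented behavior; the `{"data": [...]}` response shape follows the code posted earlier in this thread.

```python
def chunked(ids, size):
    """Yield consecutive slices of `ids` with at most `size` items each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def core_query_params(material_ids, chunk_size=100):
    """Build one parameter dict per chunk of IDs for GET /materials/core/."""
    return [
        {
            # Assumption: the endpoint accepts a comma-separated ID list.
            "material_ids": ",".join(chunk),
            "_fields": "material_id,formula_pretty",
            "_limit": chunk_size,
        }
        for chunk in chunked(list(material_ids), chunk_size)
    ]


def formulas_by_id(pages):
    """Merge decoded JSON responses into a {material_id: formula} dict."""
    return {
        doc["material_id"]: doc["formula_pretty"]
        for page in pages
        for doc in page["data"]
    }
```

Each parameter dict would be sent with the same `X-API-KEY` header as before, and the resulting lookup can then be mapped onto the `material_id` column of an existing DataFrame.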

ak983819 commented 4 months ago

Thank you so much! It's working now