openclimatefix / Elexonpy

Python package wrapper around the Elexon API
MIT License

Transfer openapi.json to Python functions. Maybe to start with, everything is returned as a pandas DataFrame? See the JSON file in the main repo #3

Closed. peterdudfield closed this 3 months ago

peterdudfield commented 5 months ago

This could be done manually for each API endpoint, but I do like the idea of creating the functions automatically from the openapi.json.

Jacqueline-J commented 5 months ago

I've been working on this function to retrieve data from the API. Here's what I have so far. This is a test script that prints out the first 10 functions based on the OpenAPI specification:

import json
import requests
import pandas as pd
import itertools

# Load the OpenAPI spec; json.load returns the JSON object as a dictionary
with open(
    "/content/prod-insol-insights-api.json",
) as f:
    data = json.load(f)

def generate_function_from_openapi(openapi_json):
    # Get all path items
    path_items = openapi_json.get('paths', {})

    # Take only the first 10 paths
    first_paths = dict(itertools.islice(path_items.items(), 10))

    # Iterate over each path
    for path, methods in first_paths.items():
        # Iterate over HTTP methods 
        for method, details in methods.items():

            # Extract method details
            function_name = details.get('operationId', 'default_function').replace('-', '_')

            # Extract parameters
            parameters = details.get('parameters', [])

            # Construct function signature
            print(f"def {function_name}(params):")

            # Construct docstring
            print(f"    \"\"\"")
            print(f"    Fetches the {details.get('summary', 'No summary provided')}")
            print("")
            print(f"    Args:")
            for param in parameters:
                param_name = param['name']
                param_desc = param.get('description', '')
                # 'schema' may be absent, and enum values are not always strings
                format_enum = param.get('schema', {}).get('enum', [])
                format_enum_str = f" Use {', '.join(map(str, format_enum))}." if format_enum else ''
                final_desc = f"{param_name}: {param_desc}{format_enum_str}"
                print(f"        {final_desc}")
            print("")
            print(f"    Raises:")
            print(f"        Exception: If the request fails or returns a non-200 status code.")
            print(f"    \"\"\"")

            # Generate function body
            print(f"    base_url = '{openapi_json.get('servers', [{}])[0].get('url', '')}'")
            print(f"    url = base_url + '{path}'")
            print(f"    params['format'] = 'json'")
            print("    response = requests.get(url, params=params)")

            print("    if response.status_code == 200:")
            print("        data = response.json()")

            print("        # Flatten the top level")
            print("        df = pd.json_normalize(data)")

            print("        # explode the data column")
            print("        df = df.explode('data')")

            print("        # Apply function to each row")
            print("        df = df.apply(extract_keys, axis=1)")

            print("        # Drop the original 'data' column")
            print("        df.drop(columns=['data'], inplace=True)")

            print("        # Merge the expanded columns back into the original DataFrame")
            print("        return df")

            print("    else:")
            print("        raise Exception(f'Error: {response.status_code}')")
            print("")

Example usage (the function prints the code and returns None, so it is called directly rather than wrapped in print):

generate_function_from_openapi(data)

The output will be:

def get_generation_availability_summary_14d(params):
    """
    Fetches the Fourteen-day generation forecast

    Args:
        format: Response data format. Use json/xml to include metadata. Use json, xml, csv.

    Raises:
        Exception: If the request fails or returns a non-200 status code.
    """
    base_url = 'https://data.elexon.co.uk/bmrs/api/v1'
    url = base_url + '/generation/availability/summary/14D'
    params['format'] = 'json'
    response = requests.get(url, params=params)
    if response.status_code == 200:
        data = response.json()
        # Flatten the top level
        df = pd.json_normalize(data)
        # explode the data column
        df = df.explode('data')
        # Apply function to each row
        df = df.apply(extract_keys, axis=1)
        # Drop the original 'data' column
        df.drop(columns=['data'], inplace=True)
        # Merge the expanded columns back into the original DataFrame
        return df
    else:
        raise Exception(f'Error: {response.status_code}')
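# Function to dynamically extract keys
def extract_keys(row):
    # Iterate over a snapshot so new keys can be added to the row safely
    for key, value in list(row.items()):
        if isinstance(value, dict):
            for k, v in value.items():
                row[k] = v
    # Return the row even when it contains no dict values
    return row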

params={'format':'json'}

get_generation_availability_summary_14d(params)

The function can then be copied and executed. I still need to test it on other endpoints because I believe some endpoints return data that is not nested JSON, so they don't need all the steps to convert it to a DataFrame. However, I wanted to know if I'm on the right track with this.
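
One way to cope with endpoints that differ in shape is to reduce every payload to a list of records before building the DataFrame. The sketch below is a minimal illustration and has not been run against the live API. The helper name extract_records is hypothetical, and the three payload shapes it handles (a dict with a 'data' list, a bare list, a single flat dict) are assumptions about what the endpoints might return.

```python
def extract_records(payload):
    """Return a list of record dicts from a decoded JSON payload.

    Assumed shapes (not verified against every endpoint):
      - {"data": [...], ...}  nested: records live under 'data'
      - [...]                 a bare list of records
      - {...}                 a single flat record
    """
    if isinstance(payload, dict):
        data = payload.get("data")
        if isinstance(data, list):
            # Nested case: keep only the records, drop sibling metadata keys
            return data
        # Flat case: treat the whole dict as one record
        return [payload]
    if isinstance(payload, list):
        # Already a list of records
        return payload
    raise TypeError(f"Unsupported payload type: {type(payload).__name__}")
```

The result could then be passed to pd.json_normalize(extract_records(response.json())), which would stand in for the explode/apply/drop steps. Note that this version discards any metadata keys that sit next to 'data'; if those are needed as columns, they would have to be carried over separately.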

peterdudfield commented 5 months ago

Definitely on the right track; see #4 for an example that was made manually.

I'd be tempted to try to write out the params in the function input and import base_url from a constant file, see #4
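
As a rough sketch of what writing out the params could look like in the generator: the helpers below build an explicit signature from the OpenAPI 'parameters' list and emit the lines that map the Python arguments back to the API names. The helper names (python_name, build_signature, build_params_lines) are made up for illustration, and the handling of reserved words such as 'from' is an assumption about what some endpoints might need.

```python
import keyword
import re


def python_name(name):
    """Turn an API parameter name into a valid Python identifier."""
    candidate = re.sub(r"\W", "_", name)
    if candidate[:1].isdigit():
        candidate = "_" + candidate
    if keyword.iskeyword(candidate):
        candidate += "_"  # e.g. 'from' becomes 'from_'
    return candidate


def build_signature(function_name, parameters):
    """Return a 'def' line with required args first, optional args defaulting to None."""
    required, optional = [], []
    for param in parameters:
        arg = python_name(param["name"])
        if param.get("required", False):
            required.append(arg)
        else:
            optional.append(f"{arg}=None")
    return f"def {function_name}({', '.join(required + optional)}):"


def build_params_lines(parameters):
    """Return source lines that rebuild the query dict, skipping unset arguments."""
    lines = ["    params = {"]
    for param in parameters:
        lines.append(f"        {param['name']!r}: {python_name(param['name'])},")
    lines.append("    }")
    lines.append("    params = {k: v for k, v in params.items() if v is not None}")
    return lines
```

In the generator, build_signature would replace the hard-coded def ...(params): line, and the generated body would reference a BASE_URL imported from a constants module instead of inlining the server URL. The exact module path depends on how the package is laid out.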

I also wonder whether, as well as printing, the text could be appended to a Python file. That way we could potentially write all the API Python code automatically.
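
A minimal sketch of the file-writing idea, assuming the generator is changed to build a string instead of printing. The names render_module and write_module are hypothetical, the generated bodies are deliberately simplified (they return the raw JSON rather than a DataFrame), and only GET operations are covered.

```python
from pathlib import Path


def render_module(openapi_json):
    """Build the source of a whole client module as one string."""
    servers = openapi_json.get("servers") or [{}]
    base_url = servers[0].get("url", "")
    lines = ["import requests", "", f"BASE_URL = {base_url!r}", ""]

    for path, methods in openapi_json.get("paths", {}).items():
        details = methods.get("get")
        if not isinstance(details, dict):
            continue  # this sketch only covers GET operations
        name = details.get("operationId", "default_function").replace("-", "_")
        # Collapse whitespace so the summary stays on one comment line
        summary = " ".join(details.get("summary", "No summary provided").split())
        lines += [
            "",
            f"def {name}(params=None):",
            f"    # {summary}",
            f"    response = requests.get(BASE_URL + {path!r}, params=params)",
            "    response.raise_for_status()",
            "    return response.json()",
            "",
        ]
    return "\n".join(lines)


def write_module(openapi_json, output_path):
    """Render the module, check that it parses, then write it to disk."""
    source = render_module(openapi_json)
    compile(source, str(output_path), "exec")  # raises SyntaxError on bad names
    Path(output_path).write_text(source, encoding="utf-8")
    return source
```

This writes the whole file in one go rather than appending function by function, so re-running the generator does not leave duplicate definitions behind. Calling write_module(data, "generated_client.py") with the spec loaded earlier would produce an importable module, where the file name is just an example.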

peterdudfield commented 3 months ago

I'll close this for now; we used Swagger Codegen in the end.