langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

OpenAPI Spec - reduce_openapi_spec - maximum recursion depth exceeded #12163

Closed Cyb3rWard0g closed 4 months ago

Cyb3rWard0g commented 12 months ago

Issue you'd like to raise.

Hello team!

I am new to the OpenAPI Specs Toolkit. I am trying to build an OpenAPI Agent Planner for the Microsoft Graph API. The OpenAPI spec can be found here: https://github.com/microsoftgraph/msgraph-metadata/blob/master/openapi/v1.0/openapi.yaml. I downloaded the YAML file and followed this notebook: https://python.langchain.com/docs/integrations/toolkits/openapi#1st-example-hierarchical-planning-agent.

import os
import yaml
from langchain.agents.agent_toolkits.openapi.spec import reduce_openapi_spec

current_directory = os.path.abspath('')
data_directory = os.path.join(current_directory, "data")
msgraph_api_file = os.path.join(data_directory, "msgraph-openapi.yaml")

raw_msgraph_api_spec = yaml.load(open(msgraph_api_file,encoding='utf-8').read(), Loader=yaml.Loader)
msgraph_api_spec = reduce_openapi_spec(raw_msgraph_api_spec)

Does anyone know how to handle large OpenAPI specs? I ran the code above to read my OpenAPI YAML spec, and when it reaches reduce_openapi_spec I get the following error:

RecursionError: maximum recursion depth exceeded while calling a Python object

Is there a setting I need to change in LangChain? Thank you in advance for your help. I believe this error occurs before I would even hit the token limit described here, right? https://github.com/langchain-ai/langchain/issues/2786
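
For reference, one quick thing to try (a sketch; it only helps if the $ref nesting is deep but finite, not circular) is raising Python's recursion limit before calling reduce_openapi_spec:

import sys

# CPython's default limit is roughly 1000 frames; a deeply nested spec may need more.
# Note: this will NOT help if the spec contains circular $refs, which recurse without end.
sys.setrecursionlimit(10_000)

msgraph_api_spec = reduce_openapi_spec(raw_msgraph_api_spec)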

Suggestion:

No response

dosubot[bot] commented 12 months ago

🤖

Hello,

Thank you for reaching out and providing detailed information about the issue you're facing. The RecursionError: maximum recursion depth exceeded while calling a Python object error is most likely coming from the dereference_refs function, which reduce_openapi_spec calls internally. If the OpenAPI spec has a deeply nested $ref structure, recursively dereferencing it can exceed Python's recursion limit.

To handle large OpenAPI Specs in LangChain without exceeding the maximum recursion depth, you can modify the reduce_openapi_spec function to use an iterative approach instead of a recursive one. Here's a general example of how you might convert a recursive function to an iterative one:

# Schematic pattern only: base_case, some_value and next_node are placeholders
# for the real logic of whatever function is being converted.
def recursive_function(node):
    if base_case(node):
        return some_value
    else:
        return recursive_function(next_node(node))

# can be converted to an explicit-stack version that never deepens the call stack:

def iterative_function(start_node):
    stack = [start_node]
    while stack:
        node = stack.pop()
        if base_case(node):
            return some_value
        else:
            stack.append(next_node(node))

In this case, you would need to apply a similar transformation to the dereference_refs function. If you provide the implementation of dereference_refs, I can give a more specific solution.
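
To make that pattern concrete, here is a small runnable illustration (max_depth below is purely illustrative, not LangChain code): it walks an arbitrarily nested dict/list structure with an explicit stack instead of the call stack, so deep nesting cannot raise RecursionError.

def max_depth(obj):
    # Track (node, depth) pairs on an explicit stack instead of recursing.
    deepest = 0
    stack = [(obj, 1)]
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        if isinstance(node, dict):
            stack.extend((v, depth + 1) for v in node.values())
        elif isinstance(node, list):
            stack.extend((el, depth + 1) for el in node)
    return deepest

print(max_depth({"a": {"b": [{"c": 1}]}}))  # -> 5 (scalar leaves count as a level)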

I hope this helps! If you have any other questions or need further clarification, please don't hesitate to ask.

Best, Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

Cyb3rWard0g commented 12 months ago

I guess just to test the other capabilities of reduce_openapi_spec: if I set dereference to False, this works:

msgraph_api_spec = reduce_openapi_spec(raw_msgraph_api_spec, dereference=False)

However, I believe I now have this massive spec with over 9K endpoints :). Any help on how to handle such massive OpenAPI specs?
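
One approach that may help with both the recursion error and the sheer size (a sketch; the /users and /me prefixes below are only placeholders for whatever parts of Graph you actually plan to call) is to prune the raw spec's paths before reducing it:

# Keep only the path prefixes we actually plan to call; everything else is dropped,
# which shrinks both the endpoint list and the amount of $ref material to resolve.
wanted_prefixes = ("/users", "/me")  # illustrative only

pruned_spec = dict(raw_msgraph_api_spec)
pruned_spec["paths"] = {
    path: item
    for path, item in raw_msgraph_api_spec["paths"].items()
    if path.startswith(wanted_prefixes)
}

msgraph_api_spec = reduce_openapi_spec(pruned_spec, dereference=False)

The trade-off is that you have to know up front which parts of the API the agent will need.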

dosubot[bot] commented 8 months ago

Hi, @Cyb3rWard0g,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you encountered a "maximum recursion depth exceeded" error when using the reduce_openapi_spec module with the large OpenAPI YAML spec for the Microsoft Graph API. Dosu-bot provided a detailed response, suggesting a modification to the reduce_openapi_spec function to use an iterative approach instead of a recursive one to handle large OpenAPI specs. You later tested setting dereference to False, which avoided the error but left you with a massive spec of over 9K endpoints, and you requested further assistance on handling such large OpenAPI specs. Yazdipour reacted positively to your comment.

Is this issue still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your understanding and contributions to LangChain! If you have any further questions or need assistance, feel free to reach out.

nischalj10 commented 8 months ago

Hey @Cyb3rWard0g, were you able to resolve this or try out the solution suggested by the bot? I am facing the same issue.

nischalj10 commented 8 months ago

@dosubot

In this case, you would need to apply a similar transformation to the dereference_refs function. If you provide the implementation of dereference_refs, I can give a more specific solution.

Here is the implementation of dereference_refs in the langchain_core/utils/json_schema.py file:

from __future__ import annotations

from copy import deepcopy
from typing import Any, List, Optional, Sequence

def _retrieve_ref(path: str, schema: dict) -> dict:
    components = path.split("/")
    if components[0] != "#":
        raise ValueError(
            "ref paths are expected to be URI fragments, meaning they should start "
            "with #."
        )
    out = schema
    for component in components[1:]:
        if component.isdigit():
            out = out[int(component)]
        else:
            out = out[component]
    return deepcopy(out)

def _dereference_refs_helper(
    obj: Any, full_schema: dict, skip_keys: Sequence[str]
) -> Any:
    if isinstance(obj, dict):
        obj_out = {}
        for k, v in obj.items():
            if k in skip_keys:
                obj_out[k] = v
            elif k == "$ref":
                ref = _retrieve_ref(v, full_schema)
                return _dereference_refs_helper(ref, full_schema, skip_keys)
            elif isinstance(v, (list, dict)):
                obj_out[k] = _dereference_refs_helper(v, full_schema, skip_keys)
            else:
                obj_out[k] = v
        return obj_out
    elif isinstance(obj, list):
        return [_dereference_refs_helper(el, full_schema, skip_keys) for el in obj]
    else:
        return obj

def _infer_skip_keys(obj: Any, full_schema: dict) -> List[str]:
    keys = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == "$ref":
                ref = _retrieve_ref(v, full_schema)
                keys.append(v.split("/")[1])
                keys += _infer_skip_keys(ref, full_schema)
            elif isinstance(v, (list, dict)):
                keys += _infer_skip_keys(v, full_schema)
    elif isinstance(obj, list):
        for el in obj:
            keys += _infer_skip_keys(el, full_schema)
    return keys

def dereference_refs(
    schema_obj: dict,
    *,
    full_schema: Optional[dict] = None,
    skip_keys: Optional[Sequence[str]] = None,
) -> dict:
    """Try to substitute $refs in JSON Schema."""

    full_schema = full_schema or schema_obj
    skip_keys = (
        skip_keys
        if skip_keys is not None
        else _infer_skip_keys(schema_obj, full_schema)
    )
    return _dereference_refs_helper(schema_obj, full_schema, skip_keys)
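
One possible direction, sketched below against the code above (this is only an illustration, not the actual LangChain fix): track which $ref paths are currently being expanded and leave a repeated one unexpanded, so a circular reference no longer blows the recursion limit. It reuses _retrieve_ref and the imports from the snippet above.

def _dereference_refs_helper_cyclesafe(
    obj: Any,
    full_schema: dict,
    skip_keys: Sequence[str],
    processed_refs: Optional[set] = None,
) -> Any:
    # Same traversal as _dereference_refs_helper, plus a set of refs that are
    # currently being expanded; hitting one of them again means a cycle.
    if processed_refs is None:
        processed_refs = set()
    if isinstance(obj, dict):
        obj_out = {}
        for k, v in obj.items():
            if k in skip_keys:
                obj_out[k] = v
            elif k == "$ref":
                if v in processed_refs:
                    # Circular reference: keep the $ref as-is instead of recursing forever.
                    return {"$ref": v}
                processed_refs.add(v)
                ref = _retrieve_ref(v, full_schema)
                result = _dereference_refs_helper_cyclesafe(
                    ref, full_schema, skip_keys, processed_refs
                )
                processed_refs.discard(v)
                return result
            elif isinstance(v, (list, dict)):
                obj_out[k] = _dereference_refs_helper_cyclesafe(
                    v, full_schema, skip_keys, processed_refs
                )
            else:
                obj_out[k] = v
        return obj_out
    elif isinstance(obj, list):
        return [
            _dereference_refs_helper_cyclesafe(el, full_schema, skip_keys, processed_refs)
            for el in obj
        ]
    return obj

Note that _infer_skip_keys recurses into referenced schemas the same way, so it would need an equivalent guard; whether to leave the repeated $ref in place or substitute an empty object is a design choice that depends on how the downstream agent consumes the spec.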

thebhulawat commented 7 months ago

I noticed the same issue with my OpenAPI spec. I think it was caused by a circular dependency. With the changes proposed here, I was also able to run the MS Graph OpenAPI spec.