patterns-ai-core / langchainrb

Build LLM-powered applications in Ruby
https://rubydoc.info/gems/langchainrb
MIT License
1.3k stars 181 forks source link

Improve LLM model tool annotation #659

Closed dghirardo closed 2 months ago

dghirardo commented 3 months ago

Issue #637

This pull request implements the tool_tailor gem (https://github.com/kieranklaassen/tool_tailor) to automate the generation of tool annotations.

Changes:

andreibondarev commented 3 months ago

Before:

irb(main):003> Langchain::Tool::NewsRetriever.new(api_key: ENV["NEWS_API_KEY"]).to_openai_tools
=>
[{"type"=>"function",
  "function"=>
   {"name"=>"news_retriever__get_everything",
    "description"=>"News Retriever: Search through millions of articles from over 150,000 large and small news sources and blogs.",
    "parameters"=>
     {"type"=>"object",
      "properties"=>
       {"q"=>
         {"type"=>"string",
          "description"=>
           "Keywords or phrases to search for in the article title and body. Surround phrases with quotes (\") for exact match. Alternatively you can use the AND / OR / NOT keywords, and optionally group these with parenthesis. Must be URL-encoded."},
        "search_in"=>{"type"=>"string", "description"=>"The fields to restrict your q search to.", "enum"=>["title", "description", "content"]},
        "sources"=>
         {"type"=>"string",
          "description"=>"A comma-seperated string of identifiers (maximum 20) for the news sources or blogs you want headlines from. Use the /sources endpoint to locate these programmatically or look at the sources index."},
        "domains"=>{"type"=>"string", "description"=>"A comma-seperated string of domains (eg bbc.co.uk, techcrunch.com, engadget.com) to restrict the search to."},
        "exclude_domains"=>{"type"=>"string", "description"=>"A comma-seperated string of domains (eg bbc.co.uk, techcrunch.com, engadget.com) to remove from the results."},
        "from"=>{"type"=>"string", "description"=>"A date and optional time for the oldest article allowed. This should be in ISO 8601 format."},
        "to"=>{"type"=>"string", "description"=>"A date and optional time for the newest article allowed. This should be in ISO 8601 format."},
        "language"=>{"type"=>"string", "description"=>"The 2-letter ISO-639-1 code of the language you want to get headlines for.", "enum"=>["ar", "de", "en", "es", "fr", "he", "it", "nl", "no", "pt", "ru", "sv", "ud", "zh"]},
        "sort_by"=>{"type"=>"string", "description"=>"The order to sort the articles in.", "enum"=>["relevancy", "popularity", "publishedAt"]},
        "page_size"=>{"type"=>"integer", "description"=>"The number of results to return per page (request). 5 is the default, 100 is the maximum."},
        "page"=>{"type"=>"integer", "description"=>"Use this to page through the results if the total results found is greater than the page size."}}}}},

After:

irb(main):013> Langchain::Tool::NewsRetriever.new(api_key: ENV["NEWS_API_KEY"]).to_openai_tools
=>
[{"type"=>"function",
  "function"=>
   {"name"=>"news_retriever__get_everything",
    "description"=>"Retrieve all news",
    "parameters"=>
     {"type"=>"object",
      "properties"=>
       {"q"=>{"type"=>"string", "description"=>"Keywords or phrases to search for in the article title and body."},
        "search_in"=>{"type"=>"string", "description"=>"The fields to restrict your q search to. The possible options are: title, description, content."},
        "sources"=>
         {"type"=>"string",
          "description"=>"A comma-seperated string of identifiers (maximum 20) for the news sources or blogs you want headlines from. Use the /sources endpoint to locate these programmatically or look at the sources index."},
        "domains"=>{"type"=>"string", "description"=>"A comma-seperated string of domains (eg bbc.co.uk, techcrunch.com, engadget.com) to restrict the search to."},
        "exclude_domains"=>{"type"=>"string", "description"=>"A comma-seperated string of domains (eg bbc.co.uk, techcrunch.com, engadget.com) to remove from the results."},
        "from"=>{"type"=>"string", "description"=>"A date and optional time for the oldest article allowed. This should be in ISO 8601 format."},
        "to"=>{"type"=>"string", "description"=>"A date and optional time for the newest article allowed. This should be in ISO 8601 format."},
        "language"=>{"type"=>"string", "description"=>"The 2-letter ISO-639-1 code of the language you want to get headlines for. Possible options: ar, de, en, es, fr, he, it, nl, no, pt, ru, se, ud, zh."},
        "sort_by"=>{"type"=>"string", "description"=>"The order to sort the articles in. Possible options: relevancy, popularity, publishedAt."},
        "page_size"=>{"type"=>"integer", "description"=>"The number of results to return per page. 20 is the API's default, 100 is the maximum. Our default is 5."},
        "page"=>{"type"=>"integer", "description"=>"Use this to page through the results."}},
      "required"=>[]}}},

Upon a quick glance -- the enum: param is missing. I wonder if we can annotate enum options in YARD docs 🤔

andreibondarev commented 3 months ago

@dghirardo Btw -- @palladius told me that you guys met at the RubyDay! 🫶

dghirardo commented 3 months ago

@andreibondarev I am currently working on a pull request for the tool_tailor gem to add support for the enum parameter. What do you think about the proposed solution?

dghirardo commented 3 months ago

Hi @andreibondarev, tool_tailor has merged my pull request, so now the enum property is supported, updating the gem. However, I noticed another issue. When you have a parameter that is a collection (like an array or a hash), you can't describe its internal structure.

Example (Tavily tool):

# @param exclude_domains [Array<String>] A list of domains to specifically exclude from the search results. Default is None, which doesn't exclude any domains.

While [Array] is supported, [Array<String>] is not. So, it’s not possible to:

You can only use the description field.

Even if we find a solution, I think nested parameters with multiple levels are not easy to describe with YARD comments.

andreibondarev commented 3 months ago

Hi @andreibondarev, tool_tailor has merged my pull request, so now the enum property is supported, updating the gem. However, I noticed another issue. When you have a parameter that is a collection (like an array or a hash), you can't describe its internal structure.

Example (Tavily tool):

# @param exclude_domains [Array<String>] A list of domains to specifically exclude from the search results. Default is None, which doesn't exclude any domains.

While [Array] is supported, [Array<String>] is not. So, it’s not possible to:

  • Define that the array items must be Strings.
  • Describe extra properties for the String values, such as enum.

You can only use the description field.

Even if we find a solution, I think nested parameters with multiple levels are not easy to describe with YARD comments.

Hmm... I think we may want to consider some sort of a decorator pattern, a la:

llm_callable :get_everything,
  description: "News Retriever: Search through millions of articles from over 150,000 large and small news sources and blogs",
  properties: {
    q: {
      type: :string,
      description: "Keywords or phrases to search for in the article title and body. Surround phrases with quotes (\") for exact match. Alternatively you can use the AND / OR / NOT keywords, and optionally group these with parenthesis. Must be URL-encoded."
  }

def get_everything(q:, ...)
  ...

The really tricky thing here is figuring out a developer-friendly interface.

kieranklaassen commented 2 months ago

Happy to add support for it in tool_tailor. Let me know if that would work. I think I want to support more use cases going forward as well.

dghirardo commented 2 months ago

Hi @andreibondarev, as agreed, since I have opened a new pull request with the new approach, I'm closing this one.