patterns-ai-core / langchainrb

Build LLM-powered applications in Ruby
MIT License
1.3k stars 181 forks

Improve LLM model tool annotation #659

Closed dghirardo closed 2 months ago

dghirardo commented 3 months ago

Issue #637

This pull request integrates the tool_tailor gem to automate the generation of tool annotations.


andreibondarev commented 3 months ago


irb(main):003> ENV["NEWS_API_KEY"]).to_openai_tools
    "description"=>"News Retriever: Search through millions of articles from over 150,000 large and small news sources and blogs.",
           "Keywords or phrases to search for in the article title and body. Surround phrases with quotes (\") for exact match. Alternatively you can use the AND / OR / NOT keywords, and optionally group these with parenthesis. Must be URL-encoded."},
        "search_in"=>{"type"=>"string", "description"=>"The fields to restrict your q search to.", "enum"=>["title", "description", "content"]},
          "description"=>"A comma-seperated string of identifiers (maximum 20) for the news sources or blogs you want headlines from. Use the /sources endpoint to locate these programmatically or look at the sources index."},
        "domains"=>{"type"=>"string", "description"=>"A comma-seperated string of domains (eg,, to restrict the search to."},
        "exclude_domains"=>{"type"=>"string", "description"=>"A comma-seperated string of domains (eg,, to remove from the results."},
        "from"=>{"type"=>"string", "description"=>"A date and optional time for the oldest article allowed. This should be in ISO 8601 format."},
        "to"=>{"type"=>"string", "description"=>"A date and optional time for the newest article allowed. This should be in ISO 8601 format."},
        "language"=>{"type"=>"string", "description"=>"The 2-letter ISO-639-1 code of the language you want to get headlines for.", "enum"=>["ar", "de", "en", "es", "fr", "he", "it", "nl", "no", "pt", "ru", "sv", "ud", "zh"]},
        "sort_by"=>{"type"=>"string", "description"=>"The order to sort the articles in.", "enum"=>["relevancy", "popularity", "publishedAt"]},
        "page_size"=>{"type"=>"integer", "description"=>"The number of results to return per page (request). 5 is the default, 100 is the maximum."},
        "page"=>{"type"=>"integer", "description"=>"Use this to page through the results if the total results found is greater than the page size."}}}}},


irb(main):013> ENV["NEWS_API_KEY"]).to_openai_tools
    "description"=>"Retrieve all news",
       {"q"=>{"type"=>"string", "description"=>"Keywords or phrases to search for in the article title and body."},
        "search_in"=>{"type"=>"string", "description"=>"The fields to restrict your q search to. The possible options are: title, description, content."},
          "description"=>"A comma-seperated string of identifiers (maximum 20) for the news sources or blogs you want headlines from. Use the /sources endpoint to locate these programmatically or look at the sources index."},
        "domains"=>{"type"=>"string", "description"=>"A comma-seperated string of domains (eg,, to restrict the search to."},
        "exclude_domains"=>{"type"=>"string", "description"=>"A comma-seperated string of domains (eg,, to remove from the results."},
        "from"=>{"type"=>"string", "description"=>"A date and optional time for the oldest article allowed. This should be in ISO 8601 format."},
        "to"=>{"type"=>"string", "description"=>"A date and optional time for the newest article allowed. This should be in ISO 8601 format."},
        "language"=>{"type"=>"string", "description"=>"The 2-letter ISO-639-1 code of the language you want to get headlines for. Possible options: ar, de, en, es, fr, he, it, nl, no, pt, ru, se, ud, zh."},
        "sort_by"=>{"type"=>"string", "description"=>"The order to sort the articles in. Possible options: relevancy, popularity, publishedAt."},
        "page_size"=>{"type"=>"integer", "description"=>"The number of results to return per page. 20 is the API's default, 100 is the maximum. Our default is 5."},
        "page"=>{"type"=>"integer", "description"=>"Use this to page through the results."}},

Upon a quick glance -- the enum: param is missing. I wonder if we can annotate enum options in YARD docs 🤔
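To make the missing enum concrete, here is a minimal sketch that compares the two `sort_by` entries from the outputs above. The variable names are illustrative assumptions; only the hash contents are taken from the irb output.

```ruby
# Entry from the first output: the allowed values are machine-readable.
hand_written = {
  "type" => "string",
  "description" => "The order to sort the articles in.",
  "enum" => ["relevancy", "popularity", "publishedAt"]
}

# Entry from the second output: the allowed values only appear in prose.
generated = {
  "type" => "string",
  "description" => "The order to sort the articles in. Possible options: relevancy, popularity, publishedAt."
}

# Keys present in the first entry but absent from the second.
missing_keys = hand_written.keys - generated.keys
puts missing_keys.inspect # => ["enum"]
```

Without the `enum` key, the model has to infer the allowed values from the description text, and nothing in the schema constrains them.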

andreibondarev commented 3 months ago

@dghirardo Btw -- @palladius told me that you guys met at the RubyDay! 🫶

dghirardo commented 3 months ago

@andreibondarev I am currently working on a pull request for the tool_tailor gem to add support for the enum parameter. What do you think about the proposed solution?

dghirardo commented 3 months ago

Hi @andreibondarev, tool_tailor has merged my pull request, so the enum property is now supported after updating the gem. However, I noticed another issue. When you have a parameter that is a collection (like an array or a hash), you can't describe its internal structure.

Example (Tavily tool):

# @param exclude_domains [Array<String>] A list of domains to specifically exclude from the search results. Default is None, which doesn't exclude any domains.

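While [Array] is supported, [Array<String>] is not. So, it’s not possible to:

  • Define that the array items must be Strings.
  • Describe extra properties for the String values, such as enum.

You can only use the description field.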

Even if we find a solution, I think nested parameters with multiple levels are not easy to describe with YARD comments.
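As an illustration of what support for `[Array<String>]` would need to produce, here is a minimal sketch of a converter from a YARD-style type string to a JSON Schema fragment. The helper `yard_type_to_schema` and the `SCALAR_TYPES` table are hypothetical; they are not part of tool_tailor or langchainrb, and this is not how either library parses types.

```ruby
# Hypothetical mapping from YARD type names to JSON Schema types.
SCALAR_TYPES = {
  "String" => "string",
  "Integer" => "integer",
  "Float" => "number",
  "Boolean" => "boolean",
  "Hash" => "object",
  "Array" => "array"
}.freeze

# Hypothetical helper: turns "Array<String>" into a schema with an
# `items` entry, recursing so that nested arrays are handled too.
# Raises KeyError for type names that are not in SCALAR_TYPES.
def yard_type_to_schema(yard_type)
  if (match = yard_type.match(/\AArray<(.+)>\z/))
    {"type" => "array", "items" => yard_type_to_schema(match[1])}
  else
    {"type" => SCALAR_TYPES.fetch(yard_type)}
  end
end

p yard_type_to_schema("Array<String>")
# => {"type"=>"array", "items"=>{"type"=>"string"}}
```

This only covers the first limitation (typing the items). Attaching extra properties such as enum to the items would still need additional comment syntax, which is where multi-level structures become awkward to express in YARD.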

andreibondarev commented 3 months ago

> Hi @andreibondarev, tool_tailor has merged my pull request, so now the enum property is supported, updating the gem. However, I noticed another issue. When you have a parameter that is a collection (like an array or a hash), you can't describe its internal structure.
>
> Example (Tavily tool):
>
> # @param exclude_domains [Array<String>] A list of domains to specifically exclude from the search results. Default is None, which doesn't exclude any domains.
>
> While [Array] is supported, [Array<String>] is not. So, it’s not possible to:
>
>   • Define that the array items must be Strings.
>   • Describe extra properties for the String values, such as enum.
>
> You can only use the description field.
>
> Even if we find a solution, I think nested parameters with multiple levels are not easy to describe with YARD comments.

Hmm... I think we may want to consider some sort of a decorator pattern, a la:

llm_callable :get_everything,
  description: "News Retriever: Search through millions of articles from over 150,000 large and small news sources and blogs",
  properties: {
    q: {
      type: :string,
      description: "Keywords or phrases to search for in the article title and body. Surround phrases with quotes (\") for exact match. Alternatively you can use the AND / OR / NOT keywords, and optionally group these with parenthesis. Must be URL-encoded."
    }
    # ...
  }

def get_everything(q:, ...)

The really tricky thing here is figuring out a developer-friendly interface.
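One possible shape for such an interface is sketched below, under stated assumptions. The `LLMCallable` module, its `llm_callable` signature (including the `required:` keyword), the `openai_tool_schemas` method, and the `NewsTool` class are all hypothetical names for illustration, not langchainrb's API. Only the outer `type: "function"` envelope follows the OpenAI tool format.

```ruby
# Hypothetical mixin: records a schema per method at class-definition time.
module LLMCallable
  def self.extended(base)
    base.instance_variable_set(:@llm_callables, {})
  end

  # Registers metadata for the method defined right after this call.
  def llm_callable(method_name, description:, properties:, required: [])
    @llm_callables[method_name] = {
      description: description,
      properties: properties,
      required: required
    }
  end

  # Builds one OpenAI-style tool definition per registered method.
  def openai_tool_schemas
    @llm_callables.map do |method_name, meta|
      {
        type: "function",
        function: {
          name: method_name.to_s,
          description: meta[:description],
          parameters: {
            type: "object",
            properties: meta[:properties],
            required: meta[:required].map(&:to_s)
          }
        }
      }
    end
  end
end

class NewsTool
  extend LLMCallable

  llm_callable :get_everything,
    description: "Search news articles",
    properties: {
      q: {type: :string, description: "Keywords or phrases to search for"},
      exclude_domains: {type: :array, items: {type: :string}, description: "Domains to exclude"},
      sort_by: {type: :string, enum: %w[relevancy popularity publishedAt]}
    },
    required: [:q]

  # Stub body: a real tool would call the news API here.
  def get_everything(q:, exclude_domains: [], sort_by: "publishedAt")
    {q: q, exclude_domains: exclude_domains, sort_by: sort_by}
  end
end

schema = NewsTool.openai_tool_schemas.first
puts schema[:function][:name] # => get_everything
```

Because the properties are plain Ruby hashes, nested `items` and `enum` entries can be written directly, which sidesteps the YARD limitation discussed above. The trade-off is that the schema sits apart from the method signature and the two can drift out of sync.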

kieranklaassen commented 2 months ago

Happy to add support for it in tool_tailor. Let me know if that would work. I think I want to support more use cases going forward as well.

dghirardo commented 2 months ago

Hi @andreibondarev, as agreed, since I have opened a new pull request with the new approach, I'm closing this one.