palladius / gemini-news-crawler

A GenAI news crawler in Ruby leveraging Gemini multimodality ability
MIT License

Net::HTTPBadRequest error in demo04 #2

Closed: palladius closed this issue 2 months ago.

palladius commented 2 months ago

Last night I ran demo04 and, for some strange reason, it STOPPED working. It was working two nights ago. I wonder whether one of the many changes I made in the past 48h broke it.

I can now reproduce the same bug on derek.


$ rails c

llm = Langchain::LLM::GoogleGemini.new(api_key: Rails.application.credentials.env.GEMINI_API_KEY_BIG_QUOTA) # rescue nil # 9xhQ
llm.defaults[:chat_completion_model_name] # Which model are we using?
# => "gemini-1.5-pro-latest"

@assistant = Langchain::Assistant.new(
  llm: llm,
  instructions: 'You are a News Assistant.',
  # You can iterate and program your assistant based on your preferences.
  # instructions: "You are a News Assistant. When prompted for further info about some news, dont call further functions; instead show the JSON of the matching article - if there's one.",
  tools: [
    NewsRetriever, # 🔧 instantiated in config/initializers/
    # … # 🔧 instantiating now. Code in: …
  ]
)
# => #<Langchain::Assistant:0x00007f8de1a493d8 …

webapp(dev)> @assistant.say 'Latest 5 news from Italy'
I, [2024-08-22T14:04:50.659340 #4182092]  INFO -- : [Langchain.rb] [Langchain::Assistant]: Sending a call to Langchain::LLM::GoogleGemini
config/initializers/riccardo15_monkeypatch_langchain_assistant.rb:12:in `say': #<Net::HTTPBadRequest:0x00007f8de1dd08d0> (StandardError)
        from (webapp):79:in `<main>'
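The exception above carries only the inspect string of the Net::HTTPBadRequest object, not the response body, which is where a Google API normally explains why it returned 400. Below is a minimal sketch of a hypothetical helper (describe_http_error is not a langchainrb method) that a monkeypatch could use to build a more useful message. It assumes the body follows the usual Google API error shape, {"error": {"code", "message", "status"}}, and falls back to the raw body otherwise.

```ruby
require "json"

# Hypothetical helper (not part of langchainrb): turn an HTTP error response into
# a readable message. Assumes Google-style error bodies such as
# {"error": {"code": 400, "message": "...", "status": "INVALID_ARGUMENT"}};
# any other body is shown raw, truncated to 500 characters.
def describe_http_error(code, reason, body)
  text = body.to_s
  detail =
    begin
      parsed = JSON.parse(text)
      error = parsed.is_a?(Hash) ? parsed["error"] : nil
      error.is_a?(Hash) ? "#{error['status']}: #{error['message']}" : text
    rescue JSON::ParserError
      text # not JSON (or empty): keep the raw body
    end
  "HTTP #{code} #{reason}: #{detail[0, 500]}"
end
```

Wherever the Net::HTTPResponse is still in hand, something like raise StandardError, describe_http_error(response.code, response.message, response.body) would surface the server's explanation instead of an object address. Where exactly that point is in the monkeypatched code is not shown in this thread.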
palladius commented 2 months ago

I don't know why, but reverting my code to its state from 7 days ago fixes it #dammit

# code in palladius/sakura
$ git-revert-main-to-N-days-ago 7

irb(main):026> s 'Latest 5 news from Italy'
I, [2024-08-22T14:11:11.560579 #6797]  INFO -- : [Langchain.rb] [Langchain::Assistant]: Sending a call to Langchain::LLM::GoogleGemini
I, [2024-08-22T14:11:13.515898 #6797]  INFO -- : [Langchain.rb] [Langchain::Tool::NewsRetriever]: Retrieving top news headlines
I, [2024-08-22T14:11:13.738714 #6797]  INFO -- : [Langchain.rb] [Langchain::Assistant]: Sending a call to Langchain::LLM::GoogleGemini
🔢➡️🔢 [function] 🛠️  news_retriever__get_top_headlines => {"status":"ok","totalResults":34,"articles":[{"source":{"id":"google-news","name":"Google News"},"author":"ForlìToday","title":"Vaiolo delle scimmie, l'Oms dichiara l'emergenza globale: \"Aumenta la contagiosità, ma non è un nuovo covid\" - ForlìToday","description":null,"url":"","urlToImage":null,"publishedAt":"2024-08-21T10:38:05Z","content":null},{"source":{"id":"google-news","name":"Google News"},"author":"la Repubblica","title":"Guerra Ucraina - Russia, le news di oggi - la Repubblica","description":null,"url":"","urlToImage":null,"publishedAt":"2024-08-21T09:56:39Z","content":null},{"source":{"id":"google-news","name":"Google News"},"author":"Il Sole 24 ORE","title":"Iran, si schianta bus con pellegrini: almeno 28 morti - Il Sole 24 ORE","description":null,"url":"","urlToImage":null,"publishedAt":"2024-08-21T09:45:00Z","content":null},{"source":{"id":"google-news","name":"Google News"},"author":"Corriere della Sera","title":"Caso Arianna Meloni, Di Pietro: «È nel mirino perché vogliono arrivare a Giorgia» - Corriere della Sera","description":null,"url":"","urlToImage":null,"publishedAt":"2024-08-21T09:12:19Z","content":null},{"source":{"id":"google-news","name":"Google News"},"author":"Agenzia ANSA","title":"Bufera Sinner, positivo al doping 'ma è sca.. (🤥 redacted)
🤖 [model] 💬 Here are the latest 5 news from Italy:

* Vaiolo delle scimmie, l'Oms dichiara l'emergenza globale: "Aumenta la contagiosità, ma non è un nuovo covid" - ForlìToday
* Guerra Ucraina - Russia, le news di oggi - la Repubblica
* Iran, si schianta bus con pellegrini: almeno 28 morti - Il Sole 24 ORE
* Caso Arianna Meloni, Di Pietro: «È nel mirino perché vogliono arrivare a Giorgia» - Corriere della Sera
* Bufera Sinner, positivo al doping 'ma è scagionato' - Agenzia ANSA
=> nil
palladius commented 2 months ago

I enabled Net::HTTP debugging thanks to … (note: it only works once!)

SSL established, protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384
<- "POST /v1beta/models/gemini-1.5-pro-latest:generateContent?key=[REDACTED] HTTP/1.1\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nUser-Agent: Ruby\r\nHost:\r\nContent-Type: application/json\r\nContent-Length: 6411\r\n\r\n"
<- "{\"tools\":{\"functionDeclarations\":[{\"name\":\"news_retriever__get_everything\",\"description\":\"News Retriever: Search through millions of articles from over 150,000 large and small news sources and blogs.\",\"parameters\":{\"type\":\"object\",\"properties\":{\"q\":{\"type\":\"string\",\"description\":\"Keywords or phrases to search for in the article title and body. Surround phrases with quotes (\\\") for exact match. Alternatively you can use the AND / OR / NOT keywords, and optionally group these with parenthesis. Must be URL-encoded.\"},\"searchIn\":{\"type\":\"string\",\"description\":\"The fields to restrict your q search to.\",\"enum\":[\"title\",\"description\",\"content\"]},\"sources\":{\"type\":\"string\",\"description\":\"A comma-seperated string of identifiers (maximum 20) for the news sources or blogs you want headlines from. Use the /sources endpoint to locate these programmatically or look at the sources index.\"},\"domains\":{\"type\":\"string\",\"description\":\"A comma-seperated string of domains (eg,, to restrict the search to.\"},\"excludeDomains\":{\"type\":\"string\",\"description\":\"A comma-seperated string of domains (eg,, to remove from the results.\"},\"from\":{\"type\":\"string\",\"description\":\"A date and optional time for the oldest article allowed. This should be in ISO 8601 format.\"},\"to\":{\"type\":\"string\",\"description\":\"A date and optional time for the newest article allowed. This should be in ISO 8601 format.\"},\"language\":{\"type\":\"string\",\"description\":\"The 2-letter ISO-639-1 code of the language you want to get headlines for.\",\"enum\":[\"ar\",\"de\",\"en\",\"es\",\"fr\",\"he\",\"it\",\"nl\",\"no\",\"pt\",\"ru\",\"sv\",\"ud\",\"zh\"]},\"sortBy\":{\"type\":\"string\",\"description\":\"The order to sort the articles in.\",\"enum\":[\"relevancy\",\"popularity\",\"publishedAt\"]},\"pageSize\":{\"type\":\"integer\",\"description\":\"The number of results to return per page (request). 
5 is the default, 100 is the maximum.\"},\"page\":{\"type\":\"integer\",\"description\":\"Use this to page through the results if the total results found is greater than the page size.\"}}}},{\"name\":\"news_retriever__get_top_headlines\",\"description\":\"News Retriever: Provides live top and breaking headlines for a country, specific category in a country, single source, or multiple sources. You can also search with keywords. Articles are sorted by the earliest date published first.\",\"parameters\":{\"type\":\"object\",\"properties\":{\"country\":{\"type\":\"string\",\"description\":\"The 2-letter ISO 3166-1 code of the country you want to get headlines for.\",\"enum\":[\"ae\",\"ar\",\"at\",\"au\",\"be\",\"bg\",\"br\",\"ca\",\"ch\",\"cn\",\"co\",\"cu\",\"cz\",\"de\",\"eg\",\"fr\",\"gb\",\"gr\",\"hk\",\"hu\",\"id\",\"ie\",\"il\",\"in\",\"it\",\"jp\",\"kr\",\"lt\",\"lv\",\"ma\",\"mx\",\"my\",\"ng\",\"nl\",\"no\",\"nz\",\"ph\",\"pl\",\"pt\",\"ro\",\"rs\",\"ru\",\"sa\",\"se\",\"sg\",\"si\",\"sk\",\"th\",\"tr\",\"tw\",\"ua\",\"us\",\"ve\",\"za\"]},\"category\":{\"type\":\"string\",\"description\":\"The category you want to get headlines for.\",\"enum\":[\"business\",\"entertainment\",\"general\",\"health\",\"science\",\"sports\",\"technology\"]},\"q\":{\"type\":\"string\",\"description\":\"Keywords or a phrase to search for.\"},\"pageSize\":{\"type\":\"integer\",\"description\":\"The number of results to return per page (request). 5 is the default, 100 is the maximum.\"},\"page\":{\"type\":\"integer\",\"description\":\"Use this to page through the results if the total results found is greater than the page size.\"}}}},{\"name\":\"news_retriever__get_sources\",\"description\":\"News Retriever: This endpoint returns the subset of news publishers that top headlines (/v2/top-headlines) are available from. 
It's mainly a convenience endpoint that you can use to keep track of the publishers available on the API, and you can pipe it straight through to your users.\",\"parameters\":{\"type\":\"object\",\"properties\":{\"country\":{\"type\":\"string\",\"description\":\"The 2-letter ISO 3166-1 code of the country you want to get headlines for. Default: all countries.\",\"enum\":[\"ae\",\"ar\",\"at\",\"au\",\"be\",\"bg\",\"br\",\"ca\",\"ch\",\"cn\",\"co\",\"cu\",\"cz\",\"de\",\"eg\",\"fr\",\"gb\",\"gr\",\"hk\",\"hu\",\"id\",\"ie\",\"il\",\"in\",\"it\",\"jp\",\"kr\",\"lt\",\"lv\",\"ma\",\"mx\",\"my\",\"ng\",\"nl\",\"no\",\"nz\",\"ph\",\"pl\",\"pt\",\"ro\",\"rs\",\"ru\",\"sa\",\"se\",\"sg\",\"si\",\"sk\",\"th\",\"tr\",\"tw\",\"ua\",\"us\",\"ve\",\"za\"]},\"category\":{\"type\":\"string\",\"description\":\"The category you want to get headlines for. Default: all categories.\",\"enum\":[\"business\",\"entertainment\",\"general\",\"health\",\"science\",\"sports\",\"technology\"]},\"language\":{\"type\":\"string\",\"description\":\"The 2-letter ISO-639-1 code of the language you want to get headlines for.\",\"enum\":[\"ar\",\"de\",\"en\",\"es\",\"fr\",\"he\",\"it\",\"nl\",\"no\",\"pt\",\"ru\",\"sv\",\"ud\",\"zh\"]}}}},{\"name\":\"article_tool__create\",\"description\":\"Article Database: Create a new article\",\"parameters\":{\"type\":\"object\",\"properties\":{\"title\":{\"type\":\"string\",\"description\":\"Article title\"},\"summary\":{\"type\":\"string\",\"description\":\"Article summary (in UTF-8)\"},\"content\":{\"type\":\"string\",\"description\":\"Article content (in UTF-8)\"},\"author\":{\"type\":\"string\",\"description\":\"Article author\"},\"link\":{\"type\":\"string\",\"description\":\"Article link\"},\"publishedDate\":{\"type\":\"string\",\"description\":\"Article published date\"},\"language\":{\"type\":\"string\",\"description\":\"Article language (2 letters)\"},\"country\":{\"type\":\"string\",\"description\":\"Country the article refers to (whatever makes more 
sense to you: the newspaper location, the country where they speak the language, or the country where the facts happen. If unsure, say Vatican City)\"},\"countryEmoji\":{\"type\":\"string\",\"description\":\"Country flag emoji - emoji of the country you chose in 'country' field. If unsure, emoji of vatican city.\"}},\"required\":[\"title\",\"summary\",\"content\",\"author\",\"link\",\"published_date\",\"language\"]}},{\"name\":\"article_tool__delete\",\"description\":\"Article Database: Delete an article by id\",\"parameters\":{\"type\":\"object\",\"properties\":{\"id\":{\"type\":\"number\",\"description\":\"Article numeric ID\"}},\"required\":[\"id\"]}},{\"name\":\"article_tool__carlessian_url\",\"description\":\"Carlessian URL: Provide an article perma-URL for gemini-news-crawler app by id\",\"parameters\":{\"type\":\"object\",\"properties\":{\"id\":{\"type\":\"number\",\"description\":\"Article numeric ID\"}},\"required\":[\"id\"]}}]},\"model\":\"gemini-1.5-pro-latest\",\"contents\":[{\"role\":\"user\",\"parts\":[{\"text\":\"Latest 5 news from Italy\"}]},{\"role\":\"user\",\"parts\":[{\"text\":\"Latest 5 news from Italy\"}]}],\"systemInstruction\":{\"parts\":[{\"text\":\"You are a News Assistant.\"}]},\"toolConfig\":{\"functionCallingConfig\":{\"mode\":\"AUTO\"}},\"generationConfig\":{\"temperature\":0.0}}"
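The <- lines above are Net::HTTP's wire-level debug format: once a stream is set with Net::HTTP#set_debug_output, each chunk written to the socket is logged as <- "..." (here, the request line plus headers, then the JSON body) and response lines come back as -> "...". Note that the query string is logged too, which is why the ?key= value had to be redacted. The self-contained sketch below assumes a throwaway local TCPServer that always answers 400 in place of the Gemini endpoint; capture_debug_post is a hypothetical name.

```ruby
require "net/http"
require "socket"
require "stringio"
require "json"

# Sketch: POST a JSON payload with Net::HTTP debugging enabled and return the
# response together with everything Net::HTTP logged. Assumption: a local
# TCPServer that always answers 400 stands in for the real endpoint.
def capture_debug_post(path, payload)
  server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
  port = server.addr[1]

  responder = Thread.new do
    client = server.accept
    # Read the request headers up to the blank line, then the body by Content-Length.
    headers = +""
    while (line = client.gets) && line != "\r\n"
      headers << line
    end
    client.read(headers[/^Content-Length:\s*(\d+)/i, 1].to_i)
    client.write("HTTP/1.1 400 Bad Request\r\n" \
                 "Content-Type: application/json\r\n" \
                 "Content-Length: 2\r\n" \
                 "Connection: close\r\n\r\n{}")
    client.close
  end

  log = StringIO.new
  http = Net::HTTP.new("127.0.0.1", port, nil) # nil proxy: connect directly
  http.set_debug_output(log) # set before the connection starts, or Net::HTTP warns
  response = http.post(path, JSON.generate(payload), { "Content-Type" => "application/json" })
  responder.join
  [response, log.string]
ensure
  server&.close
end
```

The Ruby documentation warns that debug output is a security hole (it prints credentials and payloads verbatim), so this belongs in a console session, not in production code.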
palladius commented 2 months ago

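To help troubleshooting, I parsed the request body:

# post1 and post2 are presumably the two "<-" strings from the debug output above:
# the request line with headers, and the JSON body.
post_str = post1 + post2
h = JSON.parse(post2)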


       "News Retriever: Search through millions of articles from over 150,000 large and small news sources and blogs.",
             "Keywords or phrases to search for in the article title and body. Surround phrases with quotes (\") for exact match. Alternatively you can use the AND / OR / NOT keywords, and optionally group these with parenthesis. Must be URL-encoded."},
            "description"=>"The fields to restrict your q search to.",
            "enum"=>["title", "description", "content"]},
             "A comma-seperated string of identifiers (maximum 20) for the news sources or blogs you want headlines from. Use the /sources endpoint to locate these programmatically or look at the sources index."},
             "A comma-seperated string of domains (eg,, to restrict the search to."},
             "A comma-seperated string of domains (eg,, to remove from the results."},
             "A date and optional time for the oldest article allowed. This should be in ISO 8601 format."},
             "A date and optional time for the newest article allowed. This should be in ISO 8601 format."},
            "description"=>"The 2-letter ISO-639-1 code of the language you want to get headlines for.",
            "enum"=>["ar", "de", "en", "es", "fr", "he", "it", "nl", "no", "pt", "ru", "sv", "ud", "zh"]},
            "description"=>"The order to sort the articles in.",
            "enum"=>["relevancy", "popularity", "publishedAt"]},
             "The number of results to return per page (request). 5 is the default, 100 is the maximum."},
             "Use this to page through the results if the total results found is greater than the page size."}}}},
       "News Retriever: Provides live top and breaking headlines for a country, specific category in a country, single source, or multiple sources. You can also search with keywords. Articles are sorted by the earliest date published first.",
            "description"=>"The 2-letter ISO 3166-1 code of the country you want to get headlines for.",
            "description"=>"The category you want to get headlines for.",
            "enum"=>["business", "entertainment", "general", "health", "science", "sports", "technology"]},
          "q"=>{"type"=>"string", "description"=>"Keywords or a phrase to search for."},
             "The number of results to return per page (request). 5 is the default, 100 is the maximum."},
             "Use this to page through the results if the total results found is greater than the page size."}}}},
       "News Retriever: This endpoint returns the subset of news publishers that top headlines (/v2/top-headlines) are available from. It's mainly a convenience endpoint that you can use to keep track of the publishers available on the API, and you can pipe it straight through to your users.",
             "The 2-letter ISO 3166-1 code of the country you want to get headlines for. Default: all countries.",
            "description"=>"The category you want to get headlines for. Default: all categories.",
            "enum"=>["business", "entertainment", "general", "health", "science", "sports", "technology"]},
            "description"=>"The 2-letter ISO-639-1 code of the language you want to get headlines for.",
            "enum"=>["ar", "de", "en", "es", "fr", "he", "it", "nl", "no", "pt", "ru", "sv", "ud", "zh"]}}}},
      "description"=>"Article Database: Create a new article",
         {"title"=>{"type"=>"string", "description"=>"Article title"},
          "summary"=>{"type"=>"string", "description"=>"Article summary (in UTF-8)"},
          "content"=>{"type"=>"string", "description"=>"Article content (in UTF-8)"},
          "author"=>{"type"=>"string", "description"=>"Article author"},
          "link"=>{"type"=>"string", "description"=>"Article link"},
          "publishedDate"=>{"type"=>"string", "description"=>"Article published date"},
          "language"=>{"type"=>"string", "description"=>"Article language (2 letters)"},
             "Country the article refers to (whatever makes more sense to you: the newspaper location, the country where they speak the language, or the country where the facts happen. If unsure, say Vatican City)"},
             "Country flag emoji - emoji of the country you chose in 'country' field. If unsure, emoji of vatican city."}},
        "required"=>["title", "summary", "content", "author", "link", "published_date", "language"]}},
      "description"=>"Article Database: Delete an article by id",
        "properties"=>{"id"=>{"type"=>"number", "description"=>"Article numeric ID"}},
      "description"=>"Carlessian URL: Provide an article perma-URL for gemini-news-crawler app by id",
        "properties"=>{"id"=>{"type"=>"number", "description"=>"Article numeric ID"}},
  [{"role"=>"user", "parts"=>[{"text"=>"Latest 5 news from Italy"}]},
   {"role"=>"user", "parts"=>[{"text"=>"Latest 5 news from Italy"}]}],
 "systemInstruction"=>{"parts"=>[{"text"=>"You are a News Assistant."}]},
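Two things in the dumped body are easy to check mechanically: the same user turn appears twice in a row in contents, and article_tool__create lists published_date under required while its properties declare publishedDate. The thread does not establish whether either is what Gemini rejected (the eventual fix was pinning langchainrb), but a small hypothetical helper like the sketch below makes such patterns visible at a glance. It does not implement Gemini's real validation rules.

```ruby
require "json"

# Hypothetical troubleshooting helper: flags two patterns that are visible in the
# dumped request body. It does not implement Gemini's real validation rules.
def payload_warnings(payload)
  warnings = []

  # 1. Adjacent turns with the same role and identical parts (one message sent twice).
  Array(payload["contents"]).each_cons(2).with_index do |(a, b), i|
    next unless a["role"] == b["role"] && a["parts"] == b["parts"]
    warnings << "contents[#{i}] and contents[#{i + 1}] are identical '#{a['role']}' turns"
  end

  # 2. Names listed in a function's "required" that are missing from its "properties".
  #    In the dump, "tools" is a single object; an array of tool objects is accepted too.
  tools = payload["tools"]
  tool_list = tools.is_a?(Array) ? tools : [tools].compact
  tool_list.flat_map { |t| Array(t["functionDeclarations"]) }.each do |fn|
    params = fn["parameters"] || {}
    declared = (params["properties"] || {}).keys
    (Array(params["required"]) - declared).each do |name|
      warnings << "#{fn['name']}: required '#{name}' is not among its properties"
    end
  end

  warnings
end
```

In the console session above this would be payload_warnings(JSON.parse(post2)); on the dumped body it should report exactly the repeated "Latest 5 news from Italy" turn and the published_date/publishedDate mismatch.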
palladius commented 2 months ago
  1. Created a script to check for the Gemini error.
  2. Used git bisect to find the bad push.
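The check script itself is not shown in the thread. What git bisect run needs from any such script is an exit status: 0 means the commit is good, 125 means "cannot test, skip it", and any other status from 1 to 127 means bad. Below is a hedged sketch of that mapping; the method and constant names are hypothetical, and treating a Net::HTTPBadRequest in the error message as "bad" is an assumption based on the trace above.

```ruby
# Sketch of a bisect check (hypothetical; the author's actual script is not shown).
# `git bisect run <cmd>` treats exit status 0 as good, 125 as "skip this commit",
# and any other status from 1 to 127 as bad.
BISECT_GOOD = 0
BISECT_BAD  = 1
BISECT_SKIP = 125

# Runs the smoke test given as a block and maps its outcome to a bisect exit status.
def bisect_status
  yield
  BISECT_GOOD
rescue StandardError => e
  # In this thread the failure surfaced as a StandardError whose message was the
  # inspect string of a Net::HTTPBadRequest, so that is what counts as "bad".
  # Any other StandardError from the smoke test means "can't tell": skip the commit.
  e.message.include?("Net::HTTPBadRequest") ? BISECT_BAD : BISECT_SKIP
end
```

A script ending in something like exit bisect_status { @assistant.say('Latest 5 news from Italy') } (after building @assistant as in the console session) could then be driven by git bisect run bin/rails runner path/to/check.rb, where the script path is illustrative.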

=> The first time, I identified this bad push: [c5984a691cff707cca7c4ec713627a43e71c8c87] 0.3.64 removed CarlkessianChat

But then I found a bug in my shell bisect script, and identified another one:

commit 0e007e42226bede7c0143f04cad96d1a45c801b5
Author: Riccardo Carlesso
Date:   Thu Aug 22 08:35:34 2024 +0200

    0.3.64 update ActiveX from to

This looked like an easy fix, so I reverted Rails (7132 > 7134 -> 7.2.0) back to 7132, but that didn't seem to fix it.

palladius commented 2 months ago

Oh wow. I ran rails c and the error was there, but when I tried with RAILS_ENV=production it failed, asking for the tiktoken_ruby gem. I rgrepped locally and found that Gemfile.lock.good was the ONLY file that mentioned it. That file was created as part of the small migration from commit …

So I looked, and it is a requirement of Andrei's gem, langchainrb 0.13.1. That makes sense: I did a lot of monkeypatching of Andrei's code, so I need EXACTLY the version from Verona (May 2024, 0.13.1).

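But wait, didn't I already pin it to the May 2024 version with

gem 'langchainrb', '~> 0.13.1'

? No! It turns out that '~>' (the pessimistic operator) means "at least 0.13.1, but below 0.14", so Bundler was free to pick a newer patch release. So I pinned it to exactly 0.13.1 (gem 'langchainrb', '0.13.1') and now Gemini works again!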
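Bundler evaluates these constraints with RubyGems' own requirement rules, so the difference between the two Gemfile lines can be checked directly with Gem::Requirement and Gem::Version. A small sketch comparing them:

```ruby
require "rubygems"

# '~> 0.13.1' is the pessimistic constraint: >= 0.13.1 and < 0.14.
PESSIMISTIC = Gem::Requirement.new("~> 0.13.1")
# A bare version is an exact pin, equivalent to '= 0.13.1'.
EXACT = Gem::Requirement.new("0.13.1")

# True when the given version string satisfies the requirement.
def allows?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

%w[0.13.0 0.13.1 0.13.5 0.14.0].each do |v|
  puts format("%-7s  ~> 0.13.1: %-5s  = 0.13.1: %s", v, allows?(PESSIMISTIC, v), allows?(EXACT, v))
end
```

0.13.5 satisfies the pessimistic constraint but not the exact pin, which is how the lockfile could drift to 0.13.5 without any Gemfile change.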

palladius commented 2 months ago

And indeed, Gemfile.lock shows:

-    langchainrb (0.13.5)
+    langchainrb (0.13.1)

Fixed in the latest version, 0.3.69. Wow!
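Because the monkeypatches depend on one exact langchainrb release, a boot-time guard would have surfaced the silent upgrade immediately instead of as an opaque 400. A hedged sketch: the method name is hypothetical, and calling it at the top of config/initializers/riccardo15_monkeypatch_langchain_assistant.rb is only a suggestion. Gem.loaded_specs is RubyGems' registry of activated gems, keyed by name.

```ruby
require "rubygems"

# Hypothetical boot-time guard: raise unless `name` was activated at exactly
# `expected`. Gem.loaded_specs maps gem names to activated Gem::Specification
# objects; it is a parameter only so the check can be exercised without Bundler.
def ensure_exact_gem_version!(name, expected, loaded_specs = Gem.loaded_specs)
  spec = loaded_specs[name]
  raise "#{name} is not loaded" if spec.nil?

  actual = spec.version
  wanted = Gem::Version.new(expected)
  return actual if actual == wanted

  raise "#{name} #{actual} is loaded, but the monkeypatches expect exactly #{wanted}"
end
```

In the app this would be a single line, ensure_exact_gem_version!("langchainrb", "0.13.1"), placed before the monkeypatches are applied.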

andreibondarev commented 2 months ago

@palladius If you apply this change to the gem, does it fix your tool?