langchain-ai / langchain-google

MIT License
116 stars 145 forks source link

Input/output token count returned by Langchain-Google seem excessively high #491

Closed boriswang01 closed 1 month ago

boriswang01 commented 2 months ago

Below is a output message returned by Gemini via Langchain:

""" 'Here\'s a reformatted version of the previous response, focusing on clarity and readability:\n\n## AI Model Performance & Financial Data\n\nLet\'s break down your questions one by one:\n\n1. MATH Score of Llama 400B\n\nLooking at the image, the MATH score for Llama-400b (early snapshot) is 57.8% using a 4-shot Chain of Thought (CoT) approach.\n\n2. Table 14.2: Betas for Financial Service Businesses\n\nWhile the exact table you requested wasn\'t found, the "finfirm09.pdf" document contains relevant data on betas for financial service businesses. This information is crucial for understanding the relationship between a company\'s stock price and the overall market.\n\nHere\'s the table from the document:\n\n| Category | US | Europe | Emerging Markets |\n|---------------------------|------|--------|-------------------|\n| Large Money Center Banks | 0.71 | 0.80 | 0.9 |\n| Small/Regional Banks | 0.91 | 0.98 | 1.05 |\n| Thrifts | 0.66 | 0.75 | 0.85 |\n| Brokerage Houses | 1.37 | 1.25 | 1.5 |\n| Investment Banks | 1.50 | 1.55 | 1.9 |\n| Life Insurance | 1.17 | 1.20 | 1.1 |\n| Property and Casualty Insurance Companies | 0.91 | 0.95 | 0.9 |\n\n3. Table 3: Default Spreads by Sovereign Ratings Class – September 2008\n\nThe "riskfreerate.pdf" document contains the requested Table 3, which details default spreads based on sovereign ratings. This information is essential for assessing credit risk and determining appropriate interest rates.\n\nHere\'s the table:\n\n| Sovereign Rating | Bonds/ CDS | Corporate Bonds |\n|------------------|------------|-----------------|\n| Aaa | 0.15% | 0.50% |\n| Aa1 | 0.30% | 0.80% |\n| Aa2 | 0.60% | 1.10% |\n| Aa3 | 0.80% | 1.20% |\n| A1 | 1.00% | 1.35% |\n| A2 | 1.30% | 1.45% |\n| A3 | 1.40% | 1.50% |\n| Baa1 | 1.70% | 1.70% |\n| Baa2 | 2.00% | 2.00% |\n| Baa3 | 2.25% | 2.60% |\n| Ba1 | 2.50% | 3.20% |\n| Ba2 | 3.00% | 3.50% |\n| Ba3 | 3.25% | 4.00% |\n| B1 | 3.50% | 4.50% |\n| B2 | 4.25% | 5.50% |\n| B3 | 5.00% | 6.50% |\n| Caa1 | 6.00% | 7.00% |\n| Caa2 | 6.75% | 9.00% |\n| Caa3 | 7.50% | 11.00% |\n\nThis table highlights the relationship between credit ratings and default spreads as of September 2008. As you can see, higher credit ratings generally correlate with lower default spreads, reflecting lower perceived risk. \n' response_metadata={'finish_reason': 'STOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOP', 'safety_ratings': [{'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}]} id='run-f9bce28d-e44b-4445-af09-605412426ff2' usage_metadata={'input_tokens': 181773, 'output_tokens': 7814, 'total_tokens': 189587} """

At the end, usage_metadata returned a output-token of 7814, this seems way too high for a relatively small output character amount (3004 character, 310 words), and likewise with the input tokens. It seems that the input and output token counter are all inflated by a factor of 10x, is this a bug?

Also do the safety_ratings output count towards input/output token count and cost?

boriswang01 commented 2 months ago

@lkuligin Hey is it okay to get this bug investigated soon? We want to start using Gemini ASAP but this problem is preventing us from doing so because it's messing up our token counters.

efriis commented 2 months ago

could you add some code that reproduces this? Things that would be helpful:

boriswang01 commented 1 month ago

could you add some code that reproduces this? Things that would be helpful:

  • which package are you on (langchain-google-genai or langchain-google-vertexai)
  • which version of that package are you on

Hey, just tested this issue again for Gemini-1.5-Pro-002.

Problem update:

It would be hard for me to share the prompt that reproduce this due to confidential material, but please check on your end that the input token count is correct.

The newest Gemini production model is really good at language translations, looking forward to integrate ASAP.

efriis commented 1 month ago

hey you don't have to output your exact prompt, but e.g. this repro on latest seems to be working. would be helpful for figuring out which features you might be using that has a bug!

without a reproducible example, I'll close it as "unable to reproduce" next week

note that tools add to input tokens as well, if that might be the source of confusion? https://ai.google.dev/gemini-api/docs/tokens?lang=python#system-instructions-and-tools

ScreenShot 2024-09-27 at 11 13 16AM ScreenShot 2024-09-27 at 11 17 26AM

boriswang01 commented 1 month ago

hey you don't have to output your exact prompt, but e.g. this repro on latest seems to be working. would be helpful for figuring out which features you might be using that has a bug!

without a reproducible example, I'll close it as "unable to reproduce" next week

note that tools add to input tokens as well, if that might be the source of confusion? https://ai.google.dev/gemini-api/docs/tokens?lang=python#system-instructions-and-tools

ScreenShot 2024-09-27 at 11 13 16AM ScreenShot 2024-09-27 at 11 17 26AM

Hey I will DM you a sample input/output (due to company proprietary content). Please feel free to close this ticket if needed.

lkuligin commented 1 month ago

+1, we need a reproducible example (ideally, try to reproduce an issue with non-confidential prompt). Please, check that you're not submitting any multimodal input with your prompt since it counts towards input tokens too.

efriis commented 1 month ago

wrote a standard test for this in https://github.com/langchain-ai/langchain/pull/27177