ResourceExhausted: 429 received metadata size exceeds soft limit

pweglik commented 3 months ago

Environment details

OS: Dockerfile base image: python:3.11
Python version: 3.11
pip version: 24.0
google-auth version: 2.29.0

Description

We have created a simple Flask server and deployed it to GCP as a Cloud Run service. We are also using a few other dependencies:

google-api-core==2.19.0
google-cloud-aiplatform==1.49.0 # we needed to pin this version because of a deprecation warning somewhere in Vertex AI; might be worth checking in the future
google-cloud-logging==3.10.0
langchain==0.2.1
langchain_community==0.2.1
langchain-google-vertexai==1.0.4

Snippet of the code:

```python
import google.cloud.logging
from langchain_google_vertexai import VertexAI

# set up logging
client = google.cloud.logging.Client()
client.setup_logging()

# in the endpoint
llm_model = VertexAI(
    model_name="text-bison",
    max_output_tokens=256,
    temperature=1,
    top_p=0.8,
    top_k=40,
    verbose=True,
)
```

We don't do anything more sophisticated than that. After deployment, it ran fine for a few hours and then we started to receive warnings:

Retrying langchain_google_vertexai.llms._completion_with_retry.<locals>._completion_with_retry_inner in 4.0 seconds as it raised ResourceExhausted: 429 received metadata size exceeds soft limit (16711 vs. 16384);  :path:90B :authority:79B :method:43B :scheme:44B content-type:60B te:42B grpc-accept-encoding:75B user-agent:100B grpc-trace-bin:103B pc-low-fwd-bin:77B x-goog-request-params:148B x-goog-api-client:12052B x-goog-api-client:62B authorization:1076B x-google-gfe-frontline-info:836B x-google-gfe-timestamp-trace:76B x-google-gfe-verified-user-ip:76B x-gfe-signed-request-headers:472B x-google-gfe-location-info:74B x-gfe-ssl:44B x-google-gfe-tls-base64urlclienthelloprotobuf:299B x-user-ip:56B x-google-service:105B x-google-gfe-service-trace:115B x-google-gfe-backend-timeout-ms:71B accept-encoding:56B x-google-peer-delegation-chain-bin:92B x-google-request-uid:138B x-google-dappertraceinfo:111B.

You can see that there are two fields named x-goog-api-client, and one of them is growing out of proportion. Later on it grows even bigger, and we started to receive the error on almost every request. The server also started to time out, as it was unable to serve those requests.

Retrying langchain_google_vertexai.llms._completion_with_retry.<locals>._completion_with_retry_inner in 4.0 seconds as it raised ResourceExhausted: 429 received metadata size exceeds soft limit (27114 vs. 16384);  :path:90B :authority:79B :method:43B :scheme:44B content-type:60B te:42B grpc-accept-encoding:75B user-agent:100B grpc-trace-bin:103B pc-low-fwd-bin:77B x-goog-request-params:148B x-goog-api-client:22452B x-goog-api-client:62B authorization:1076B x-google-gfe-frontline-info:837B x-google-gfe-timestamp-trace:76B x-google-gfe-verified-user-ip:76B x-gfe-signed-request-headers:472B x-google-gfe-location-info:74B x-gfe-ssl:44B x-google-gfe-tls-base64urlclienthelloprotobuf:299B x-user-ip:56B x-google-service:105B x-google-gfe-service-trace:115B x-google-gfe-backend-timeout-ms:71B accept-encoding:56B x-google-peer-delegation-chain-bin:92B x-google-request-uid:140B x-google-dappertraceinfo:111B.

It looks like something is appended to this field and it overflows after some time. I found a place in the library's code that could cause it: https://github.com/googleapis/google-auth-library-python/blob/main/google/auth/metrics.py#L138-L154
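
One general way a header can grow like this is when code appends a token on every request to a header value stored in a container that outlives the request. The sketch below is a hypothetical illustration of that failure mode only. It is not google-auth's code, and the thread does not confirm this is the cause. The functions `add_token_buggy` and `add_token_idempotent` are invented, and the token values are illustrative.

```python
# Hypothetical illustration only: neither function is google-auth's code.
HEADER = "x-goog-api-client"


def add_token_buggy(headers: dict[str, str], token: str) -> None:
    """Append `token` every time, even if the header already contains it."""
    existing = headers.get(HEADER)
    headers[HEADER] = f"{existing} {token}" if existing else token


def add_token_idempotent(headers: dict[str, str], token: str) -> None:
    """Append `token` only if it is not already one of the header's tokens."""
    tokens = headers.get(HEADER, "").split()
    if token not in tokens:
        tokens.append(token)
    headers[HEADER] = " ".join(tokens)


# Simulate a long-lived headers dict that is reused across many requests.
shared_buggy: dict[str, str] = {HEADER: "gl-python/3.11 gax/2.19.0"}
shared_safe: dict[str, str] = dict(shared_buggy)
for _ in range(1000):
    add_token_buggy(shared_buggy, "cred-type/sa")
    add_token_idempotent(shared_safe, "cred-type/sa")

print(len(shared_buggy[HEADER]), len(shared_safe[HEADER]))
```

The buggy variant grows linearly with the number of requests. This matches the "fine for a few hours, then fails" pattern described above. The idempotent variant stays constant.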

I'm looking for some guidance on what could cause such a warning and the overflow in requests.

Steps to reproduce

I'm not really sure; the error only occurred after a few hours (after serving a few thousand requests).

Let me know if I can help you somehow or provide any additional info!

arithmetic1728 commented 3 months ago

@pweglik Could you provide some examples of the large x-goog-api-client header value to help us identify the issue?

arithmetic1728 commented 3 months ago

@pweglik I created a fix in the metric branch; I'm not sure if it works. Could you give it a try? You can install it via pip install git+https://github.com/googleapis/google-auth-library-python.git@metric

JehangirLBG commented 3 months ago

@arithmetic1728 Hi, I have the same issue. I've tried your fix and it doesn't seem to have any effect.

In my case the issue is easily reproducible after calling the LLM 1,000 times in succession. Since I'm evaluating LLM outputs on a few different metrics, I need to call the LLM on a large number of outputs and rerun that a couple of times.

After around 1,000 calls I hit the same metadata issue as above, with the x-goog-api-client header continuing to grow with each request. Once the soft limit had been reached, I received the error after every call.

arithmetic1728 commented 3 months ago

@JehangirLBG any chance to give some examples of the x-goog-api-client header?

arithmetic1728 commented 3 months ago

@JehangirLBG can you also provide an easy code sample to repro?

arithmetic1728 commented 3 months ago

@JehangirLBG I updated that branch with a new potential fix. Could you try it again? Thanks!

JehangirLBG commented 3 months ago

@arithmetic1728 Hi, I gave it a go and I'm still having the issue, as you can see in the error message below:

Retrying langchain_google_vertexai.chat_models._completion_with_retry.<locals>._completion_with_retry_inner in 4.0 seconds as it raised ResourceExhausted: 429 received metadata size exceeds soft limit (34425 vs. 16384); :path:103B :authority:79B :method:43B :scheme:44B content-type:60B te:42B grpc-accept-encoding:75B user-agent:100B grpc-trace-bin:103B pc-low-fwd-bin:77B x-goog-request-params:150B x-goog-api-client:29279B x-goog-api-client:62B authorization:1076B x-goog-user-project:65B x-google-gfe-frontline-info:914B x-google-gfe-cloud-client-vnid:69B x-google-gfe-cloud-client-network-project-number:92B x-google-gfe-timestamp-trace:76B x-google-gfe-verified-user-ip:97B x-gfe-signed-request-headers:620B x-google-gfe-location-info:74B x-gfe-ssl:44B x-google-gfe-tls-base64urlclienthelloprotobuf:299B x-user-ip:52B x-google-service:105B x-google-gfe-service-trace:115B x-google-gfe-backend-timeout-ms:71B accept-encoding:56B x-google-peer-delegation-chain-bin:92B x-google-request-uid:141B x-google-dappertraceinfo:111B.

I have seen the metadata go up to 50000 with only x-goog-api-client increasing in size.

arithmetic1728 commented 3 months ago

@JehangirLBG I see. The error message only tells the size (e.g. x-goog-api-client:29279B), what I need is the x-goog-api-client header string content (i.e. the 29279 bytes). Any x-goog-api-client values are fine.

The header can be added by multiple libraries besides this one, and without knowing its content it's not possible to figure out the cause. Is there any way to reveal the header string in the error message?

I updated the "fix" to not add the x-goog-api-client header from this library. (This is not what we want to ship; it's just for troubleshooting.) Could you try it again?

Also could you provide a simple repro sample?

JehangirLBG commented 3 months ago

@arithmetic1728 Tried to run again, still getting the issue.

I understand now what you mean about needing the content of the actual header. I believe I can just call '.response_metadata' on the returned LLM object. However, I'm still new to Python, so I'm unsure how to break out of the code once the metadata has exceeded a certain size and then run this on the last object returned.

If you have any guidance on that, I can do it now, or I can check with my colleague tomorrow and get back to you. Thanks for your help :)

I can try to get something together to help reproduce this on your end.
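
The question above about stopping once the header gets too large can be handled with a loop that records each new header value and breaks past a size threshold. This is a minimal sketch under stated assumptions. `call_llm` and `extract_header` are hypothetical placeholders you would supply. The thread does not confirm that `.response_metadata` actually exposes the outgoing x-goog-api-client header, so `extract_header` depends on your setup.

```python
from typing import Any, Callable


def collect_until_oversized(
    call_llm: Callable[[], Any],          # hypothetical: one LLM call
    extract_header: Callable[[Any], str],  # hypothetical: result -> header value
    threshold: int,
    max_calls: int = 5000,
) -> list[str]:
    """Call `call_llm` repeatedly and record each header value that differs
    from the previous one. Stop once a value exceeds `threshold` bytes or
    after `max_calls` calls."""
    snapshots: list[str] = []
    for _ in range(max_calls):
        value = extract_header(call_llm())
        if not snapshots or value != snapshots[-1]:
            snapshots.append(value)
        if len(value.encode("utf-8")) > threshold:
            break  # the last few snapshots show which part keeps repeating
    return snapshots
```

Printing the last two or three entries of the returned list gives exactly the growing header values requested in this thread.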

arithmetic1728 commented 3 months ago

@JehangirLBG You don't need to run this on the last object, just run it for every object, and provide a few x-goog-api-client header values that grow in size. From there I can compare and see which part is repeated, and figure out what library adds it. Thank you for your help!
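
The comparison described above can be sketched as follows. The sketch assumes the header value is a space-separated list of tokens such as `name/version`. Verify that format against the real values before relying on the sketch. `repeated_tokens` and `added_tokens` are hypothetical helpers, and the sample tokens are illustrative.

```python
from collections import Counter


def repeated_tokens(header_value: str) -> dict[str, int]:
    """Tokens that occur more than once in a single header value, with counts.
    Assumes tokens are separated by whitespace."""
    counts = Counter(header_value.split())
    return {tok: n for tok, n in counts.items() if n > 1}


def added_tokens(older: str, newer: str) -> Counter[str]:
    """Tokens present in `newer` beyond what `older` already had
    (multiset difference, so repeated copies are counted)."""
    return Counter(newer.split()) - Counter(older.split())


older = "gl-python/3.11 gax/2.19.0 cred-type/sa"
newer = older + " cred-type/sa cred-type/sa"
print(repeated_tokens(newer))
print(added_tokens(older, newer))
```

The prefix of whichever token dominates the difference between two snapshots suggests which library is appending it.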

arithmetic1728 commented 2 months ago

Closing the issue due to inactivity. Please reopen if the issue persists or if any new information can be provided, thanks!
