googleapis / python-aiplatform

A Python SDK for Vertex AI, a fully managed, end-to-end platform for data science and machine learning.
Apache License 2.0

Some requests to Gemini models get stuck #4509

Open romantsovmike opened 6 days ago

romantsovmike commented 6 days ago

Environment details

Code example

Hi, we are using Gemini models (pro, flash) on the Vertex AI platform, and some of our async requests hang forever. We call the model with the following code:

import vertexai
from vertexai.preview.generative_models import GenerativeModel

vertexai.init(project=..., location=...)
vision_model = GenerativeModel(model_name)

...

result = await vision_model.generate_content_async(
    contents=content_to_model,
    safety_settings=safety_config,
    generation_config=generation_config,
    stream=False,
)

This call does not accept a request timeout, so we added our own with asyncio.wait_for:

result = await asyncio.wait_for(
    vision_model.generate_content_async(
        contents=content_to_model,
        safety_settings=safety_config,
        generation_config=generation_config,
        stream=False,
    ),
    timeout=65
)

This timeout catches some of the long-running calls, and we can retry them, but other calls still hang forever and the timeout never fires.
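The timeout-and-retry approach described above can be sketched roughly as follows. This is only an illustration: `call_with_timeout_and_retry`, `make_call`, `max_attempts`, and `backoff` are hypothetical names, not part of the SDK. `make_call` is assumed to be a zero-argument callable that returns a fresh coroutine on each attempt, for example a lambda wrapping `vision_model.generate_content_async(...)`.

```python
import asyncio


async def call_with_timeout_and_retry(make_call, timeout=65.0, max_attempts=3, backoff=1.0):
    """Await make_call() with a per-attempt timeout, retrying on timeout.

    Hypothetical helper: make_call must return a new coroutine each time,
    because a coroutine object cannot be awaited twice.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except asyncio.TimeoutError:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the timeout.
                raise
            # Short pause before the next attempt.
            await asyncio.sleep(backoff)
```

Note that this only helps with calls that hang while awaiting. If something blocks the event loop thread itself, the timeout inside `asyncio.wait_for` cannot fire, which matches the calls that stay stuck.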

It looks like something inside the library's async method makes a blocking (synchronous) call that stalls the event loop, so the timeout never gets a chance to run. The effect is similar to the following code:


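import asyncio
import time

async def sleep_sync(timeout):
    # time.sleep() blocks the whole event loop thread
    time.sleep(timeout)
    return timeout

async def sleep_async(timeout):
    # asyncio.sleep() yields control back to the event loop
    await asyncio.sleep(timeout)
    return timeout

# Run the calls below inside a coroutine (or a notebook / asyncio REPL)

# Not blocked: asyncio.TimeoutError is raised after 4 seconds
await asyncio.wait_for(
    sleep_async(10),
    timeout=4
)

# Blocked: time.sleep() holds the event loop, so the timeout cannot
# interrupt the call before the full 10 seconds have elapsed
await asyncio.wait_for(
    sleep_sync(10),
    timeout=4
)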
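For the toy example above, moving the blocking work into a worker thread lets the timeout fire on schedule. The sketch below assumes Python 3.9+ for `asyncio.to_thread`; `blocking_work` and `run_blocking_with_timeout` are hypothetical names. Whether the same trick helps with the SDK depends on where the blocking actually happens, which has not been established here.

```python
import asyncio
import time


def blocking_work(seconds):
    # Stand-in for any synchronous call that does not yield to the event loop.
    time.sleep(seconds)
    return seconds


async def run_blocking_with_timeout(seconds, timeout):
    """Run blocking_work in a worker thread and report what happened."""
    start = time.monotonic()
    try:
        # The event loop stays free, so wait_for can enforce the timeout.
        await asyncio.wait_for(asyncio.to_thread(blocking_work, seconds), timeout=timeout)
        outcome = "completed"
    except asyncio.TimeoutError:
        outcome = "timed out"
    return outcome, time.monotonic() - start
```

The worker thread is not killed when the timeout fires. It keeps running until `blocking_work` returns, so this frees the caller but does not release the underlying resource.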

Stack trace

No stack trace is available because the call hangs and never raises.
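One possible way to get a stack trace from a hung process is the standard library's `faulthandler` module, which can dump the tracebacks of all threads from a watchdog thread after a delay, even when the event loop is blocked. A minimal sketch, where `arm_hang_watchdog` and `disarm_hang_watchdog` are hypothetical helper names:

```python
import faulthandler
import sys


def arm_hang_watchdog(seconds, file=sys.stderr):
    """Dump the tracebacks of all threads to `file` after `seconds`,
    unless disarm_hang_watchdog() is called first.

    `file` must stay open and provide a real file descriptor.
    """
    faulthandler.dump_traceback_later(seconds, exit=False, file=file)


def disarm_hang_watchdog():
    # Cancel the pending dump once the guarded call has returned.
    faulthandler.cancel_dump_traceback_later()
```

Arming the watchdog just before the model call and disarming it right after would print where each thread is waiting if the call exceeds the delay.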
romantsovmike commented 3 days ago

By the way, it would be great if users could pass a timeout argument to the generate_content_async method.