googleapis / python-aiplatform

A Python SDK for Vertex AI, a fully managed, end-to-end platform for data science and machine learning.
Apache License 2.0

Running vertexai model in script raises `Waiting for thread pool to idle before forking` #2832

Closed jasonw247 closed 6 months ago

jasonw247 commented 11 months ago

Environment details

Steps to reproduce

Run the example code below as a Python script. The same script runs successfully when it does not include Vertex AI models, so the issue appears to be specific to getting predictions from vertexai.

Code example

Using either of these approaches:

from io import BytesIO
from typing import List, Union

import numpy as np
import vertexai
from PIL import Image as PILImage
from vertexai.vision_models import Image, ImageQnAModel

vertexai.init(project=project, location=location)

class VQAModel:
    def __init__(self, model_path: str = "imagetext@001"):
        self.model_path = model_path
        self.model = ImageQnAModel.from_pretrained(self.model_path)

    def _convert_numpy_to_image(self, image_array: np.ndarray) -> Image:
        # Encode the array as JPEG bytes and wrap them in a Vertex AI Image.
        img = PILImage.fromarray(image_array)
        buf = BytesIO()
        img.save(buf, format='JPEG')
        image_bytes = buf.getvalue()
        return Image(image_bytes)

    def ask_question(self, image: Union[np.ndarray, Image], question: str, number_of_results: int=1) -> List[str]:
        if isinstance(image, np.ndarray):
            image = self._convert_numpy_to_image(image)
        answers = self.model.ask_question(
            image=image,
            question=question,
            number_of_results=number_of_results,
        )
        return answers

vqa = VQAModel()
res = vqa.ask_question(image, question, 1)

or

from io import BytesIO
from typing import List, Union

import numpy as np
import vertexai
from PIL import Image as PILImage
from vertexai.vision_models import Image, ImageQnAModel

vertexai.init(project=project, location=location)

def answer_image_question(image: Union[np.ndarray, Image], question: str, number_of_results: int=1, model_path="imagetext@001") -> List[str]:
    def _convert_numpy_to_image(image_array: np.ndarray) -> Image:
        # Encode the array as JPEG bytes and wrap them in a Vertex AI Image.
        img = PILImage.fromarray(image_array)
        buf = BytesIO()
        img.save(buf, format='JPEG')
        image_bytes = buf.getvalue()
        return Image(image_bytes)

    if isinstance(image, np.ndarray):
        image = _convert_numpy_to_image(image)
    # The model is loaded on every call in this variant.
    model = ImageQnAModel.from_pretrained(model_path)
    answers = model.ask_question(
        image=image,
        question=question,
        number_of_results=number_of_results,
    )
    return answers

res = answer_image_question(image, question, 1)

When run as a main script, either approach causes the Python script to hang with the following log message: E1020 14:47:03.033279000 8063131776 thread_pool.cc:254] Waiting for thread pool to idle before forking

Ark-kun commented 10 months ago

This might be a gRPC issue: https://github.com/grpc/grpc/issues/33218. Can you list the package versions in your environment (especially gRPC)?

Also, if you have time, can you please try the older SDK version 1.31.0?
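One way to collect the requested versions is sketched below. It uses only the standard library's `importlib.metadata`. The helper name `installed_versions` and the list of package names are illustrative assumptions, not part of the SDK, so adjust the list to match your environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a dict mapping each distribution name to its version, or None if absent.

    `installed_versions` is a hypothetical helper written for this example.
    """
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            result[name] = None
    return result


if __name__ == "__main__":
    # Assumed list of relevant distributions; extend as needed.
    names = [
        "grpcio",
        "grpcio-status",
        "grpc-google-iam-v1",
        "google-cloud-aiplatform",
    ]
    for name, ver in installed_versions(names).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Running `pip freeze` and filtering for `grpc` gives the same information from the shell.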

jasonw247 commented 10 months ago

gRPC-related packages in my env:

grpc-google-iam-v1==0.12.6
grpcio==1.52.0
grpcio-status==1.48.2

With SDK version 1.31.0, the second approach (i.e. answer_image_question(...)) works, but the first approach (i.e. VQAModel.ask_question(...)) still blocks.

Ark-kun commented 6 months ago

The LVM SDK does not work with the thread pool directly. This looks like a gRPC issue. An issue of this kind has recently been fixed on the gRPC side, so I'm closing this bug. If you experience this issue again, please re-open.

Thank you.