
[Bug]: MultiModalVectorStoreIndex image_to_image_retrieve on Google Cloud Platform #14887

Closed: Michael-Santoro closed this issue 1 month ago

Michael-Santoro commented 1 month ago

Bug Description

I am passing an image file path to image_to_image_retrieve, with VertexMultiModalEmbedding initialized and run as follows:

# configure embedding model (PROJECT_ID, REGION, and vector_store are
# assumed to be defined earlier)
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.embeddings.vertex import VertexMultiModalEmbedding

embed_model = VertexMultiModalEmbedding(
    model_name="multimodalembedding@001",
    project=PROJECT_ID,
    location=REGION,
)

index = MultiModalVectorStoreIndex.from_vector_store(
    vector_store=vector_store,
    embed_model=embed_model,
    image_embed_model=embed_model,
)

retriever_engine = index.as_retriever(
    similarity_top_k=3, image_similarity_top_k=3
)

retriever_engine.image_to_image_retrieve('test.png')

There is an issue with how the image file is parsed, as shown in the traceback excerpt below.

/usr/local/lib/python3.10/dist-packages/llama_index/embeddings/vertex/base.py in _get_image_embedding(self, img_file_path)
    210         else:
    211             image = Image(image_bytes=img_file_path.getvalue())
--> 212         embeddings = self._model.get_embeddings(
    213             image=image, dimension=self.embed_dimension, **self.additional_kwargs
    214         )

/usr/local/lib/python3.10/dist-packages/vertexai/vision_models/_vision_models.py in get_embeddings(self, image, video, contextual_text, dimension, video_segment_config)
   1199             parameters["dimension"] = dimension
   1200 
-> 1201         response = self._endpoint.predict(
   1202             instances=[instance],
   1203             parameters=parameters,

I am getting the following error:

InvalidArgument: 400 Invalid input image string with the following error message: Provided image is not valid.

For image parsing, Google recommends base64 encoding.
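
For context, here is a minimal sketch that calls the Vertex SDK directly, independent of LlamaIndex, to check whether the image itself embeds correctly (it assumes vertexai is installed and the project is authenticated; PROJECT_ID and REGION are as above):

import vertexai
from vertexai.vision_models import Image, MultiModalEmbeddingModel

vertexai.init(project=PROJECT_ID, location=REGION)

# Image.load_from_file reads the bytes and handles the base64 encoding
# that the prediction endpoint expects.
model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
image = Image.load_from_file("test.png")
embeddings = model.get_embeddings(image=image)
print(len(embeddings.image_embedding))  # 1408 dimensions by default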

Version

0.10.56

Steps to Reproduce

  1. Create a Vertex AI Vector Store
  2. Populate Vector Store
  3. Initialize Vector Store and Multimodal Retrieval
  4. Run image_to_image_retrieve
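
For steps 1-3, the vector_store used above was constructed along these lines (a hedged sketch: the index and endpoint IDs are placeholders, and the exact constructor arguments may vary by package version):

from llama_index.vector_stores.vertexaivectorsearch import VertexAIVectorStore

# Assumes an existing Vertex AI Vector Search index and deployed endpoint.
vector_store = VertexAIVectorStore(
    project_id=PROJECT_ID,
    region=REGION,
    index_id="<your-index-id>",
    endpoint_id="<your-endpoint-id>",
)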

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 1 month ago

Hey @Michael-Santoro! I'm here to help you with any bugs, questions, or contributions you have for the repository. Let's tackle this issue together while we wait for a human maintainer.

The issue you're encountering with the image_to_image_retrieve method in the MultiModalVectorStoreIndex class when using the VertexMultiModalEmbedding model on Google Cloud Platform is due to the image input not being in the expected base64 encoded format.

To resolve this, you need to convert the image file to a base64 encoded string before passing it to the image_to_image_retrieve method. You can use the encode_image utility function provided in the LlamaIndex codebase to achieve this.

Here's how you can modify your code:

from llama_index.core.multi_modal_llms.generic_utils import encode_image

# Convert image file to base64 encoded string
image_base64 = encode_image('test.png')

# Run image_to_image_retrieve with base64 encoded image string
retriever_engine.image_to_image_retrieve(image_base64)

The encode_image function reads the image file and converts it to a base64 encoded string, which is the expected input format for the VertexMultiModalEmbedding model [1].
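
For reference, such a helper amounts to roughly the following (a minimal sketch, assuming it simply base64-encodes the raw file bytes):

import base64

def encode_image(image_path: str) -> str:
    # Read the raw image bytes and return them as a base64 string.
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")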

This should resolve the InvalidArgument: 400 Invalid input image string error you are encountering.

To continue talking to Dosu, mention @dosu.

Michael-Santoro commented 1 month ago

The proposed solution resulted in an additional problem. Here is the error:

OSError: [Errno 36] File name too long: 'UklGRs4QA....'

This makes sense, since image_to_image_retrieve expects a file path and parses the base64 string as one: the string is eventually passed to open(), and a "filename" longer than the OS limit (typically 255 bytes on Linux) raises Errno 36.

Michael-Santoro commented 1 month ago

A couple of resolution notes: this occurred when I was using a .png. The issue was resolved by reducing the size of the image, although the error message did not make this clear. I used the following code to reduce the image:

from PIL import Image

def resize_image(input_path, output_path, width, height):
    """
    Resizes an image to the specified dimensions and saves it.

    Args:
        input_path (str): Path to the input image.
        output_path (str): Path to save the resized image.
        width (int): Desired width of the resized image.
        height (int): Desired height of the resized image.
    """

    try:
        # Open the image
        img = Image.open(input_path)

        # Resize the image
        resized_img = img.resize((width, height))

        # Save the resized image
        resized_img.save(output_path)

        print(f"Image resized and saved as '{output_path}'")

    except FileNotFoundError:
        print(f"Error: Input image not found at '{input_path}'")
    except Exception as e:
        print(f"Error: An unexpected error occurred: {e}")

The reduction allowed the image to be embedded, but image_to_image_retrieve would still return an empty list. I resolved to use the following in lieu of that function:

from llama_index.core.vector_stores.types import VectorStoreQuery

# Embed the query image directly and query the vector store ourselves.
query = VectorStoreQuery(
    query_embedding=embed_model.get_image_embedding(img_file_path='image.png'),
    similarity_top_k=3,
)
query_result = vector_store.query(query)

This method allowed me to successfully retrieve images from an image query.
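
The query result can then be inspected directly (a sketch, assuming the stored nodes carry a file_path metadata key, which depends on how the store was populated):

# VectorStoreQueryResult exposes .nodes, .similarities, and .ids.
for node, score in zip(query_result.nodes, query_result.similarities):
    print(score, node.metadata.get("file_path"))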