run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Using Local LLM for Image to Image instead of openai #9309

Closed: BakingBrains closed this issue 10 months ago

BakingBrains commented 10 months ago

Question

I was trying out Image to Image Retrieval. Can anyone suggest how I can use a local LLM in place of OpenAI for Image to Image Retrieval?

Regards

dosubot[bot] commented 10 months ago

🤖

Hello @BakingBrains,

Nice to meet you! I'm Dosu, a friendly bot here to help you navigate the LlamaIndex repository, resolve any issues, and answer any questions you might have. While we wait for a human maintainer, feel free to ask me anything about LlamaIndex. I'm here to assist you!

To use a local LLM for Image to Image Retrieval instead of OpenAI, you can use the image_to_image_retrieve method of the MultiModalVectorIndexRetriever class. This method takes a QueryType as input, which can be a string representing the image path or a QueryBundle object.

Here is an example of how to use it:

from llama_index.indices.multi_modal.retriever import MultiModalVectorIndexRetriever

# Instantiate the MultiModalVectorIndexRetriever with an index built using your local models
retriever = MultiModalVectorIndexRetriever(index=your_index)

# Use the image_to_image_retrieve method for image to image retrieval
result = retriever.image_to_image_retrieve("/path/to/your/image.jpg")

# The result is a list of NodeWithScore objects, which contain the retrieved images and their scores
for node_with_score in result:
    print(node_with_score.node, node_with_score.score)

In this code, replace your_index with a MultiModalVectorStoreIndex built with your local embedding model and "/path/to/your/image.jpg" with the path to the image you want to query with. The image_to_image_retrieve method returns a list of NodeWithScore objects, which contain the retrieved images and their scores.
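For reference, here is a minimal sketch of the equivalent call using a QueryBundle instead of a plain path string:

from llama_index.schema import QueryBundle

# Equivalent retrieval call via a QueryBundle; query_str is left empty because
# only the image is used for this retrieval
query_bundle = QueryBundle(query_str="", image_path="/path/to/your/image.jpg")
result = retriever.image_to_image_retrieve(query_bundle)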

Please note that the MultiModalVectorIndexRetriever needs to be initialized with a MultiModalVectorStoreIndex that has been built with your local embedding model. You can initialize a MultiModalVectorStoreIndex with a local embedding model using the following process:

  1. First, import the necessary classes and modules: the MultiModalVectorStoreIndex class and the StorageContext class.
from llama_index.indices.multi_modal.base import MultiModalVectorStoreIndex
from llama_index.storage.storage_context import StorageContext
  2. Then, create an instance of the MultiModalVectorStoreIndex class, passing the necessary parameters to the constructor. Here is an example:
index = MultiModalVectorStoreIndex(
    nodes=None,
    index_struct=None,
    service_context=None,
    storage_context=StorageContext.from_defaults(),
    use_async=False,
    store_nodes_override=False,
    show_progress=False,
    image_vector_store=None,
    image_embed_model="clip",
    is_image_to_text=False
)

In this example, the storage_context parameter is set to the default storage context by calling StorageContext.from_defaults(). The image_embed_model parameter is set to "clip", which is the default value. The other parameters are set to their default values.

Please note that you may need to adjust the parameters according to your specific needs. For example, if you have a specific set of nodes that you want to use, you can pass them to the nodes parameter. Similarly, if you have a specific service context, you can pass it to the service_context parameter.
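For instance, here is a minimal end-to-end sketch using the from_documents convenience constructor (assuming llama_index 0.9.x; the folder paths are placeholders). It uses the default local CLIP model for image embeddings and a local text embedding model, so no OpenAI key should be required:

from llama_index import ServiceContext, SimpleDirectoryReader
from llama_index.indices.multi_modal.base import MultiModalVectorStoreIndex
from llama_index.indices.multi_modal.retriever import MultiModalVectorIndexRetriever
from llama_index.storage.storage_context import StorageContext

# Keep the text side local too: no LLM and a local text embedding model
service_context = ServiceContext.from_defaults(llm=None, embed_model="local")

# Load images from a local folder (the path is a placeholder)
documents = SimpleDirectoryReader("./images/").load_data()

# "clip" is the default image embedding model and runs locally
index = MultiModalVectorStoreIndex.from_documents(
    documents,
    service_context=service_context,
    storage_context=StorageContext.from_defaults(),
    image_embed_model="clip",
)

# Build the retriever from the index and run an image-to-image query
retriever = MultiModalVectorIndexRetriever(index=index, image_similarity_top_k=3)
results = retriever.image_to_image_retrieve("./images/query_image.jpg")
for node_with_score in results:
    print(node_with_score.node, node_with_score.score)

Passing llm=None and embed_model="local" to the ServiceContext keeps the text side of the index local as well.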

For more details, you can refer to the MultiModalVectorStoreIndex and MultiModalVectorIndexRetriever in the LlamaIndex repository.

I hope this helps! If you have any other questions, feel free to ask.

Best, Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

BakingBrains commented 10 months ago

@dosu-bot

Can you give end-to-end demo example code?

Regards

dosubot[bot] commented 10 months ago

🤖

Sure, I can help you with that. Here's an example of how you can initialize a MultiModalVectorStoreIndex with a local embedding model and an image vector store, and use the image_to_image_retrieve method of the MultiModalVectorIndexRetriever class:

from llama_index.indices.multi_modal.base import MultiModalVectorStoreIndex
from llama_index.indices.multi_modal.retriever import MultiModalVectorIndexRetriever
from llama_index.schema import QueryBundle

# Assuming you have a local image embedding model (e.g., a local CLIP model) and an image vector store
local_embed_model = ...  # replace with your local image embedding model
image_vector_store = ...  # replace with your image vector store

# Initialize the MultiModalVectorStoreIndex with the local LLM and image vector store
index = MultiModalVectorStoreIndex(
    image_embed_model=local_embed_model,
    image_vector_store=image_vector_store,
)

# Initialize the MultiModalVectorIndexRetriever with the index
retriever = MultiModalVectorIndexRetriever(index=index)

# Use the image_to_image_retrieve method
# Assuming you have an image path
image_path = "path_to_your_image.jpg"  # replace with your image path
query_bundle = QueryBundle(query_str="", image_path=image_path)
result = retriever.image_to_image_retrieve(query_bundle)

# The result is a list of NodeWithScore objects
for node_with_score in result:
    print(f"Node: {node_with_score.node}, Score: {node_with_score.score}")

In this example, you need to replace the ... with your actual local embedding model and image vector store. The image_to_image_retrieve method takes a QueryBundle object as input, which is initialized with an empty query string and the path to the image you want to retrieve. The method returns a list of NodeWithScore objects, which contain the retrieved nodes and their corresponding scores.

As for the MultiModalVectorStoreIndex class, it is a multi-modal vector store index for images and texts: it stores and manages the vector representations of both text and image data, and it is passed to the MultiModalVectorIndexRetriever during initialization so that the retriever can perform retrieval operations on the indexed data.

You can find more details about these classes and methods in the llama_index/indices/multi_modal/retriever.py file in the LlamaIndex repository.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

ravi03071991 commented 10 months ago

Hi @BakingBrains ,

You can use the following snippet to use an open-source multi-modal LLM (LLaVA) via Replicate.

import os
from llama_index.multi_modal_llms import ReplicateMultiModal

os.environ["REPLICATE_API_TOKEN"] = ""  # add your replicate api token here
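# image_documents must be defined before calling the LLM; for example (the
# folder path here is a placeholder), load images from a local folder:
from llama_index import SimpleDirectoryReader

image_documents = SimpleDirectoryReader("./images/").load_data()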

multi_modal_llm = ReplicateMultiModal(
    model="yorickvp/llava-13b:2facb4a474a0462c15041b78b1ad70952ea46b5ec6ad29583c0b29dbd4249591",
    max_new_tokens=100,
    temperature=0.1,
    num_input_files=1,
    top_p=0.9,
    num_beams=1,
    repetition_penalty=1,
)

response = multi_modal_llm.complete(
    prompt="Given the first image as the base image, what the other images correspond to?",
    image_documents=image_documents,
)

print(response)

You can check the notebook for more details.

BakingBrains commented 10 months ago

Thank you @ravi03071991

Can we use it without Replicate?

Any suggestions here?

Ashwariya2310 commented 10 months ago

@dosu-bot

I am using a Replicate API token to use llama2 hosted on Replicate; however, I am facing an error saying:

You did not pass an authentication token

tsantra commented 5 months ago

@dosu-bot how do we use a local multimodal LLM for generation after retrieval?