mgonzs13 / llama_ros

llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2

Langchain Llava Node instead of Llama Node #9

kyle-redyeti opened this issue 1 month ago

kyle-redyeti commented 1 month ago

Hello, I have been trying to use the chatbot_ros project together with the explicability_ros project and decided I wanted to get it to describe images. I chased my tail trying to generate a proper prompt for this, but I think the issue comes down to the LangChain wrapper in llama_ros: even though I am using the llava-mistral-7b model, something in the wrapper is not providing the correct input to the model. Maybe it is as simple as making sure that, when I pass an image, it is properly added to GenerateResponse.action?

Any help you can provide would be greatly appreciated!

Kyle

kyle-redyeti commented 1 month ago

Here is a test script I tried:

import base64
import httpx

import rclpy
from llama_ros.langchain import LlamaROS

from langchain_core.prompts import ChatPromptTemplate

from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

rclpy.init()

# create the llama_ros llm for langchain
llm = LlamaROS()

# create a prompt template
image_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Describe the image?"),
        (
            "user",
            [
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,{image_data}"},
                }
            ],
        ),
    ]
)

image_question_chain = (
    {"image_data": RunnablePassthrough()}
    | image_prompt
    | llm
    | StrOutputParser()
)

# download the test image and encode it as base64
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
for c in image_question_chain.stream(image_data):
    print(c, flush=True, end="")
# First I tried this and it seemed to never respond so I tried stream (above) too...
# answer = image_question_chain.invoke(image_data)
# print(f"ANSWER: {answer}")

rclpy.shutdown()

mgonzs13 commented 1 month ago

Hey @kyle-redyeti, first of all, how are you running llava-mistral? You may need to set the namespace to llava: llm = LlamaROS(namespace="llava"). I've been reading the LangChain code: since there is VLM support for ChatPromptTemplate, a new chat model would have to be implemented for llama_ros, similar to ChatOllama, which can handle images, or LlamaROS would have to be modified to add the image to the action goal.
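
As a quick ROS-free check, this is roughly what a plain text LLM wrapper ends up receiving from your chain: LangChain flattens the multimodal messages into a single string, so the base64 data lands inside the prompt text instead of being passed to llava as an image. The snippet below only illustrates that flattening; it is not llama_ros code.

from langchain_core.prompts import ChatPromptTemplate

image_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Describe the image."),
        (
            "user",
            [
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,{image_data}"},
                }
            ],
        ),
    ]
)

# Format the template as the chain would, then flatten it the way a
# plain (non-chat) LLM wrapper does before calling the model.
prompt_value = image_prompt.invoke({"image_data": "<...base64...>"})
print(prompt_value.to_string())
# The image_url block is printed as text inside the prompt, which is why
# the model never actually sees the image.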

mgonzs13 commented 1 month ago

Here you have an example with the LangChain wrapper modified: https://github.com/mgonzs13/llama_ros?tab=readme-ov-file#llava_ros
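
A rough sketch of that pattern is below; the image_url keyword passed through bind() is an assumption here, so check the linked README section for the actual usage.

import rclpy
from llama_ros.langchain import LlamaROS

rclpy.init()

# Point the wrapper at the llava node (assumes llava_ros runs under the
# "llava" namespace) and attach the image out-of-band instead of embedding
# it in the prompt text. The image_url kwarg is an assumption, not the
# verified API of the modified wrapper.
llm = LlamaROS(namespace="llava")
llm = llm.bind(image_url="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg")

for c in llm.stream("Describe the image"):
    print(c, flush=True, end="")

rclpy.shutdown()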

kyle-redyeti commented 1 month ago

I wanted to check first whether I was missing something that already existed. Next, I think I will try running just the llava llama_ros node and not using LangChain for image queries, while still using it for RAG/text. I think that will get me by for a while until I can dig deeper into a bigger change.

If I am using LlamaROS(namespace="llava"), am I right in thinking I could still use a GenerateResponse with an image like the llava_demo_node does?

Thanks for your help!

Kyle

mgonzs13 commented 1 month ago

If I am using LlamaROS(namespace="llava"), am I right in thinking I could still use a GenerateResponse with an image like the llava_demo_node does? Thanks for your help! Kyle

Yes, the wrapper will add the image to the GenerateResponse goal similar to the demo.
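
In case the direct route helps in the meantime, here is a rough sketch of calling the action yourself, mirroring the demo. The action name and the goal/result field names (image, response.text) are assumptions based on the demo node, so check llama_msgs and llava_demo_node for the actual definitions.

import urllib.request

import cv2
import numpy as np
import rclpy
from rclpy.action import ActionClient
from cv_bridge import CvBridge
from llama_msgs.action import GenerateResponse

rclpy.init()
node = rclpy.create_node("llava_query")

# Action name assumes llava_ros was launched under the "llava" namespace.
client = ActionClient(node, GenerateResponse, "/llava/generate_response")
client.wait_for_server()

# Download the test image and convert it to a sensor_msgs/Image.
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
raw = urllib.request.urlopen(image_url).read()
img = cv2.imdecode(np.frombuffer(raw, np.uint8), cv2.IMREAD_COLOR)

goal = GenerateResponse.Goal()
# Prompt formatting (e.g. an <image> tag or a chat template) may be needed
# depending on the model and launch configuration.
goal.prompt = "Describe the image"
goal.image = CvBridge().cv2_to_imgmsg(img, encoding="bgr8")  # assumed goal field name

# Standard rclpy action pattern: send the goal, wait for acceptance, then
# wait for the result.
goal_future = client.send_goal_async(goal)
rclpy.spin_until_future_complete(node, goal_future)
result_future = goal_future.result().get_result_async()
rclpy.spin_until_future_complete(node, result_future)

print(result_future.result().result.response.text)  # assumed result layout

rclpy.shutdown()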

kyle-redyeti commented 1 month ago

@mgonzs13 Thank you again for the help! I was able to get the example script to run and it does produce a response as I expected. I will have to do some work to get it into my pipeline with your Chatbot_ROS and Explainable_ROS.