MultiModal does not work with Next.js Frontend and FastAPI backend

Laktus commented 2 months ago

Hi,

I wanted to implement custom evaluation logic. Since only the Python implementation of LlamaIndex supports QuestionGenerator, I thought it would be more reasonable to use the FastAPI backend + Next.js frontend setup.

I managed to pass the image data to the backend by extending the handleSubmit of useChat (see https://github.com/vercel/ai/pull/725). However, I don't know how to replicate the functionality of StreamData in the FastAPI backend.

Can you make this example work out of the box or provide further documentation on how to implement this? Currently, multi-modality does not work without multiple changes.

Thanks for taking the time to read my request.

marcusschiesser commented 2 months ago

@Laktus by coincidence, I just added a vercel/ai compatible StreamingResponse in the last release, see https://github.com/run-llama/create-llama/blob/main/templates/types/streaming/fastapi/app/api/routers/vercel_response.py
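For readers who want to see roughly what such a response has to emit, here is a minimal sketch of the line format that StreamData-style streaming relies on. It is not the code from vercel_response.py. The prefixes `0:` for text parts and `2:` for data parts are an assumption based on the Vercel AI SDK stream protocol and should be checked against the SDK version in use, and the helper names are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator

# Assumed prefixes from the Vercel AI SDK stream protocol (verify for your version).
TEXT_PREFIX = "0:"
DATA_PREFIX = "2:"


def text_part(token: str) -> str:
    """Encode one text token as a single protocol line."""
    return f"{TEXT_PREFIX}{json.dumps(token)}\n"


def data_part(payload: Any) -> str:
    """Encode an arbitrary JSON payload as a data line (wrapped in a list)."""
    return f"{DATA_PREFIX}{json.dumps([payload])}\n"


def vercel_stream(tokens: Iterable[str], data: Any = None) -> Iterator[str]:
    """Yield an optional data line first, then one line per text token."""
    if data is not None:
        yield data_part(data)
    for token in tokens:
        yield text_part(token)
```

In a FastAPI route, a generator like this could be handed to `fastapi.responses.StreamingResponse`, which is presumably what the linked template class wraps.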

Laktus commented 2 months ago

@marcusschiesser That looks awesome, thanks! I think someone still needs to modify the template to work with it, though. Or is it already integrated in the last release? The template cannot yet handle the incoming data from useChat's handleSubmit, nor pass it back from the server to the frontend using StreamingTextResponse (or your vercel_response in FastAPI). Who would be responsible for integrating the latest FastAPI changes into the starter template?

marcusschiesser commented 2 months ago

@Laktus The template has been part of create-llama since npx create-llama@0.1.0 and was just updated in npx create-llama@0.1.1. What are you missing?

Laktus commented 2 months ago

@marcusschiesser I will try integrating my backend changes for handling the data parameter and will post a message here if it works. Thanks for the update!
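A rough sketch of what handling the data parameter on the backend could look like, assuming the frontend posts a JSON body with `messages` and an optional `data` object. The `imageUrl` key and the `ChatRequest` / `parse_chat_request` names are hypothetical and only for illustration. The actual shape depends on what is passed to handleSubmit.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ChatRequest:
    """Parsed chat request: the message history plus optional extra data."""
    messages: List[Dict[str, Any]]
    data: Dict[str, Any] = field(default_factory=dict)

    @property
    def image_url(self) -> Optional[str]:
        # "imageUrl" is a hypothetical key chosen by the frontend.
        return self.data.get("imageUrl")


def parse_chat_request(raw_body: str) -> ChatRequest:
    """Parse the raw JSON body, tolerating a missing or null `data` field."""
    body = json.loads(raw_body)
    return ChatRequest(
        messages=body.get("messages", []),
        data=body.get("data") or {},
    )
```

In a FastAPI app the same fields would more likely live on a Pydantic model. The dataclass is used here only to keep the sketch free of dependencies.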

Laktus commented 2 months ago

@marcusschiesser Hi Marcus, I don't see any built-in integration with images in the FastAPI backend. Do you know how I can add this? How do I pass image information into the ChatMessage object when I'm using an image-capable model like GPT-4 or GPT-4-Vision? (I already managed to pass the image from the frontend to the backend and back to the frontend for display.)

Thanks for any help.

marcusschiesser commented 2 months ago

@Laktus, the problem is that in Python you have to use the MultiModalVectorStoreIndex to work with images.

So I would start by replacing VectorStoreIndex with this class.

Details about using it are here: https://docs.llamaindex.ai/en/stable/examples/evaluation/multi_modal/multi_modal_rag_evaluation/?h=multimodalvectorstoreindex

If you like, you're welcome to post a diff of your code here.

Laktus commented 2 months ago

@marcusschiesser But doesn't this save the images into the vector DB? I don't want to populate the vector DB with image information; I only want to attach the image to a single message.

In OpenAI's Vision docs (https://platform.openai.com/docs/guides/vision), you can see the following usage of the chat completions API:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0])
```

astream_chat only accepts the message as a raw str. If we could pass a ChatMessage object directly, it should be possible to add this additional information to the API call, shouldn't it? Why is this not supported out of the box? I think the TS version also solves it this way.
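To make the idea concrete, here is a small sketch of a helper that attaches an image to a single user message in the content format shown in the OpenAI Vision docs. The name `build_user_message` is hypothetical, and this only builds the plain dict payload. It does not show how LlamaIndex's ChatMessage would carry it.

```python
from typing import Any, Dict, List, Optional


def build_user_message(text: str, image_url: Optional[str] = None) -> Dict[str, Any]:
    """Build one user message; the image part is added only if a URL is given."""
    content: List[Dict[str, Any]] = [{"type": "text", "text": text}]
    if image_url:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"role": "user", "content": content}
```

The returned dict could be placed in the `messages` list of a `client.chat.completions.create(...)` call, so the image is sent with that one message and nothing is written to the vector DB.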

marcusschiesser commented 2 months ago

@Laktus yes, this is currently an issue in the Python version. We're working on aligning the multi-modal capabilities of the Python and TypeScript versions. Once that's done, we will add image upload support to the FastAPI backend.

run-llama / create-llama, issue #65