Prompt Image Alignment Experiment

yigu1008 commented 1 year ago

Hi Kevin, I'm trying to reproduce the prompt alignment experiment. I first downloaded the llava_server codebase and pointed it at the weights from "liuhaotian/llava-v1.5-7b". When I run

gunicorn "app:create_app()"

I get KeyError: 'llava' while the weights are loading.

To handle this, I cloned the latest LLaVA from https://github.com/haotian-liu/LLaVA and modified llava_server/llava.py as follows:

from typing import Iterable, List
from transformers import AutoTokenizer, AutoConfig, LlamaConfig
import torch
import numpy as np
from llava.utils import disable_torch_init
from transformers import CLIPImageProcessor
from PIL import Image
from llava.conversation import simple_conv_multimodal
from llava.model.language_model.llava_llama import LlavaLlamaForCausalLM

DEFAULT_IMAGE_TOKEN = "<image>"
DEFAULT_IMAGE_PATCH_TOKEN = "<im_patch>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"

MAX_TOKENS = 64

PROMPT = simple_conv_multimodal.get_prompt() + "Human: "

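def load_llava(params_path):
    # load model
    # Note: this hard-codes the Hugging Face checkpoint and overrides the argument.
    params_path = "liuhaotian/llava-v1.5-7b"
    disable_torch_init()
    tokenizer = AutoTokenizer.from_pretrained(params_path)

    # Register a config class for model_type "llava" so that AutoConfig
    # can resolve it instead of raising KeyError: 'llava'.
    class LlavaConfig(LlamaConfig):
        model_type = "llava"

    AutoConfig.register("llava", LlavaConfig)

    # Load the weights in half precision and move them to the GPU.
    model = LlavaLlamaForCausalLM.from_pretrained(
        params_path, torch_dtype=torch.float16
    ).cuda()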

I'm now testing this code on a machine with 3 A100 GPUs. It loads the weights and sets up the servers with app.py.

However, when I use 2 GPUs for LLaVA inference and run train.py on the third, I get:

images = images.to("cuda", dtype=torch.float16)
RuntimeError: CUDA error: device-side assert triggered

I also checked with nvidia-smi that my processes were indeed running on three separate GPUs. Could you help me with this? Thank you!

alnaeini commented 7 months ago

This approach throws an error at: from llava.conversation import simple_conv_multimodal

If you check their repo, simple_conv_multimodal does not exist. How did you work around that?

jeeyung commented 6 months ago

I got the same "device-side assert triggered" error.

stanleyshen2003 commented 4 months ago

For the "from llava.conversation import simple_conv_multimodal" error, you can simply use

PROMPT = """You are LLaVA, a large language and vision assistant trained by UW Madison WAIV Lab.You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.Follow the instructions carefully and explain your answers in detail.###Human: Hi!###Assistant: Hi there!  How can I help you today?
###Human:"""

instead of importing simple_conv_multimodal.

Lil-Shake commented 1 month ago

Regarding "CUDA error: device-side assert triggered": this happens when torch.embedding() tries to convert tokens into embeddings. LLaVA-server/llava_server/llava.py adds special tokens to the tokenizer but does not enlarge the model's embedding matrix, so the new token IDs fall outside it. We can resize the embedding matrix after adding the special tokens. Adding the following line at line 38 of LLaVA-server/llava_server/llava.py may solve it:

model.resize_token_embeddings(len(tokenizer))
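A minimal sketch of that resize step as a guarded helper, in case it is useful. It assumes `model` and `tokenizer` are the Hugging Face objects created in load_llava; the helper name `ensure_embeddings_cover_tokenizer` is hypothetical and not part of LLaVA-server. It only grows the embedding matrix when the tokenizer has more entries than the matrix has rows, so it leaves the model untouched otherwise.

```python
def ensure_embeddings_cover_tokenizer(model, tokenizer):
    """Grow the input embedding matrix if the tokenizer has more tokens than rows.

    Hypothetical helper (not from LLaVA-server). Expects a Hugging Face style
    model exposing get_input_embeddings() and resize_token_embeddings().
    Returns (rows_before, rows_after).
    """
    vocab_size = len(tokenizer)  # includes any special tokens added so far
    rows_before = model.get_input_embeddings().num_embeddings

    # A token ID >= rows_before would index past the end of the matrix,
    # which on the GPU shows up as "device-side assert triggered".
    if vocab_size > rows_before:
        model.resize_token_embeddings(vocab_size)

    rows_after = model.get_input_embeddings().num_embeddings
    return rows_before, rows_after
```

It would be called once, after the special tokens have been added to the tokenizer, for example `ensure_embeddings_cover_tokenizer(model, tokenizer)`. The rows added by resizing are untrained, so this should be read as a way to avoid the out-of-range lookup rather than a guarantee that prompts using those tokens behave well.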

kvablack / LLaVA-server

Prompt Image Alignment Experiment #6