haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

[Question] Training with Qwen2 backend got loss 0 #1153

Open lucasjinreal opened 4 months ago

lucasjinreal commented 4 months ago

Question

I get a loss of 0 when training with the Qwen2 backend:

{'loss': 0.0, 'learning_rate': 0.00015267175572519084, 'epoch': 0.0}
0%|▎ | 20/8720 [01:38<11:01:39, 4.56s/it]WARNING: tokenization mismatch: 47 vs. 48. (ignored) WARNING: tokenization mismatch: 54 vs. 55. (ignored) WARNING: tokenization mismatch: 46 vs. 47. (ignored) WARNING: tokenization mismatch: 43 vs. 44. (ignored)

What could be causing this?

yiyexy commented 4 months ago

me too

yiyexy commented 4 months ago

I found that the cause of this problem is a difference in tokenizer rules: in the Qwen tokenizer configuration, bos_token is null and eos_token is set to "<|endoftext|>". So I added a Qwen rule in llava/conversation.py as follows:

class SeparatorStyle(Enum):
    """Different separator style."""
    SINGLE = auto()
    TWO = auto()
    MPT = auto()
    PLAIN = auto()
    LLAMA_2 = auto()
    QWEN_2 = auto()

def get_prompt(self):
    # ... existing branches for the other separator styles are unchanged ...
    elif self.sep_style == SeparatorStyle.QWEN_2:
        seps = [self.sep, self.sep2]
        ret = self.system + seps[0]
        for i, (role, message) in enumerate(messages):  # `messages` is built from self.messages earlier in the method
            if message:
                if type(message) is tuple:
                    message, _, _ = message
                ret += role + ": " + message + seps[i % 2]
            else:
                ret += role + ":"
    # ... remaining branches unchanged; the method ends with `return ret` ...

conv_qwen_2 = Conversation(
    system="A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions.",
    roles=("USER", "ASSISTANT"),
    version="qwen_v2",
    messages=(),
    offset=0,
    sep_style=SeparatorStyle.QWEN_2,
    sep=" ",
    sep2="<|endoftext|>",
)
conv_templates = {
    "default": conv_vicuna_v0,
    "v0": conv_vicuna_v0,
    "v1": conv_vicuna_v1,
    "vicuna_v1": conv_vicuna_v1,
    "qwen_2": conv_qwen_2,
    "llama_2": conv_llama_2,
    "mistral_instruct": conv_mistral_instruct,
    "chatml_direct": conv_chatml_direct,
    "mistral_direct": conv_chatml_direct,

    "plain": conv_llava_plain,
    "v0_plain": conv_llava_plain,
    "llava_v0": conv_llava_v0,
    "v0_mmtag": conv_llava_v0_mmtag,
    "llava_v1": conv_llava_v1,
    "v1_mmtag": conv_llava_v1_mmtag,
    "llava_llama_2": conv_llava_llama_2,

    "mpt": conv_mpt,
}
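
To double-check the special tokens this change is based on, you can inspect the tokenizer directly (a minimal sketch; the checkpoint name is only an example):

# Sketch: inspect the Qwen tokenizer's special tokens.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B")
print("bos:", tok.bos_token, "eos:", tok.eos_token, "pad:", tok.pad_token)
# The changes above assume bos_token is null and eos_token is "<|endoftext|>".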

And then, I added the method preprocess_qwen_2 in train.py.

def preprocess_qwen_2(
    sources,
    tokenizer: transformers.PreTrainedTokenizer,
    has_image: bool = False
) -> Dict:
    conv = conversation_lib.default_conversation.copy()
    roles = {"human": conv.roles[0], "gpt": conv.roles[1]}

    # Apply prompt templates
    conversations = []
    for i, source in enumerate(sources):
        if roles[source[0]["from"]] != conv.roles[0]:
            # Skip the first one if it is not from human
            source = source[1:]

        conv.messages = []
        for j, sentence in enumerate(source):
            role = roles[sentence["from"]]
            assert role == conv.roles[j % 2], f"{i}"
            conv.append_message(role, sentence["value"])
        conversations.append(conv.get_prompt())

    # Tokenize conversations

    if has_image:
        input_ids = torch.stack([tokenizer_image_token(prompt, tokenizer, return_tensors='pt') for prompt in conversations], dim=0)
    else:
        input_ids = tokenizer(
            conversations,
            return_tensors="pt",
            padding="longest",
            max_length=tokenizer.model_max_length,
            truncation=True,
        ).input_ids

    targets = input_ids.clone()

    assert conv.sep_style == conversation_lib.SeparatorStyle.QWEN_2

    # Mask targets
    sep = conv.sep + conv.roles[1] + ": "
    for conversation, target in zip(conversations, targets):
        total_len = int(target.ne(tokenizer.pad_token_id).sum())

        rounds = conversation.split(conv.sep2)
        rounds_len = len(rounds)
        cur_len = 0
        # target[:cur_len] = IGNORE_INDEX
        for i, rou in enumerate(rounds):
            if rou == "":
                break

            parts = rou.split(sep)
            if len(parts) != 2:
                break
            parts[0] += sep

            if has_image:
                round_ids = tokenizer_image_token(rou, tokenizer)
                instruction_ids = tokenizer_image_token(parts[0], tokenizer)
                equal_parts = [x == y for x, y in zip(round_ids, instruction_ids)]

                instruction_len = equal_parts.index(False) if False in equal_parts else len(equal_parts)
                round_len = len(round_ids)

            else:
                round_ids = tokenizer(rou).input_ids
                instruction_ids = tokenizer(parts[0]).input_ids
                equal_parts = [x == y for x, y in zip(round_ids, instruction_ids)]

                instruction_len = equal_parts.index(False) if False in equal_parts else len(equal_parts)
                round_len = len(round_ids)

            if i != 0 and not tokenizer.legacy and IS_TOKENIZER_GREATER_THAN_0_14:
                round_len += 1
                instruction_len += 1

            target[cur_len : cur_len + instruction_len] = IGNORE_INDEX

            cur_len += round_len
        target[cur_len:] = IGNORE_INDEX

        if cur_len < tokenizer.model_max_length:
            if cur_len != total_len + rounds_len - 2:
                target[:] = IGNORE_INDEX
                print(
                    f"WARNING: tokenization mismatch: {cur_len} vs. {total_len}."
                    f" (ignored)"
                )

    return dict(
        input_ids=input_ids,
        labels=targets,
    )

def preprocess(
    sources: Sequence[str],
    tokenizer: transformers.PreTrainedTokenizer,
    has_image: bool = False
) -> Dict:
    if conversation_lib.default_conversation.sep_style == conversation_lib.SeparatorStyle.PLAIN:
        return preprocess_plain(sources, tokenizer)
    if conversation_lib.default_conversation.sep_style == conversation_lib.SeparatorStyle.LLAMA_2:
        return preprocess_llama_2(sources, tokenizer, has_image=has_image)
    if conversation_lib.default_conversation.version.startswith("v1"):
        return preprocess_v1(sources, tokenizer, has_image=has_image)
    if conversation_lib.default_conversation.version == "mpt":
        return preprocess_mpt(sources, tokenizer, has_image=has_image)
    if conversation_lib.default_conversation.version.startswith("qwen_v2"):
        return preprocess_qwen_2(sources, tokenizer, has_image=has_image)
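
Note that the qwen_v2 branch only runs if default_conversation points at the new template; as far as I can tell, train.py picks the template from conv_templates via the --version training argument, so the launch script should pass --version qwen_2. Roughly, the effect is (a sketch of what train.py already does, not extra code to add):

# Sketch of what passing --version qwen_2 amounts to inside train.py.
import llava.conversation as conversation_lib

conversation_lib.default_conversation = conversation_lib.conv_templates["qwen_2"]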

After these operations, the mismatch warning disappeared.

However, I must mention that I don't have GPUs for training now, so there may be other problems.

Hope this helps you.

yiyexy commented 4 months ago

Okay, after making this change, I trained the model: the loss looks normal and the mismatch warning is gone. I trained the MM adapter from scratch with a pretrained Qwen-7B LLM.

[screenshot of training loss]
lucasjinreal commented 4 months ago

@yiyexy Hello, nice catch. My training is normal now. Did you train on the LLaVA pretraining data? Is there any pretraining data that could be used for Chinese enhancement?

yiyexy commented 4 months ago

@yiyexy Hello, nice catch. My training is normal now. Did you train on the LLaVA pretraining data? Is there any pretraining data that could be used for Chinese enhancement?

Yes, I trained on LLaVA pretrain data. Unfortunately, I don't have data to enhance the model's capability in Chinese. By the way, I'm currently developing a new data processing pipeline which may solve this problem one day.

lucasjinreal commented 4 months ago

@yiyexy Would you consider sharing your processing pipeline? Which part of the problem does it solve? There is some Chinese data available, but I think its quality is poor.

yiyexy commented 4 months ago

@lucasjinreal I will. But there are still some problems to solve. It's a long way to go.

lucasjinreal commented 4 months ago

@yiyexy Hello, your loss doesn't look like stage 1?

BTW, you should probably use the qwen1.5-7b-chat model. Otherwise you cannot do SFT efficiently.

However, Qwen uses the ChatML chat format, not the LLaVA default.

How do you change it?

20191864218 commented 4 months ago

Hi, I hope to replace the LLM with Qwen. I added it according to your code, but encountered the following error. How can I resolve this?

Original Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/LLaVA/llava/train/train.py", line 821, in __getitem__
    data_dict = preprocess(
  File "/root/LLaVA/llava/train/train.py", line 726, in preprocess
    return preprocess_qwen_2(sources, tokenizer, has_image=has_image)
  File "/root/LLaVA/llava/train/train.py", line 652, in preprocess_qwen_2
    total_len = int(target.ne(tokenizer.pad_token_id).sum())
TypeError: ne() received an invalid combination of arguments - got (NoneType), but expected one of:

yiyexy commented 4 months ago

@yiyexy Hello, your loss doesn't look like stage 1?

BTW, you should probably use the qwen1.5-7b-chat model. Otherwise you cannot do SFT efficiently.

However, Qwen uses the ChatML chat format, not the LLaVA default.

How do you change it?

You are right. The loss is from stage 2.

And I used the qwen1.5-7b-chat model for this stage.

BTW, I didn't run into problems with the format.

The SFT training is normal. Maybe I overlooked something.

yiyexy commented 4 months ago

@20191864218 Maybe you need to set some parameters for Qwen1.5. See #1146.
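
For the TypeError above specifically, the direct cause is that tokenizer.pad_token_id is None, so target.ne(tokenizer.pad_token_id) fails. One common workaround (a sketch, not necessarily what #1146 does) is to give the tokenizer a pad token before preprocessing:

# Sketch: make sure the tokenizer has a pad token so pad_token_id is not None.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # "<|endoftext|>" in the Qwen config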

lucasjinreal commented 4 months ago

@yiyexy Using the LLaVA template on a Qwen chat model might introduce unwanted output during chat. This is a common issue. Qwen uses the ChatML format, which uses <|im_end|> as the separator.
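
Roughly, a ChatML-style template would mirror the existing conv_mpt / conv_chatml_direct entries in conversation.py (a sketch I haven't verified in training; the system prompt is just the usual Qwen default):

# Sketch of a ChatML-style template, modeled on the existing MPT/chatml entries.
conv_qwen_chatml = Conversation(
    system="<|im_start|>system\nYou are a helpful assistant.",
    roles=("<|im_start|>user\n", "<|im_start|>assistant\n"),
    version="mpt",  # so preprocess_mpt handles the label masking
    messages=(),
    offset=0,
    sep_style=SeparatorStyle.MPT,
    sep="<|im_end|>",
)

It would also need to be registered in conv_templates and selected via --version.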

yiyexy commented 4 months ago

@yiyexy Using the LLaVA template on a Qwen chat model might introduce unwanted output during chat. This is a common issue. Qwen uses the ChatML format, which uses <|im_end|> as the separator.

Thanks for your reminder. I will pay attention to this issue. I haven't trained a llava-qwen model due to a lack of GPU resources and other work commitments.

I will train a llava-qwen model as soon as possible and share the result with you.

lucasjinreal commented 4 months ago

@yiyexy Thank you. I'm doing the finetune stage now. Possibly I will try converting to the ChatML format to see what happens; hoping for your result.

20191864218 commented 4 months ago

@20191864218 Maybe you need to set some parameters for Qwen1.5. See #1146.

Thank you, but I've encountered some issues after making the changes. Could you help me with it? I left a comment on the link you provided.

BlueBlueFF commented 4 months ago

@yiyexy Thank you. I'm doing the finetune stage now. Possibly I will try converting to the ChatML format to see what happens; hoping for your result.

So are you using qwen-chat for the LLaVA SFT?

lucasjinreal commented 4 months ago

Yes, I am using the ChatML format for training now; I will update info here.

This is how the Qwen-1.8B stage 2 loss currently goes:

{'loss': 2.5544, 'learning_rate': 8.585365853658537e-06, 'epoch': 0.01}                                                                                                                                              
{'loss': 2.4306, 'learning_rate': 8.682926829268294e-06, 'epoch': 0.01}                                                                                                                                              
{'loss': 2.584, 'learning_rate': 8.78048780487805e-06, 'epoch': 0.01}                                                                                                                                                
{'loss': 2.6411, 'learning_rate': 8.878048780487806e-06, 'epoch': 0.01}                                                                                                                                              
{'loss': 2.4981, 'learning_rate': 8.975609756097562e-06, 'epoch': 0.01}                                                                                                                                              
{'loss': 2.4692, 'learning_rate': 9.073170731707319e-06, 'epoch': 0.01}                                                                                                                                              
{'loss': 2.3996, 'learning_rate': 9.170731707317075e-06, 'epoch': 0.01}   
{'loss': 2.3016, 'learning_rate': 9.170731707317075e-06, 'epoch': 0.01}   
liuheng0111 commented 4 months ago

Question

I get a loss of 0 when training with the Qwen2 backend:

{'loss': 0.0, 'learning_rate': 0.00015267175572519084, 'epoch': 0.0} 0%|▎ | 20/8720 [01:38<11:01:39, 4.56s/it]WARNING: tokenization mismatch: 47 vs. 48. (ignored) WARNING: tokenization mismatch: 54 vs. 55. (ignored) WARNING: tokenization mismatch: 46 vs. 47. (ignored) WARNING: tokenization mismatch: 43 vs. 44. (ignored)

What could be causing this?

@lucasjinreal I'm hitting the same problem. Can you share your code for using the qwen1.5-chat LLM?

lucasjinreal commented 4 months ago

Hi, I have finished training.

I found that Qwen-4B can get reasonable performance:

[screenshot of model output]

But OCR ability is still not very good. Any suggestions for enhancing OCR ability? (Chinese open data)

liuheng0111 commented 4 months ago

Using qwen1.5-7b-chat in the pretrain stage is normal, but in the SFT stage the loss is zero. I checked that the conversations are aligned. Are there any suggestions, @lucasjinreal? During training I got a warning: checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None. Can this warning be ignored?

lucasjinreal commented 4 months ago

Seems like the inputs contain None. Check the data or add some assertions.
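
For example, one cheap check (a sketch) is to assert that each sample still has unmasked labels after preprocessing, since a fully masked target is exactly what produces a 0.0 loss:

# Sketch: catch samples whose labels are fully masked and silently contribute 0 loss.
IGNORE_INDEX = -100  # same value LLaVA uses for masked targets

def check_sample(data_dict):
    labels = data_dict["labels"]
    assert (labels != IGNORE_INDEX).any(), "all labels masked; this sample gives 0 loss"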

lucasjinreal commented 4 months ago

For anyone who wants an immediate response and help with training LLaVA, you can join this group:

[QR code image]

If the QR code is outdated, add bojuebot for an invitation.

20191864218 commented 4 months ago

@20191864218 Maybe you need to set some parameters for Qwen1.5. See #1146.

Hello, do you have a link for replacing the visual encoder?

20191864218 commented 4 months ago

@yiyexy Using the LLaVA template on a Qwen chat model might introduce unwanted output during chat. This is a common issue. Qwen uses the ChatML format, which uses <|im_end|> as the separator.

Hello, does using the Qwen-7B-base model for finetuning still require data in the ChatML format? Thank you for your help.

lucasjinreal commented 4 months ago

I think the base model cannot be used in a VLM; it doesn't have chat abilities.

20191864218 commented 4 months ago

I think the base model cannot be used in a VLM; it doesn't have chat abilities.

I want to create a model solely for generating reports, without requiring strong conversational abilities. Can I use the llava fine-tuning data format when fine-tuning?

yiyexy commented 4 months ago

I think the base model cannot be used in a VLM; it doesn't have chat abilities.

I want to create a model solely for generating reports, without requiring strong conversational abilities. Can I use the llava fine-tuning data format when fine-tuning?

Did you verify your method? The LLaVA SFT data is designed for QA tasks, so the results might not be good if you use a base model.

20191864218 commented 4 months ago

I think the base model cannot be used in a VLM; it doesn't have chat abilities.

I want to create a model solely for generating reports, without requiring strong conversational abilities. Can I use the llava fine-tuning data format when fine-tuning?

Did you verify your method? The LLaVA SFT data is designed for QA tasks, so the results might not be good if you use a base model.

I replaced both the LLM and vision encoder, then proceeded with pretraining and finetuning with LoRA. However, I encountered an issue during inference. The specific error is as follows:

[screenshot of the error]

Additionally, I am attempting to perform inference using the web interface, but it is also not functioning. The error says that not all tensors are on the same device: [screenshot of the error]

I don't know how to handle this. I would be extremely grateful if you could help me.

yiyexy commented 4 months ago

@20191864218 This error appears to be due to a corrupted weight file. Please ensure that your weight file has been saved correctly.

20191864218 commented 4 months ago

@20191864218 This error appears to be due to a corrupted weight file. Please ensure that your weight file has been saved correctly.

Thank you for your response. I merged the LoRA weights according to the merge_lora_weights.py file in LLaVA. I will double-check where the error occurred. Thanks again.

20191864218 commented 3 months ago

Hi, I have finished training.

I found that Qwen-4B can get reasonable performance:

[screenshot of model output]

But OCR ability is still not very good. Any suggestions for enhancing OCR ability? (Chinese open data)

Hello, could I take a look at your cli.py file? I get a lot of errors when running inference. If possible, thank you very much.

[screenshot of the errors]

ScottishFold007 commented 3 months ago

me too

me too!!!

xsw1208 commented 3 months ago

Thank you!

VincentDENGP commented 3 months ago

#1146

Hi, thanks for sharing. I am working on stage 1 training using Qwen-1.8B and the training loss does not decrease; other models at varying scales (1.1B - 34B) work fine. I wonder if any special change is needed for stage 1 training with Qwen?

yiyexy commented 3 months ago

@VincentDENGP You mean the loss does not decrease only on Qwen? Maybe you need a larger Qwen? My loss decreased normally with Qwen-7B in stage 1. I will check out this PR later to rule out any differences.

VincentDENGP commented 3 months ago

@VincentDENGP You mean the loss does not decrease only on Qwen? Maybe you need a larger Qwen? My loss decreased normally with Qwen-7B in stage 1. I will check out this PR later to rule out any differences.

@yiyexy Thanks for the suggestion. I just did a quick experiment, and the loss decreases normally on Qwen-7B. However, regarding parameter size, I conducted two additional experiments; it is weird that both TinyLlama 1.1B and StableLM 1.6B losses decrease normally, while Qwen1.5-0.5B and Qwen1.5-1.8B do not.

Nastu-Ho commented 1 month ago

@yiyexy Thank you. I'm doing the finetune stage now. Possibly I will try converting to the ChatML format to see what happens; hoping for your result.

Hey, can you share the code for using Qwen2 as the LLM backend?

Nastu-Ho commented 1 month ago

Yes, I am using the ChatML format for training now; I will update info here.

This is how the Qwen-1.8B stage 2 loss currently goes:

{'loss': 2.5544, 'learning_rate': 8.585365853658537e-06, 'epoch': 0.01}                                                                                                                                              
{'loss': 2.4306, 'learning_rate': 8.682926829268294e-06, 'epoch': 0.01}                                                                                                                                              
{'loss': 2.584, 'learning_rate': 8.78048780487805e-06, 'epoch': 0.01}                                                                                                                                                
{'loss': 2.6411, 'learning_rate': 8.878048780487806e-06, 'epoch': 0.01}                                                                                                                                              
{'loss': 2.4981, 'learning_rate': 8.975609756097562e-06, 'epoch': 0.01}                                                                                                                                              
{'loss': 2.4692, 'learning_rate': 9.073170731707319e-06, 'epoch': 0.01}                                                                                                                                              
{'loss': 2.3996, 'learning_rate': 9.170731707317075e-06, 'epoch': 0.01}   
{'loss': 2.3016, 'learning_rate': 9.170731707317075e-06, 'epoch': 0.01}   

Hey, can you share the code for using Qwen-7B as the LLM backend?

zealot52099 commented 3 weeks ago

Hello! I used CC3M-Pretrain-595K to pretrain Qwen2-1.5B and a small amount of Chinese data (about 1000 samples) for finetuning. However, when I use the following code for inference:

from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "./checkpoints/finetune-llava-qwen2-1.5b-zhipu-img-token-v3"
prompt = "前面有什么?"
image_file = "test2.jpg"

args = type('Args', (), {
    "model_path": model_path,
    "model_name": get_model_name_from_path(model_path),
    "model_base": None,
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
    # "max_new_tokens": 50
})()

eval_model(args)

I got a nonsense response like: daараметCharacterSet�[�觉民用琨网络游戏蜒男人inent NSMutable楽しい OnCollision ContinentTLremiumhawks部這╔公司在냅 Seleensive

Can anyone help? Thanks!

TobyYang7 commented 3 weeks ago

I recently trained with Qwen2. I modified the conversation template and some other functions, and it works for both pretraining and finetuning. Here is my working repository: https://github.com/TobyYang7/Llava_Qwen2

Nastu-Ho commented 3 weeks ago

I recently trained with Qwen2. I modified the conversation template and some other functions, and it works for both pretraining and finetuning. Here is my working repository: https://github.com/TobyYang7/Llava_Qwen2

Have the results improved after fine-tuning using this template?

TobyYang7 commented 3 weeks ago

I recently trained with Qwen2. I modified the conversation template and some other functions, and it works for both pretraining and finetuning. Here is my working repository: https://github.com/TobyYang7/Llava_Qwen2

Have the results improved after fine-tuning using this template?

Due to the limitation of GPU resources, I do not have preliminary results yet. You can prepare the dataset and give it a try.

ayiyayi commented 4 days ago

I used the advice in https://github.com/PKU-YuanGroup/MoE-LLaVA/blob/main/docs/CUSTOM.md. It works fine.