yuanzhoulvpi2017 / zero_nlp

Chinese NLP solutions (LLMs, data, models, training, inference)
MIT License

train_llava inference results are wrong. #197

Open weiaicunzai opened 3 weeks ago

weiaicunzai commented 3 weeks ago

Hi. After my train_llava training finished, inference shows two problems:

1: predictions end with a literal `<|im_end|>` marker

2: every character in the prediction is separated by a space

T h e   i m a g e   s h o w s   a   p e r s o n   s t a n d i n g   i n   f r o n t   o f   a   d o o r ,   w i t h   t h e i r   h a n d s   i n   t h e i r   p o c k e t s   a n d   t h e i r   e y e s   f i x e d   o n   t h e   g r o u n d .   T h e   p e r s o n ' s   f a c e   i s   e x p r e s s i o n l e s s ,   a n d   t h e r e   i s   n o   c l e a r   a c t i o n   o r   m o v e m e n t   i n   t h e   s c e n e .   T h e   i m a g e   c o u l d   b e   a   s c e n e   f r o m   a   m o v i e ,   a   v i d e o   g a m e ,   o r   a   r e a l - l i f e   s i t u a t i o n   w h e r e   s o m e o n e   i s   s t a n d i n g   i n   f r o n t   o f   a   d o o r . < | i m _ e n d | >

while the label string is

chatbot: the test - footed nerve's steps an evening with frank zappa by michael e schwartz

I don't know why this happens.
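For context, the spacing pattern above (one space between every character, three where a real space used to be) is exactly what you get if decoding falls back to joining a character-level token list with spaces. This is a hypothetical illustration of the symptom, not the repo's code:

```python
# Hypothetical sketch: some slow tokenizers fall back to joining tokens
# with spaces in convert_tokens_to_string when no merge rules apply.
def naive_join_decode(tokens):
    # space-joining fallback
    return " ".join(tokens)

# pretend each character became its own token
tokens = list("The image")
print(naive_join_decode(tokens))  # -> "T h e   i m a g e"
```

If the decode step is using a mismatched tokenizer (one with a character-level fallback instead of the model's own vocabulary), this would reproduce the spaced-out output.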

Here is my inference code:

import os
from PIL import Image

import pandas as pd
from transformers import LlavaForConditionalGeneration, AutoProcessor

def load_modal_and_processor(model_path):
    model = LlavaForConditionalGeneration.from_pretrained(model_path)
    processor = AutoProcessor.from_pretrained(model_path)

    return model, processor

def build_model_input(data_path, processor):
    # from dataset import PretrainData, Collator

    # return PretrainData(data_path, processor, -100), Collator(processor.tokenizer.pad_token_id)

    json_path = os.path.join(data_path, 'blip_laion_cc_sbu_558k.json')
    df = pd.read_json(json_path)
    name, image_path, conversations = df.iloc[55]
    image_path = os.path.join(data_path, 'images', image_path)
    human_input = conversations[0].get('value')
    chatbot_output = conversations[1].get('value')
    print('human:', human_input)
    print('chatbot:', chatbot_output)
    print('image_path', image_path)
    print(json_path)

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": human_input},
    ]

    image = Image.open(image_path)
    prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

    prompt = processor(text=prompt, images=image, return_tensors='pt')

    return prompt

model, processor = load_modal_and_processor('/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mlm/by/train_llava/pretrained_model/model001')
prompt = build_model_input('/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/BERT_TRAINING_SERVICE/platform/dataset/liuhaotian/LLaVA-Pretrain/main/', processor)

model.eval()

model = model.to('cuda:1')

for tk in prompt.keys():
    prompt[tk] = prompt[tk].to(model.device)

generate_ids = model.generate(**prompt, max_new_tokens=100)

generate_ids = [
    oid[len(iids):] for oid, iids in zip(generate_ids, prompt.input_ids)
]

gen_text = processor.batch_decode(generate_ids, skip_special_tokens=False, clean_up_tokenization_spaces=False)[0]

print('pred:', gen_text)
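As a stopgap for the trailing marker (separate from the root cause), the leftover end-of-turn token can be stripped after decoding. This is a hypothetical post-processing sketch, assuming the marker is Qwen's `<|im_end|>`; if that token is registered as a special token in the tokenizer actually used for decoding, passing `skip_special_tokens=True` to `batch_decode` should drop it without any manual stripping:

```python
# Hypothetical post-processing (assumes the trailing marker is "<|im_end|>"):
# remove it from the tail of the decoded text.
def strip_end_marker(text, marker="<|im_end|>"):
    text = text.strip()
    if text.endswith(marker):
        # drop the marker and any whitespace left before it
        text = text[: -len(marker)].rstrip()
    return text

print(strip_end_marker("standing in front of a door.<|im_end|>"))
# -> "standing in front of a door."
```

Note this only hides the symptom; if the marker survives `skip_special_tokens=True`, that itself suggests the decoding tokenizer does not know the token, which points back at a tokenizer mismatch.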
weiaicunzai commented 3 weeks ago

#185

I found a similar bug there: the text also has spaces, and there is a trailing token at the end that isn't removed. Could this be caused by using two tokenizers, one from CLIP and one from Qwen?