haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
19.86k stars 2.18k forks source link

[Question] When run finetune.sh, train.py shows an error "in preprocess_plain assert len(source) == 2" #815

Closed Jiushanhuadao closed 11 months ago

Jiushanhuadao commented 11 months ago

Question

I print the source in sources, it shows as below: [{'from': 'human', 'value': '\nWhat types of signs are in the image?'}, {'from': 'gpt', 'value': 'The image contains several wooden street signs at an intersection, with some of them giving directions to various destinations. There is also a street sign next to a parking lot with a FILA store, and a group of signs posted on a post.'}, {'from': 'human', 'value': 'How are the signs directing people in the picture?'}, {'from': 'gpt', 'value': 'The signs in the picture direct people by providing information about street names, distances, destinations or points of interest, and by guiding them towards specific locations or facilities like a parking lot or a FILA store at that location.'}, {'from': 'human', 'value': 'What is the purpose of these signs?'}, {'from': 'gpt', 'value': 'The purpose of these signs is to provide guidance and necessary information to pedestrians, drivers, and other people navigating the area. They help individuals find their way to specific places, understand road rules and regulations, and ensure a smooth and orderly flow of traffic. They also contribute to the accessibility and readability of a city or town, making it easier for residents and tourists alike to explore and navigate the surroundings.'}, {'from': 'human', 'value': 'What kind of store is the FILA store?'}, {'from': 'gpt', 'value': "The FILA store is a retail outlet for the FILA brand, which is a sportswear company known for its footwear, clothing, and accessories. Established in Italy in 1911, FILA has become a popular brand worldwide for its sports and lifestyle products. The store typically sells athletic shoes, sports clothing, and accessories for various activities such as running, tennis, basketball, and training, as well as casual wear like t-shirts, hoodies, and sweatpants that showcase the brand's distinct style and logo."}]

as the code in train.py: def preprocess_plain( sources: Sequence[str], tokenizer: transformers.PreTrainedTokenizer, ) -> Dict:

add end signal and concatenate together

conversations = []
for source in sources:
    # sft
    print(source)
    print(len(sources))
    print(len(source))
    print("-------")
    # source=source[0]
    # print(source)
    assert len(source) == 2
    assert DEFAULT_IMAGE_TOKEN in source[0]['value']
    source[0]['value'] = DEFAULT_IMAGE_TOKEN
    conversation = source[0]['value'] + source[1]['value'] + conversation_lib.default_conversation.sep
    conversations.append(conversation)
# tokenize conversations
input_ids = [tokenizer_image_token(prompt, tokenizer, return_tensors='pt') for prompt in conversations]
targets = copy.deepcopy(input_ids)
for target, source in zip(targets, sources):
    tokenized_len = len(tokenizer_image_token(source[0]['value'], tokenizer))
    target[:tokenized_len] = IGNORE_INDEX
return dict(input_ids=input_ids, labels=targets)

the source's length is always not as 2, is it a wrong of my json file? I use llava_v1_5_mix665k.json as input.

anonymous-atom commented 11 months ago

@Jiushanhuadao Can you let me know how you resolved this ? I am getting this error assert DEFAULT_IMAGE_TOKEN in source[0]['value']

anonymous-atom commented 11 months ago

Now I am getting the same error as you assert len(source) == 2 AssertionError

Jiushanhuadao commented 11 months ago

Now I am getting the same error as you assert len(source) == 2 AssertionError

I got the error because I use the old version finetune.sh and use a error json which is not suitable for the llava1.5-finetune.