
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Truncation of generated results #27

Closed lalulxm closed 3 months ago

lalulxm commented 3 months ago

Thank you for your great work! I tried running inference with my own model, fine-tuned from LLaVA-1.5, together with OPERA's method, but about 1/3 of the inference results appear to be truncated early. I followed the procedure in demo.ipynb and changed Line 14 of eval_configs/llava-1.5_eval.yaml to the path of the fine-tuned model. With beam=5 and max_new_tokens=1024, the inference results showed early truncation like the following:

{
    "question_id": 26, 
    "image": "test_imgs/0318_14.jpeg", 
    "text": "Please analyze the relationship between these animals.", 
    "type": "Relation reasoning", 
    "caption": "In the image, a white bird is perched on the back of a gray elePHant. This unique situation suggests a certain level of tolerance and possibly even symbiotic relationship between the two animals. It is possible that the elephat and th"
}

How can I solve this problem?
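
(For reference, a quick way to quantify how often this happens is to flag generations that do not end with sentence-final punctuation. This is a minimal sketch only; the heuristic and the "results.jsonl" output path below are illustrative assumptions, not part of the original setup.)

import json

# Heuristic (assumption): treat a caption as truncated if it does not end with
# sentence-final punctuation. "results.jsonl" is a placeholder output path.
def looks_truncated(caption):
    return not caption.rstrip().endswith((".", "!", "?"))

with open("results.jsonl") as f:
    results = [json.loads(line) for line in f]

truncated = [r for r in results if looks_truncated(r["caption"])]
print(f"{len(truncated)}/{len(results)} captions look truncated")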

shikiw commented 3 months ago

Thanks for your interest!

When early truncation appears, it is recommended to reduce penalty_weights (e.g., penalty_weights=0.1) or to reduce length_penalty in the generate args. Both help avoid early truncation in beam search.

lalulxm commented 3 months ago

Thank you for your quick reply! I have tried modifying the penalty_weights and length_penalty parameters; the part of the code that calls the generate function is shown below:

out = model.generate(
    {"image": norm(image), "prompt": qu},
    use_nucleus_sampling=args.sample,
    num_beams=5,
    max_new_tokens=4096,
    output_attentions=True,
    length_penalty=0.1,        # reduced as suggested above
    opera_decoding=True,       # enable OPERA decoding
    scale_factor=50,           # OPERA decoding hyperparameters
    threshold=15,
    num_attn_candidates=5,
    penalty_weights=0.1,       # reduced as suggested above
)

but re-running inference still produces truncation in about 1/5 of the results.

shikiw commented 3 months ago

Maybe we should also reduce scale_factor, for example:

out = model.generate(
    {"image": norm(image), "prompt": qu},
    use_nucleus_sampling=args.sample,
    num_beams=5,
    max_new_tokens=4096,
    output_attentions=True,
    length_penalty=0.1,
    opera_decoding=True,
    scale_factor=20,           # reduced from 50
    threshold=15,
    num_attn_candidates=5,
    penalty_weights=0.1,
)

Hope it works well :)
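
(If truncation still persists after these changes, one option is a small sweep over the knobs discussed above, keeping the setting with the fewest truncated outputs. This is a minimal sketch, not from the thread: it assumes the same model and generate interface as above, a placeholder val_samples list of (image, prompt) pairs, and that generate returns a list of generated strings as in demo.ipynb.)

# Sketch only: grid over the hyperparameters discussed above.
best = None
for scale_factor in (50, 20, 10):
    for penalty_weights in (1.0, 0.5, 0.1):
        n_truncated = 0
        for image, qu in val_samples:  # placeholder validation pairs
            out = model.generate(
                {"image": norm(image), "prompt": qu},
                use_nucleus_sampling=False,
                num_beams=5,
                max_new_tokens=1024,
                output_attentions=True,
                length_penalty=0.1,
                opera_decoding=True,
                scale_factor=scale_factor,
                threshold=15,
                num_attn_candidates=5,
                penalty_weights=penalty_weights,
            )
            # out[0] is assumed to be the generated caption string
            if not out[0].rstrip().endswith((".", "!", "?")):
                n_truncated += 1
        if best is None or n_truncated < best[0]:
            best = (n_truncated, scale_factor, penalty_weights)

print("fewest truncations:", best)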