haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
20.2k stars 2.23k forks

[Usage] 7B model has an abnormal output for some images #1513

Open shiyuleixia opened 5 months ago

shiyuleixia commented 5 months ago

Describe the issue

Issue: When I use the 7B model for prediction, some images produce bad results, and the model is very slow to return them.

Command:

The prompt is: Analyze the image and list 10 environmental tags that describe visible elements such as natural scenery, architectural features, and lighting conditions. Tags should include elements like types of vegetation, specific buildings, or weather conditions. Deliver the tags in a single line, separated by commas. Avoid any references to people, crowds, personal attributes or animals.


Screenshots: (attached image 852f925c3d8a6a992fd8ac78bca5eb21)

it outputs like this: Stone building, stone wall, stone steps, stone window sills, stone window frames, stone window, stone door, stone archway, stone pillar, stone walkway, stone ground, stone sidewalk, stone street, stone building facade, stone building corner, stone building entrance, stone building door, stone building window, stone building window frame, stone building window sill, stone building window pane, stone building window panes, stone building window, stone building window frame, stone building window sill, stone building window pane, stone building window panes, stone building window, stone building window frame, stone building window sill, stone building window pane, stone building window panes, stone building window, stone building window frame, stone building window sill, stone building window pane, stone building window panes, stone building window, stone building window frame, stone building window sill, stone building window pane, stone building window panes, stone building window, stone building window frame, stone building window sill, stone building window pane, stone building window panes, stone building window, stone building window frame, stone building window sill, stone building window pane, stone building window panes, stone building window, stone building window frame, stone building window sill, stone building window pane, stone building window panes, stone building window, stone building window frame, stone building window sill, stone building window pane, stone building window panes, stone building window, stone building window frame, stone building window sill, stone building window pane, stone building window panes, stone building window, stone building window frame, stone building window sill, stone building window pane, stone building window panes, stone building window, stone building window frame, stone building window sill, stone building window pane, stone building window panes, stone building window, stone building window 
frame, stone building window sill, stone building window pane, stone building window panes, stone building window, stone building window frame, stone building window sill, stone building window pane, stone building window panes, stone building window, stone building window frame, stone building window sill, stone building window pane, stone building window panes, stone building window, stone building window frame, stone building window sill, stone building window pane, stone building window panes, stone building window, stone building window frame, stone building window sill, stone building

ProGamerGov commented 5 months ago

@shiyuleixia My research team and I discovered that this issue is caused by greedy search decoding (it happens in every other model as well). Luckily, it can be detected and resolved relatively easily, as you can see here: https://github.com/ProGamerGov/VLM-Captioning-Tools
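
For anyone who wants a quick check without pulling in extra tooling, here is a minimal sketch of how a repetition loop like the one above might be caught after generation. This is not the code from the linked repository. The helper names (`split_tags`, `looks_degenerate`, `dedupe_tags`) and the `max_repeats` threshold are made up for illustration, and the sketch assumes the output is a single comma-separated line, as the prompt requests.

```python
from collections import Counter


def split_tags(text):
    """Split a comma-separated tag line into stripped, non-empty tags."""
    return [t.strip() for t in text.split(",") if t.strip()]


def looks_degenerate(text, max_repeats=3):
    """Return True if any tag (compared case-insensitively) occurs more
    than max_repeats times. The threshold is an arbitrary assumption."""
    counts = Counter(t.lower() for t in split_tags(text))
    return any(n > max_repeats for n in counts.values())


def dedupe_tags(text, limit=10):
    """Keep the first `limit` distinct tags, preserving their order."""
    seen = set()
    kept = []
    for tag in split_tags(text):
        key = tag.lower()
        if key not in seen:
            seen.add(key)
            kept.append(tag)
        if len(kept) == limit:
            break
    return kept


# Hypothetical example: a short imitation of the looping output above.
looping = "Stone building, stone wall, " + (
    "stone building window, stone building window frame, " * 5
)
if looks_degenerate(looping):
    # Salvage the distinct tags, or re-run generation with other settings.
    print(", ".join(dedupe_tags(looping)))
```

A flagged output can either be cleaned up with `dedupe_tags` or regenerated. On the generation side, Hugging Face `generate()` accepts options such as `do_sample`, `repetition_penalty`, and `no_repeat_ngram_size`, which are the usual knobs to try against loops; whether they are enough for this checkpoint has not been tested here.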