Open jiangerxiaozhao2 opened 1 week ago
Yeah, the problem is that the model always predicts "Throughout the entire video." at the first turn, which terminates the recursive grounding process.
Why is there a CUDA out of memory error traceback in your log on lines 102-119? Maybe the model was not loaded correctly?
I ran the code again and there was no "out of memory" error, but the result was the same. charades_sta-recursive_grounding-4_turns.log
After debugging the program, I found that the prediction for each video is “llm_message = '('”.
Here is my recursive grounding log: charades_sta-recursive_grounding-4_turns.log
I compared my recursive grounding log (which works as expected) with yours. The main difference is the bos_token and eos_token used: for the HawkEye model we use 0 as bos_token_id and 1 as eos_token_id. However, in your recursive grounding log you used 1 as bos_token_id and 2 as eos_token_id. This may cause the model to generate nothing, as it wrongly identifies the bos_token as the eos_token. You can try resetting the token IDs in LlamaConfig manually by adding the following code after this line and see if it works:
https://github.com/yellow-binary-tree/HawkEye/blob/635b1c50e80d4dec16cb2bd0e89ed08971b7c7e0/models/hawkeye_it.py#L112
llama_config.bos_token_id = 0
llama_config.eos_token_id = 1
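To confirm that the override actually took effect before running the full evaluation, a small sanity check along these lines may help. This is only a sketch: the helper name `special_token_mismatches` is hypothetical and not part of the HawkEye code, and a `SimpleNamespace` stands in for the real `llama_config` object. The expected IDs (0 and 1) are the ones stated above.

```python
from types import SimpleNamespace

# IDs stated in this thread for HawkEye; the helper itself is a hypothetical illustration.
EXPECTED_BOS_ID = 0
EXPECTED_EOS_ID = 1


def special_token_mismatches(config, expected_bos=EXPECTED_BOS_ID, expected_eos=EXPECTED_EOS_ID):
    """Return human-readable descriptions of special token IDs that differ from the expected ones."""
    problems = []
    if config.bos_token_id != expected_bos:
        problems.append(f"bos_token_id is {config.bos_token_id}, expected {expected_bos}")
    if config.eos_token_id != expected_eos:
        problems.append(f"eos_token_id is {config.eos_token_id}, expected {expected_eos}")
    return problems


# Stand-ins for llama_config: the values seen in the failing log, and the values after the fix.
bad_config = SimpleNamespace(bos_token_id=1, eos_token_id=2)
fixed_config = SimpleNamespace(bos_token_id=0, eos_token_id=1)

for name, cfg in [("failing log", bad_config), ("after fix", fixed_config)]:
    issues = special_token_mismatches(cfg)
    print(name, "->", issues if issues else "special token IDs match")
```

In the real code the same function could be called on `llama_config` right after the two assignments, since it only reads the `bos_token_id` and `eos_token_id` attributes.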
As for how this problem arises: all special tokens of HawkEye were inherited from vicuna-v0, which in turn inherited them from LLaMA (v1). I vaguely recall that when LLaMA was first released in early 2023, its special token IDs were inconsistent across the released versions, which caused some confusion at the time.
Please let me know if this solves your problem so I can fix this bug.
Great work! I got strange results. Could you help me?
../outputs/charades_sta-recursive_grounding-4_turns.jsonl
num examples: 3720
turns: 1 mean iou: 0.2698 iou@0.3/0.5/0.7: 0.3430/0.0022/0.0000
turns: 2 mean iou: 0.2698 iou@0.3/0.5/0.7: 0.3430/0.0022/0.0000
turns: 3 mean iou: 0.2698 iou@0.3/0.5/0.7: 0.3430/0.0022/0.0000
turns: 4 mean iou: 0.2698 iou@0.3/0.5/0.7: 0.3430/0.0022/0.0000
The log file is: charades_sta-recursive_grounding-4_turns.log
The only change I made to the current code is at https://github.com/yellow-binary-tree/HawkEye/blob/635b1c50e80d4dec16cb2bd0e89ed08971b7c7e0/models/blip2/blip2.py#L28, where I switched local_files_only=True to False so that the necessary files can be downloaded.
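For readers unfamiliar with the reported metrics, here is a minimal sketch of how mean IoU and iou@0.3/0.5/0.7 are commonly computed for temporal grounding. It is not taken from the HawkEye evaluation script: the function names are hypothetical, and counting a prediction as a hit when its IoU is greater than or equal to the threshold is an assumption. It also illustrates why a prediction spanning the whole video yields an IoU equal to the ground-truth length divided by the video length, which would stay the same across turns if the first turn never narrows the span.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) time spans in seconds."""
    (pred_start, pred_end), (gt_start, gt_end) = pred, gt
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


def summarize(preds, gts, thresholds=(0.3, 0.5, 0.7)):
    """Return (mean IoU, {threshold: fraction of examples with IoU >= threshold})."""
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    mean_iou = sum(ious) / len(ious)
    hit_rate = {t: sum(iou >= t for iou in ious) / len(ious) for t in thresholds}
    return mean_iou, hit_rate


# Two made-up 30-second videos where the prediction is always the entire video.
whole_video_preds = [(0.0, 30.0), (0.0, 30.0)]
ground_truths = [(5.0, 12.0), (0.0, 15.0)]

mean_iou, hit_rate = summarize(whole_video_preds, ground_truths)
print(f"mean iou: {mean_iou:.4f}", hit_rate)
```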