ssantos97 opened 2 months ago
We recommend trying Llama2 Chat for your use case. If you run into any more issues or have other questions, feel free to reach out to us.
Another thing: what is the formula for the accuracy metric in your paper, and which paper is that metric from?
We use the evaluation metric proposed by https://github.com/mbzuai-oryx/Video-ChatGPT
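Roughly, that protocol asks an LLM judge (GPT-3.5 in the Video-ChatGPT paper) to compare each predicted answer with the ground truth and return a yes/no verdict plus a 0-5 score; accuracy is the fraction of "yes" verdicts and the reported score is the mean. A minimal sketch of the aggregation step (the `pred`/`score` field names follow my reading of their eval scripts and may differ slightly):

```python
# Minimal sketch of Video-ChatGPT-style accuracy/score aggregation.
# Assumes each item already holds the LLM judge's output: a yes/no "pred"
# field and an integer 0-5 "score" field (field names are assumptions).

def aggregate_results(results):
    """results: list of dicts like {"pred": "yes", "score": 4}."""
    yes_count = sum(1 for r in results if r["pred"].strip().lower() == "yes")
    accuracy = yes_count / len(results)                      # fraction judged correct
    avg_score = sum(r["score"] for r in results) / len(results)
    return accuracy, avg_score

# Example: 2 of 3 answers judged correct -> accuracy 0.667, average score 3.0
acc, score = aggregate_results(
    [{"pred": "yes", "score": 4}, {"pred": "no", "score": 1}, {"pred": "yes", "score": 4}]
)
print(f"Accuracy: {acc:.3f}, Avg score: {score:.2f}")
```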
Just one more thing: could you provide examples of the evaluation code on MovieChat-1K?
Please see https://github.com/rese1f/MovieChat/tree/main/eval_code for details
Thanks. Another thing: you instantiate the Llama 2 7B Chat model and then load the checkpoint from the fine-tuned Video-LLaMA 2 7B. But the latter includes weights for the visual encoder and Q-Former. How are they compatible? Additionally, what is the purpose of using Llama 2 7B Chat if we then load the fine-tuned version of Video-LLaMA?
The Q-Former weights of Video-LLaMA are compatible with both LLaMA and Llama 2.
But where are the Q-Former weights used in Llama 2 7B Chat, if that instantiated model does not contain a Q-Former in its architecture? Could you explain?
The Q-Former is part of Video-LLaMA, not of Llama 2 itself. You can refer to the code of MovieChat and Video-LLaMA for details.
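To give some intuition, here is a simplified sketch (not the actual MovieChat code): the visual encoder, Q-Former and projection live in the wrapper model, the LLM is loaded separately from the Hugging Face path given as llama_model, and the .pth file given as ckpt is applied on top with strict=False, so only the keys it contains are overwritten while the LLM keeps its own weights. The "model" key and the module names below are assumptions based on LAVIS/Video-LLaMA-style conventions.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

class VideoLLMWrapper(nn.Module):
    """Toy stand-in for the MovieChat/Video-LLaMA wrapper model."""
    def __init__(self, llama_model_path):
        super().__init__()
        # LLM initialised from the HF path given by llama_model in the config
        self.llama_model = AutoModelForCausalLM.from_pretrained(llama_model_path)
        # Placeholder modules standing in for the visual encoder / Q-Former / projection
        self.visual_encoder = nn.Linear(1408, 768)
        self.q_former = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
        self.llama_proj = nn.Linear(768, self.llama_model.config.hidden_size)

model = VideoLLMWrapper("ckpt/llama2/llama-2-7b-chat-hf")  # example path from this thread

# The Video-LLaMA checkpoint stores only the trained parts (Q-Former, projection, ...).
# strict=False fills in the matching keys and leaves the LLM weights untouched.
# The "model" key is an assumption following the LAVIS-style checkpoint layout.
checkpoint = torch.load("ckpt/VL_LLaMA_2_7B_Finetuned.pth", map_location="cpu")
msg = model.load_state_dict(checkpoint["model"], strict=False)
print("missing keys (kept from init):", len(msg.missing_keys))
```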
Which setup do you use for your experiments: llama_model: "ckpt/llama2/llama-2-7b-chat-hf" or llama_model: "ckpt/moviechat_llama7b", and ckpt: "ckpt/VL_LLaMA_2_7B_Finetuned.pth" or ckpt: "ckpt/finetune-vicuna7b-v2.pth"? I can't get results similar to yours with Llama 2 7B Chat and VL_LLaMA. If you use Vicuna, how do you get the original LLaMA weights? They are not available anymore.
And what is moviechat_llama7b?
We use llama_model: "ckpt/moviechat_llama7b" and ckpt: "ckpt/finetune-vicuna7b-v2.pth". moviechat_llama7b is the Vicuna model used in MovieChat.
OK. I merge the original LLaMA 1 weights with vicuna-7b-delta-v0 using the apply_delta function, which gives me a Vicuna/7B folder that I then use as llama_model: ckpt/Vicuna/7B. For ckpt I use finetune-vicuna7b-v2.pth, but I am still getting weird outputs. What am I doing wrong? Thank you.
Did you try the moviechat_llama7b we provide on Hugging Face? It is already the apply_delta version.
You mean this one: https://huggingface.co/Enxin/MovieChat-vicuna?
sure
I keep getting the same weird outputs. It's strange, because with Llama-2-7b-chat-hf and VL_LLaMA_2_7B_Finetuned.pth it works.
SOLVED - For future reference: llama_model should also be changed in MovieChat/configs/models/moviechat.yaml to the apply_delta model provided above.
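For anyone hitting the same problem, a quick sanity check along these lines can confirm that every config points at the apply_delta weights. This is only a rough sketch: the eval-config path and the key nesting below are guesses, so adjust them to your setup.

```python
import yaml  # pip install pyyaml

def show_model_paths(config_path):
    """Print the llama_model / ckpt entries of a MovieChat-style YAML config."""
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    model_cfg = cfg.get("model", cfg)  # key nesting is an assumption
    print(config_path)
    print("  llama_model:", model_cfg.get("llama_model"))
    print("  ckpt:       ", model_cfg.get("ckpt"))

# Both should point at the apply_delta Vicuna weights when using
# finetune-vicuna7b-v2.pth.
show_model_paths("MovieChat/configs/models/moviechat.yaml")
show_model_paths("eval_configs/moviechat.yaml")  # hypothetical eval-config path
```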
Could you provide the code for evaluating the consistency metric, including how you use the two different questions in the assessment prompt and which questions they are? It would be super helpful, as I want to do a fair comparison with your method.
Thanks
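In case it helps in the meantime, here is a hedged sketch of what a GPT-judged consistency check in the spirit of the Video-ChatGPT benchmark could look like: the model answers two differently phrased questions about the same clip, and an LLM judge scores whether the two answers agree with each other and with the ground truth. The prompt wording and the openai>=1.0 client usage below are illustrative assumptions; for a fair comparison, take the exact prompts from the Video-ChatGPT eval scripts.

```python
from openai import OpenAI  # openai>=1.0-style client; expects OPENAI_API_KEY in the env

client = OpenAI()

def judge_consistency(q1, q2, answer, pred1, pred2, judge_model="gpt-3.5-turbo"):
    """Ask an LLM judge for a 0-5 consistency score for two answers
    to two differently phrased questions about the same video."""
    prompt = (
        "You are evaluating the consistency of a video question-answering model.\n"
        f"Question 1: {q1}\n"
        f"Question 2: {q2}\n"
        f"Correct answer: {answer}\n"
        f"Predicted answer to question 1: {pred1}\n"
        f"Predicted answer to question 2: {pred2}\n"
        "Return only an integer consistency score from 0 to 5."
    )
    resp = client.chat.completions.create(
        model=judge_model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return int(resp.choices[0].message.content.strip())

# The reported consistency number would then be the mean score over all question pairs.
```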
Why do some of the outputs look like this:
I'm using Llama 2 with the VL_LLaMA_2 checkpoint and not Llama 2 Chat. Is that the reason?