mlvlab / Flipped-VQA

Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
https://ikodoh.github.io/flipped_vqa_demo.html
MIT License

About the tokenizer!!! #20


Yuzuriha-Inori-x commented 5 days ago

Hi, I want to ask: what are the values of self.v_token_id = 15167, self.q_token_id = 16492, self.a_token_id = 22550, and self.nl_id = 13 in the tokenizer based on? In other words, why is v_token_id set to 15167?

ikodoh commented 3 days ago

Hi,

Thank you for your interest in our work. v_token_id, q_token_id, a_token_id, and nl_id are the token IDs of the words 'Video', 'Question', 'Answer', and the newline character '\n', respectively. I use these IDs to identify the positions at which generation should start. They are based on the LLaMA tokenizer, so if you want to use a different LLM, you have to change these token IDs according to that model's tokenizer.
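
For reference, here is a minimal sketch (assumed usage, not code from this repository) of how you can look up such IDs with the LLaMA SentencePiece tokenizer. The path "tokenizer.model" is a placeholder for wherever your LLaMA checkpoint's tokenizer file lives:

```python
# Minimal sketch: inspect which token IDs the LLaMA SentencePiece tokenizer
# assigns to the prompt keywords 'Video', 'Question', 'Answer', and '\n'.
# "tokenizer.model" is a placeholder path, not part of this repository.
from sentencepiece import SentencePieceProcessor

sp = SentencePieceProcessor(model_file="tokenizer.model")

prompt = "Video:\nQuestion: What is shown?\nAnswer:"
ids = sp.encode(prompt)

# Print each token ID next to its decoded piece so you can see which IDs
# mark the start of the 'Video', 'Question', and 'Answer' segments and
# which ID corresponds to the newline.
for i in ids:
    print(i, repr(sp.id_to_piece(i)))
```

Running the same kind of check with another LLM's tokenizer will tell you which IDs to substitute for v_token_id, q_token_id, a_token_id, and nl_id.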