haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

Clarification on Max Output Tokens in LLaVA-1.6 Models #1095

Closed ConMan05 closed 5 months ago

ConMan05 commented 5 months ago

Question

I am currently working on a project that involves extracting information from technical diagrams. During my exploration of the codebase, I encountered references to parameters related to the maximum output tokens. I seek clarification on the specific roles of these parameters and their impact on the model's behavior.

Codebase References:

Questions:

  1. What is the distinction between model_max_length and max_new_tokens?
  2. Which of these parameters controls the maximum output tokens during inference?

Additional Query: I am interested in increasing the default maximum output tokens (above 8k) to better suit the requirements of my project. Could you provide guidance on how to achieve this, if feasible?

haotian-liu commented 5 months ago

model_max_length determines the maximum number of tokens the model can process, and this includes the system message, the instruction, and any response it generates.

max_new_tokens determines how many tokens it generates. The model will stop either when it has generated an "End-of-Sequence" token or when it reaches that length.
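For reference, a minimal sketch of how the two parameters play out in a Hugging Face-style generate() call (the checkpoint here is the plain-text Mistral model discussed below, used purely for illustration; LLaVA's own inference scripts forward max_new_tokens to generate() in the same way):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint only, not a LLaVA model.
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "[INST] Summarize the key components of a wiring diagram. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# model_max_length bounds the total sequence (prompt + response);
# max_new_tokens bounds only the newly generated continuation.
output = model.generate(**inputs, max_new_tokens=512)  # stops at EOS or 512 new tokens
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```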

Increasing the context length -- there is related research in NLP working on that. Also, you can switch to a base LLM that supports a longer context length :)

ConMan05 commented 5 months ago

Understood, so max_new_tokens is equivalent to max_output_tokens.

1) Could you provide the specific value of max_new_tokens for liuhaotian/llava-v1.6-34b and liuhaotian/llava-v1.6-mistral-7b?

2) I have tested mistral-7b-instruct-v0.2 on the Replicate website, and it successfully generated up to 8000 tokens during testing. Does this imply that llava-v1.6-mistral-7b can also produce up to 8000 tokens?

3) Moreover, when setting max_new_tokens to 8000, will this have any notable impact on the output or performance of the model?

Thanks for your reply @haotian-liu

haotian-liu commented 5 months ago

Mistral Instruct was trained to gracefully handle 32K tokens, but we haven't trained multimodal reasoning at that length -- that requires the model to "generalize" when generating any response above 4K.

Conceptually, as long as the total token count stays within 4K, it will be fine, so exist_tokens + max_new_tokens < 4K is the golden rule.
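To make that rule concrete, a minimal sketch of the budget check (safe_max_new_tokens is a hypothetical helper, not part of the LLaVA codebase):

```python
from transformers import AutoTokenizer

CONTEXT_LIMIT = 4096  # effective multimodal budget per the comment above

tokenizer = AutoTokenizer.from_pretrained("liuhaotian/llava-v1.6-mistral-7b")

def safe_max_new_tokens(prompt: str, requested: int = 8000) -> int:
    """Clamp the request so exist_tokens + max_new_tokens stays under 4K."""
    exist_tokens = len(tokenizer(prompt)["input_ids"])
    # This counts text tokens only: the image patch tokens LLaVA prepends
    # also consume context, so keep extra headroom in practice.
    return max(0, min(requested, CONTEXT_LIMIT - exist_tokens))
```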

samueleruffino99 commented 3 months ago

Hello, I am using liuhaotian/llava-v1.5-7b (4-bit version) and I am getting this warning:

Token indices sequence length is longer than the specified maximum sequence length for this model (4789 > 2048). Running this sequence through the model will result in indexing errors

I have computed some info from a perception model and I am trying to embed this information into text (I tried appending it both to the system message and directly to the USER message). Anyway, both are really slow. Do you have any suggestions on how to deal with this problem (maybe a best practice for embedding this kind of info into the MLLM)? I have 8GB of memory, so I am quite constrained.
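For reference, one common way to avoid the overflow warning above is to trim the injected text so the whole request fits the context window; a rough sketch (truncate_to_budget is a hypothetical helper, and the 576 image-token reserve assumes LLaVA-1.5's single 336x336 image):

```python
def truncate_to_budget(tokenizer, perception_text: str, base_prompt_tokens: int,
                       max_new_tokens: int = 512, image_tokens: int = 576) -> str:
    """Trim injected text so prompt + image tokens + generation fit the context."""
    budget = (tokenizer.model_max_length - base_prompt_tokens
              - image_tokens - max_new_tokens)
    ids = tokenizer(perception_text, add_special_tokens=False)["input_ids"]
    return tokenizer.decode(ids[:max(0, budget)])
```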

Moreover, which conv-mode should I choose to properly use v1.6-mistral-7b locally? Thank you so much!!