Closed: Rishita32 closed this issue 7 months ago.
Hi, thanks for raising an issue!
This is a question best placed in our forums. We try to reserve the GitHub issues for feature requests and bug reports.
General comments:
`add_eos_token` instructs the tokenizer to append an EOS token at the end of a sequence of tokens, but it does not control the length of generated text; for that, pass `max_new_tokens` to `generate`. You can read the generate docs here and here.
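For illustration, a minimal sketch of capping output length via `max_new_tokens` (the model name matches this issue, but the prompt and the 50-token cap are arbitrary assumptions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Summarize the plot of Hamlet:", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=50,                    # hard cap on the number of generated tokens
    eos_token_id=tokenizer.eos_token_id,  # also stop early if the model emits EOS
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```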
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Hi, I am fine-tuning the Mistral-7B model. I am getting long runs of automated text from the fine-tuned model, even though I have set `add_eos_token = True` on the tokenizer. Can someone please tell me how to add a word limit to the responses?
This is the code for initializing the model and tokenizer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model = "mistralai/Mistral-7B-v0.1"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_4bit=True,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.padding_side = "right"
tokenizer.pad_token = tokenizer.unk_token
tokenizer.add_eos_token = True
tokenizer.max_length = 200
tokenizer.truncation = True
```
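(Editor's side note: `max_length` and `truncation` generally take effect when passed to the tokenizer call, not when assigned as bare attributes as in the last two lines above. A minimal sketch, with made-up input text:)

```python
# Truncation is applied per call; assigning tokenizer.max_length /
# tokenizer.truncation as attributes does not change the call defaults.
# The input text here is made up for illustration.
encoded = tokenizer(
    "An example training sample ...",
    max_length=200,
    truncation=True,
    return_tensors="pt",
)
```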
Who can help?
No response
Information

Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Run the model and tokenizer initialization code from the System Info section above.
Expected behavior
I am looking for a way to prevent overly long text generation.
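Putting the maintainer's suggestion together with the setup above, a hedged sketch of inference with the fine-tuned model (the prompt is hypothetical, and the 200-token cap mirrors the tokenizer setting above but is otherwise an assumption):

```python
# Continues from the model/tokenizer setup in the System Info section.
model.config.use_cache = True  # re-enable the cache for inference, per the training-code comment

prompt = "Describe the Eiffel Tower in two sentences."  # hypothetical prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,                   # hard cap on the response length
    eos_token_id=tokenizer.eos_token_id,  # stop as soon as the model emits EOS
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```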