nicknochnack / Llama2RAG

A working example of RAG using LLama 2 70b and Llama Index

model_kwargs are not used by the model 'token_type_ids' #6

Open johanteekens opened 9 months ago

johanteekens commented 9 months ago

Thanks for the nice demo; it took me a bit to get it running. I suggest adding:

`tokenizer_outputs_to_remove=["token_type_ids"]` to `app.py`:

```python
llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    model=model,
    tokenizer=tokenizer,
    tokenizer_kwargs={"max_length": 4096},
    tokenizer_outputs_to_remove=["token_type_ids"],
)
```

and `max_length=xxxx` to the Jupyter notebook:

```python
output = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    streamer=streamer,
    use_cache=True,
    max_length=225,
)
```

Adding these still allows the demo to run as before.
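For context, here is a minimal sketch (no `transformers` required, names are illustrative) of what `tokenizer_outputs_to_remove` does under the hood: the tokenizer can emit keys such as `token_type_ids` that Llama 2's `forward()` does not accept, so they must be filtered out of the encoded inputs before generation.

```python
def strip_unused_outputs(encoded: dict, to_remove=("token_type_ids",)) -> dict:
    """Return tokenizer outputs minus keys the model does not accept.

    Mirrors the effect of tokenizer_outputs_to_remove=["token_type_ids"]
    in HuggingFaceLLM (illustrative sketch, not the library's internals).
    """
    return {k: v for k, v in encoded.items() if k not in to_remove}


# Simulated tokenizer output (tensor shapes elided for brevity).
encoded = {
    "input_ids": [[1, 2, 3]],
    "attention_mask": [[1, 1, 1]],
    "token_type_ids": [[0, 0, 0]],  # Llama 2 does not use this key
}

cleaned = strip_unused_outputs(encoded)
print(sorted(cleaned))  # 'token_type_ids' is gone; remaining keys are safe
```

Without this filtering, passing the raw `encoded` dict to the model's generate call raises the "model_kwargs are not used by the model" error from the issue title.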