nicknochnack / Llama2RAG

A working example of RAG using LLama 2 70b and Llama Index

model_kwargs are not used by the model 'token_type_ids' #6

Open johanteekens opened 9 months ago

johanteekens commented 9 months ago

Thanks for the nice demo; it took me a bit to get it running. I suggest adding:

`tokenizer_outputs_to_remove=["token_type_ids"]` to `app.py`:

```python
llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    model=model,
    tokenizer=tokenizer,
    tokenizer_kwargs={"max_length": 4096},
    tokenizer_outputs_to_remove=["token_type_ids"],
)
```

and `max_length=xxxx` to the Jupyter notebook:

```python
output = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    streamer=streamer,
    use_cache=True,
    max_length=225,
)
```

Adding these still allows the demo to run as before.
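For context, here is a minimal sketch (no `transformers` required, names are illustrative) of what `tokenizer_outputs_to_remove` does under the hood: the tokenizer can emit keys such as `token_type_ids` that Llama 2's `forward()` does not accept, so they must be filtered out of the encoded inputs before generation.

```python
def strip_unused_outputs(encoded: dict, to_remove=("token_type_ids",)) -> dict:
    """Return tokenizer outputs minus keys the model does not accept.

    Mirrors the effect of tokenizer_outputs_to_remove=["token_type_ids"]
    in HuggingFaceLLM (illustrative sketch, not the library's internals).
    """
    return {k: v for k, v in encoded.items() if k not in to_remove}


# Simulated tokenizer output (tensor shapes elided for brevity).
encoded = {
    "input_ids": [[1, 2, 3]],
    "attention_mask": [[1, 1, 1]],
    "token_type_ids": [[0, 0, 0]],  # Llama 2 does not use this key
}

cleaned = strip_unused_outputs(encoded)
print(sorted(cleaned))  # 'token_type_ids' is gone; remaining keys are safe
```

Without this filtering, passing the raw `encoded` dict to the model's generate call raises the "model_kwargs are not used by the model" error from the issue title.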