intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Apache License 2.0

question about configuration #1643

Open menglin0320 opened 4 days ago

menglin0320 commented 4 days ago

In the examples you guys didn't mention how to specify parameters like batch size, max input length, etc. My first question is how to change the max input length. I tried the llama2 example for a RAG use case. llama2 should be able to handle 4096 input tokens, but it's limited to 1024 for some reason. Similarly, though I don't think batching is a good idea on CPU, I still want to try batched inference with this package. Is there documentation on how to configure these things?

menglin0320 commented 4 days ago

After trying Mistral out, yeah, you guys limit the context length to 1024 for every model.

a32543254 commented 13 hours ago

Could you tell us which example you are using?