turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

Add truncation warning #169

Open vadi2 opened 1 year ago

vadi2 commented 1 year ago

Add a truncation warning, since otherwise it can be rough to discover truncation through trial and error or by noticing the context numbers.

turboderp commented 1 year ago

I don't think this is the right approach. Truncation is now a regular part of how the context window is adjusted, so it would spam hundreds of lines in the console. I've been meaning to add a feature to the UI that will visually indicate how much of the history is being used as context, but I haven't gotten around to it, and the front-end code probably needs to be cleaned up a bunch first.

vadi2 commented 1 year ago

That approach would work just as well. Some kind of return notification from the API would be nice too!
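A return notification like the one suggested above could be sketched roughly as follows. This is a hypothetical illustration, not exllama's actual API: the function name, parameters, and response fields are all assumptions, showing only the general idea of trimming the oldest tokens and reporting truncation and context usage back to the caller.

```python
# Hypothetical sketch (not exllama's real API): a generation endpoint
# could report context usage and truncation alongside its result.

def build_context(history_tokens, max_seq_len):
    """Trim the oldest tokens to fit the context window and report usage.

    `history_tokens` and `max_seq_len` are illustrative names, not
    identifiers from the exllama codebase.
    """
    truncated = len(history_tokens) > max_seq_len
    # Keep the most recent tokens when the history overflows the window.
    context = history_tokens[-max_seq_len:] if truncated else history_tokens
    return {
        "context": context,
        "truncated": truncated,           # caller-visible truncation flag
        "tokens_used": len(context),
        "context_usage": len(context) / max_seq_len,  # fraction for a UI bar
    }

info = build_context(list(range(3000)), max_seq_len=2048)
print(info["truncated"], info["tokens_used"])  # True 2048
```

The `context_usage` fraction could also drive the UI indicator mentioned earlier, so both the front end and API consumers get the same signal without console spam.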