turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

Custom multiple stop token (for roleplay / conversation) #248

Closed: wangerzi closed this 10 months ago

wangerzi commented 10 months ago

To support role-playing use cases, this PR implements checks for multiple stop tokens in example_flask.py.

Without a stop word: [screenshot]

With "###end" as a stop word: [screenshot]
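The sketch below shows the basic idea of checking generated text against several stop strings. It is not the code from this PR. The function name `truncate_at_stop` is hypothetical, and the sketch assumes the full reply has already been decoded to a string. It cuts the reply at the earliest match of any stop string.

```python
def truncate_at_stop(text: str, stop_strings: list[str]) -> tuple[str, bool]:
    """Cut `text` at the earliest occurrence of any stop string.

    Returns the (possibly truncated) text and whether a stop string was found.
    """
    earliest = -1
    for stop in stop_strings:
        if not stop:
            continue  # an empty stop string would match at index 0
        idx = text.find(stop)
        if idx != -1 and (earliest == -1 or idx < earliest):
            earliest = idx
    if earliest == -1:
        return text, False
    return text[:earliest], True


reply = "Sure, I can help.###end\nUser: next turn"
print(truncate_at_stop(reply, ["###end", "\nUser:"]))
# ('Sure, I can help.', True)
```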

turboderp commented 10 months ago

I don't really want to keep overloading that one function with more and more features like this. E.g. this stop parameter would stop all sequences in a batch, and it's not guaranteed to catch all possible encodings of the given stop strings either.
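A toy batch can illustrate both limitations turboderp mentions. This sketch makes several assumptions. Each step yields one already-decoded text piece per sequence. The pieces below are made up and were not produced by a real tokenizer. `SequenceState` and `step_batch` are hypothetical names and are not part of exllama.

The sketch works as follows:
- Each sequence tracks its own stop state.
- Matching runs on the accumulated text rather than on token IDs.
- As a result, "###end" is caught whether it arrives as "###" + "end" or as "#" + "##" + "en" + "d".

```python
from dataclasses import dataclass


@dataclass
class SequenceState:
    text: str = ""      # decoded text generated so far for this sequence
    done: bool = False  # set once this sequence hits a stop string


def step_batch(states, new_pieces, stop_strings):
    """Append one decoded piece to every unfinished sequence and check stops per sequence."""
    for state, piece in zip(states, new_pieces):
        if state.done:
            continue  # finished sequences ignore further output
        state.text += piece
        # Match on decoded text, not token IDs, so any tokenization of a stop string is caught.
        hits = [i for i in (state.text.find(s) for s in stop_strings if s) if i != -1]
        if hits:
            state.text = state.text[: min(hits)]
            state.done = True


stops = ["###end"]
batch = [SequenceState(), SequenceState(), SequenceState()]
# Toy per-step pieces for three sequences; "###end" arrives split differently in 0 and 1.
steps = [
    ["Hi", "Ok", "Hmm"],
    ["###", "#", " and"],
    ["end", "##", " so"],
    ["!", "en", " on"],
    ["?", "d", "."],
]
for pieces in steps:
    step_batch(batch, pieces, stops)
print([(s.text, s.done) for s in batch])
# [('Hi', True), ('Ok', True), ('Hmm and so on.', False)]
```

A single stop flag shared by the whole batch would instead have cut off sequence 2 as soon as sequence 0 finished.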

But I will probably be adding something like this later today, supporting multiple stop tokens and text strings as stop conditions, as well as streaming and context reuse. I think it will be a separate generator so as not to break too much.
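A streaming generator that supports both stop tokens and stop strings must not send text that may turn out to be the start of a stop string. One common approach is to hold back any suffix of the output that matches a prefix of a stop string. The sketch below assumes each token's decoded piece is available as soon as it is sampled. `StopStringStreamer` and the token IDs in the example are hypothetical, and this is not the generator turboderp added.

```python
class StopStringStreamer:
    """Stream decoded text while watching for stop strings and stop token IDs.

    Text that might be the start of a stop string is held back until it is
    either confirmed (generation stops) or ruled out (the text is released).
    """

    def __init__(self, stop_strings, stop_token_ids=()):
        self.stop_strings = [s for s in stop_strings if s]  # ignore empty strings
        self.stop_token_ids = set(stop_token_ids)
        self.pending = ""      # text not yet released to the client
        self.stopped = False

    def feed(self, token_id, piece):
        """Consume one sampled token and its decoded text; return text safe to emit."""
        if self.stopped:
            return ""
        if token_id in self.stop_token_ids:
            # A stop token ends generation; held-back text was not a full stop
            # string, so release it.
            self.stopped = True
            out, self.pending = self.pending, ""
            return out
        self.pending += piece
        hits = [i for i in (self.pending.find(s) for s in self.stop_strings) if i != -1]
        if hits:
            # Emit everything before the earliest stop string and stop.
            self.stopped = True
            out, self.pending = self.pending[: min(hits)], ""
            return out
        # Hold back the longest suffix of `pending` that is a proper prefix of a stop string.
        hold = 0
        for s in self.stop_strings:
            for k in range(min(len(s) - 1, len(self.pending)), 0, -1):
                if self.pending.endswith(s[:k]):
                    hold = max(hold, k)
                    break
        cut = len(self.pending) - hold
        out, self.pending = self.pending[:cut], self.pending[cut:]
        return out

    def flush(self):
        """Release any held-back text when generation ends without a stop condition."""
        out, self.pending = self.pending, ""
        return out


streamer = StopStringStreamer(["###end"], stop_token_ids=[2])
tokens = [(10, "Hello"), (11, " #"), (12, "#"), (13, " world"),
          (14, "##"), (15, "#en"), (16, "d more")]
chunks = [streamer.feed(tid, piece) for tid, piece in tokens]
print(repr("".join(chunks)), streamer.stopped)  # 'Hello ## world' True
```

The streamer holds back only the longest partial match, which keeps latency low. Most pieces are released immediately, and text is delayed only while it could still complete a stop string.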

Kerushii commented 10 months ago

I have updated example_ws to support custom stop tokens, one-shot as well as streaming generation, and even token estimation. Please use the ws example if you can: the Flask API offers only a limited advantage when resources are tight or when a request must be fulfilled within its context. Most modern languages support WebSockets. Additionally, the ws example should be faster than the Flask API thanks to cache reuse.
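The cache reuse mentioned above rests on a simple observation. In a multi-turn conversation, each new prompt usually starts with the tokens already in the cache, so only the new suffix needs a forward pass. The helpers below (`reusable_prefix_len` and `plan_forward`) are hypothetical and work on plain lists of token IDs. They do not use exllama's actual cache API, and the ws example may implement reuse differently.

```python
def reusable_prefix_len(cached_ids, new_ids):
    """Number of leading token IDs shared by the cached context and a new prompt."""
    n = 0
    for a, b in zip(cached_ids, new_ids):
        if a != b:
            break
        n += 1
    return n


def plan_forward(cached_ids, new_ids):
    """Return how many cache positions to keep and which tokens still need a forward pass."""
    keep = reusable_prefix_len(cached_ids, new_ids)
    # Assumption: if the whole prompt is already cached, re-run the last token so
    # the model produces fresh logits for the next sampling step.
    if keep == len(new_ids) and keep > 0:
        keep -= 1
    return keep, new_ids[keep:]


cached = [1, 15, 27, 99, 42]
print(plan_forward(cached, [1, 15, 27, 99, 42, 7, 8]))  # (5, [7, 8]): append-only turn
print(plan_forward(cached, [1, 15, 30, 31]))            # (2, [30, 31]): edited history
```

Re-running the last token when the prompt is fully cached is a design assumption. It covers implementations that do not keep the final logits around between requests.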

vadi2 commented 10 months ago

My 2c is that I'd like to see both HTTP and WebSockets supported as first-class citizens, as HTTP arguably has a lower barrier to getting started.

wangerzi commented 10 months ago

> My 2c is that I'd like to see both HTTP and WebSockets supported as first-class citizens, as HTTP arguably has a lower barrier to getting started.

@vadi2 I agree with you; HTTP is more useful in the pre-testing phase. 👍

As the author said, my code currently has a lot of problems, so I will close this PR and look forward to a better stop-condition implementation from turboderp. 🆙🆙🆙

turboderp commented 10 months ago

Have a look at the latest commit, let me know if it's useful.

vadi2 commented 10 months ago

From a quick scan, it looks really good! Thank you.