marella / ctransformers

Python bindings for Transformer models implemented in C/C++ using the GGML library.
MIT License

Add support for "stop" in config #3

Closed · hippalectryon-0 closed this 1 year ago

hippalectryon-0 commented 1 year ago

(In the README, at least) the config passed to CTransformers doesn't accept stop strings, which is a common feature.

bluecoconut commented 1 year ago

+1 I'd like this as well

Right now I made my own eval method (taking inspiration from the `__call__` method): https://github.com/marella/ctransformers/blob/d05a4d0702c72c028870e4fe5d4f37bf73d7b243/ctransformers/llm.py#L263

Something like this is what I'm doing:

tokens = self.tokenize(prompt)
stop = genkwargs.pop("stop", None) or []
if isinstance(stop, str):
    stop = [stop]
# pre-tokenize each stop string so the check below can compare token ids directly
end_ids = [self.model.tokenize(x) for x in stop]

def should_stop(response_tokens):
    # stop once the tail of the response matches any stop sequence;
    # the length-aware list comparison avoids the false positives that
    # zip() truncation would give on partial matches
    for end in end_ids:
        if end and response_tokens[-len(end):] == end:
            return True
    # also enforce the token budget
    return len(response_tokens) >= max_new_tokens

response = []
for token in self.generate(tokens, **genkwargs):
    response.append(token)
    if should_stop(response):
        break
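For reference, here is the same idea as a self-contained helper. The helper name, model repo, and `max_new_tokens` default are placeholders (not part of ctransformers), and it assumes the public tokenize/generate/detokenize methods on the ctransformers LLM object:

from ctransformers import AutoModelForCausalLM

def generate_with_stop(llm, prompt, stop=None, max_new_tokens=256):
    # placeholder helper, not part of ctransformers: generate until a stop
    # string's token sequence appears or the token budget is exhausted
    stop = [stop] if isinstance(stop, str) else (stop or [])
    end_ids = [llm.tokenize(s) for s in stop]
    response = []
    for token in llm.generate(llm.tokenize(prompt)):
        response.append(token)
        if any(end and response[-len(end):] == end for end in end_ids):
            break
        if len(response) >= max_new_tokens:
            break
    return llm.detokenize(response)

llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')  # example model
print(generate_with_stop(llm, 'AI is going to', stop=['\n']))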

I agree with @hippalectryon-0, it'd be nice if it were built in.

marella commented 1 year ago

Thanks for the suggestion. I will add it in the next release.

marella commented 1 year ago

Added the stop option in the latest release 0.1.1.

In the core library, you can use:

llm = AutoModelForCausalLM.from_pretrained(...)

llm(prompt, stop=['foo', 'bar'])
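Put together, a minimal runnable sketch (the model repo is just the example from the README):

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')  # example model
# generation stops as soon as 'foo' or 'bar' appears in the output
print(llm('AI is going to', stop=['foo', 'bar']))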

In LangChain, you can use:

config = {'stop': ['foo', 'bar']}

llm = CTransformers(..., config=config)
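Similarly, a minimal LangChain sketch (same example model; assumes the langchain.llms.CTransformers wrapper):

from langchain.llms import CTransformers

config = {'stop': ['foo', 'bar']}
llm = CTransformers(model='marella/gpt-2-ggml', config=config)  # example model
print(llm('AI is going to'))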

@hippalectryon-0 in the issue referenced above: if by streaming you mean the callback API, then it is already supported:

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = CTransformers(..., callbacks=[StreamingStdOutCallbackHandler()])
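(For reference: StreamingStdOutCallbackHandler prints each new token to stdout as it is generated, so the response streams live during the call.)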

Please feel free to open another issue if you are looking for a different kind of streaming API.