marella / ctransformers

Python bindings for Transformer models implemented in C/C++ using the GGML library.
MIT License

May not even be a Transformers issue.. WizardLM-Uncensored-Falcon-40 #86

Open linuxmagic-mp opened 11 months ago

linuxmagic-mp commented 11 months ago

I could use some feedback on debugging with ctransformers. I have a strange case where things generally work, but occasionally I get no output. I'm using /models/WizardLM-Uncensored-Falcon-40b/ggml-model-falcon-40b-wizardlm-qt_k5.bin (GGML).

tokens = llm.tokenize('I want to give you a female name.  What is your favourite female names, give me your top five.  And a preference on what you preferred to be called.')

for token in llm.generate(tokens):
    print(llm.detokenize(token))

This always works.

print(llm('I want to give you a female name.  What is your favourite female names, give me your top five.  And a preference on what you preferred to be called.'))

Sometimes this produces NO output at all.

I'm scratching my head over how to debug this.
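One way to see what is happening per step is to trace the raw token IDs as they are generated. This is only a sketch: the `trace_generation` helper and the `eos_id` parameter are hypothetical, while `llm.tokenize`, `llm.generate`, and `llm.detokenize` are the ctransformers calls already used above. A trace that stops immediately would suggest the model emitted an end-of-sequence token right away.

```python
def trace_generation(token_ids, detokenize, eos_id=None):
    """Print each generated token id and its decoded text; return the count.
    A count of 0 means generation stopped before producing anything."""
    n = 0
    for tid in token_ids:
        print(f"token {tid!r} -> {detokenize(tid)!r}")
        n += 1
        # optionally stop at a known end-of-sequence id (hypothetical parameter)
        if eos_id is not None and tid == eos_id:
            break
    return n

# Hypothetical usage with the model from this thread:
# tokens = llm.tokenize(prompt)
# count = trace_generation(llm.generate(tokens), llm.detokenize)
# if count == 0:
#     print("no tokens were generated at all")
```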

marella commented 11 months ago

llm(...) doesn't return until the entire text is generated whereas llm.generate(...) sends tokens one-by-one as they get generated. Is it exiting without error and without printing anything? Try using stream=True:

for text in llm(prompt, stream=True):
    print(text)
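Building on that suggestion, a small wrapper can both stream partial output as it arrives and make an empty generation easy to detect. The `consume_stream` helper below is an illustrative assumption, not part of ctransformers; the commented usage line assumes `llm(prompt, stream=True)` as shown above.

```python
def consume_stream(chunks):
    """Print streamed text pieces as they arrive and return the full text,
    so an empty generation is easy to detect afterwards."""
    pieces = []
    for text in chunks:
        print(text, end="", flush=True)  # flush so partial output shows immediately
        pieces.append(text)
    print()
    return "".join(pieces)

# Hypothetical usage with a loaded ctransformers model:
# result = consume_stream(llm(prompt, stream=True))
# if not result:
#     print("model returned no text for this prompt")
```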