marella / ctransformers

Python bindings for Transformer models implemented in C/C++ using the GGML library.
MIT License

Config #32

Closed: vmajor closed this issue 1 year ago

vmajor commented 1 year ago

How do I specify the model() parameters listed here: https://github.com/marella/ctransformers#config?

Placing them inside model() does not raise an error, but they are ignored for my MPT model, i.e. I can enter whatever I want and the output does not change at all.

Am I supposed to pass them under kwargs as a list? Is there an example somewhere?

EDIT: I changed generate() to model() above. I had generate() on my mind from another question, but it is the model parameters that I am trying to set.
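For anyone landing here later: based on the config section of the README, these parameters can apparently be set either at load time via from_pretrained() or per call. A minimal sketch (the model path is a placeholder, and I am assuming the per-call keyword names mirror the names in the config table):

import ctransformers

# Placeholder path; replace with a local GGML model file.
model_path = '/path/to/mpt-30b-instruct.ggmlv0.q8_0.bin'

# Option 1: set config values once, at load time.
llm = ctransformers.AutoModelForCausalLM.from_pretrained(
    model_path,
    model_type='mpt',
    temperature=0.7,
    top_p=0.1,
    max_new_tokens=300,
)

# Option 2: pass sampling parameters per call
# (assumed to override the load-time values for that call).
print(llm("Hello", temperature=1.2, top_p=0.9, max_new_tokens=50))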

vmajor commented 1 year ago

Code:

import ctransformers
from transformers import AutoTokenizer  # imported but not used in this snippet
name = '/home/*****/models/mpt-30B-instruct-GGML/mpt-30b-instruct.ggmlv0.q8_0.bin'
#config = ctransformers.hub.AutoConfig(name)

model = ctransformers.AutoModelForCausalLM.from_pretrained(
  name,
  model_type='mpt',
  top_k=40,
  top_p=0.1,
  temperature=0.7,
  repetition_penalty=1.18,
  last_n_tokens=64,
  seed=123,
  batch_size=64,
  context_length=8192,
  max_new_tokens=300
)

context = "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n###Instruction\n"
prompt = "Read this carefully, and reflect on your answer before you give it: David has three sisters. How many brothers does each sister have?\n"
output = "### Response\n"
formatted_prompt = context + prompt + output

print(model(formatted_prompt))
vmajor commented 1 year ago

Never mind, this does not appear to be a ctransformers issue; it is just that my MPT model reacts to these settings differently from llama derivatives. It requires larger jumps, and seems to be most sensitive to top_p values.
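A quick way to see how sensitive a model is to a single parameter is to sweep it while holding the seed fixed. A rough sketch, reusing model and formatted_prompt from the snippet above and assuming the per-call keywords are accepted:

# Sweep top_p with a fixed seed so that differences in output
# come from the sampling parameter rather than randomness.
for top_p in (0.1, 0.5, 0.9):
    print(f"--- top_p={top_p} ---")
    print(model(formatted_prompt, top_p=top_p, seed=123, max_new_tokens=100))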