viniciusarruda / llama-cpp-chat-completion-wrapper

Wrapper around llama-cpp-python for chat completion with LLaMA v2 models.

Cutoff after 135 tokens #3

Closed: Yud07 closed this issue 1 year ago

Yud07 commented 1 year ago

Does anyone know why generation keeps stopping after 135 tokens? I've played with the parameters but haven't had any luck getting longer replies.

OUTPUT:

(llamacpppy) C:\Users\mlgmyudy\convo_chain>python gpt-4-api69.py
LlamaCppChatCompletion.init()
Llama2ChatCompletionWrapper.init()
llama.cpp: loading model from ../llama.cpp/models/ggml-llama-2-13b-chat-q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
llama_model_load_internal: mem required = 8860.00 MB (+ 1608.00 MB per state)
llama_new_context_with_model: kv self size = 400.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
new_session()
Chat history file not found. Starting new chat.

speak
Speaking mode ON
What can you tell me about downtown Santa Cruz, CA?
1 messages in context for 22 tokens.

_call() from LlamaCppChatCompletion call from Llama2ChatCompletionWrapper class messages within llama2CCW: [{'role': 'system', 'content': "Assistant is Llama 2, the world's most advanced open source Large Language Model and was created by Meta researchers. Assistant always answers extremely concisely."}, {'role': 'user', 'content': 'What can you tell me about downtown Santa Cruz, CA?'}] tokens_in_reply: 135 Downtown Santa Cruz, CA is a vibrant and eclectic area known for its unique blend of art, culture, and natural beauty. Here are some key points:

  1. Beaches: Downtown Santa Cruz is located just a few blocks from the beach, offering easy access to sun, sand, and surf.
  2. Boardwalk: The iconic Santa Cruz Beach Boardwalk is a must-visit attraction, featuring classic amusement park rides, games, and delicious seafood.
  3. Shopping: Downtown Santa Cruz offers a variety of.

    Please continue 3 messages in context for 167 tokens.

_call() from LlamaCppChatCompletion call from Llama2ChatCompletionWrapper class messages within llama2CCW: [{'role': 'system', 'content': "Assistant is Llama 2, the world's most advanced open source Large Language Model and was created by Meta researchers. Assistant always answers extremely concisely."}, {'role': 'user', 'content': 'What can you tell me about downtown Santa Cruz, CA?'}, {'role': 'assistant', 'content': 'Downtown Santa Cruz, CA is a vibrant and eclectic area known for its unique blend of art, culture, and natural beauty. Here are some key points:\n\n1. Beaches: Downtown Santa Cruz is located just a few blocks from the beach, offering easy access to sun, sand, and surf.\n2. Boardwalk: The iconic Santa Cruz Beach Boardwalk is a must-visit attraction, featuring classic amusement park rides, games, and delicious seafood.\n3. Shopping: Downtown Santa Cruz offers a variety of'}, {'role': 'user', 'content': 'Please continue'}] Llama.generate: prefix-match hit tokens_in_reply: 135 Sure! Here are some more points about downtown Santa Cruz:

  4. Local boutiques: The area is home to many unique local boutiques, offering everything from clothing and accessories to gifts and souvenirs.
  5. Art scene: Downtown Santa Cruz has a thriving arts scene, with numerous galleries and studios showcasing the work of local artists.
  6. Cultural events: Throughout the year, downtown Santa Cruz hosts a variety of cultural events, including festivals, concerts, and street fairs.
  7. Dining: The area offers.

    Explain how nuclear fission works 5 messages in context for 317 tokens.

_call() from LlamaCppChatCompletion call from Llama2ChatCompletionWrapper class messages within llama2CCW: [{'role': 'system', 'content': "Assistant is Llama 2, the world's most advanced open source Large Language Model and was created by Meta researchers. Assistant always answers extremely concisely."}, {'role': 'user', 'content': 'What can you tell me about downtown Santa Cruz, CA?'}, {'role': 'assistant', 'content': 'Downtown Santa Cruz, CA is a vibrant and eclectic area known for its unique blend of art, culture, and natural beauty. Here are some key points:\n\n1. Beaches: Downtown Santa Cruz is located just a few blocks from the beach, offering easy access to sun, sand, and surf.\n2. Boardwalk: The iconic Santa Cruz Beach Boardwalk is a must-visit attraction, featuring classic amusement park rides, games, and delicious seafood.\n3. Shopping: Downtown Santa Cruz offers a variety of'}, {'role': 'user', 'content': 'Please continue'}, {'role': 'assistant', 'content': 'Sure! Here are some more points about downtown Santa Cruz:\n4. Local boutiques: The area is home to many unique local boutiques, offering everything from clothing and accessories to gifts and souvenirs.\n5. Art scene: Downtown Santa Cruz has a thriving arts scene, with numerous galleries and studios showcasing the work of local artists.\n6. Cultural events: Throughout the year, downtown Santa Cruz hosts a variety of cultural events, including festivals, concerts, and street fairs.\n7. Dining: The area offers'}, {'role': 'user', 'content': 'Explain how nuclear fission works'}] Llama.generate: prefix-match hit tokens_in_reply: 135 Nuclear fission is a process in which an atomic nucleus splits into two or more smaller nuclei, releasing a large amount of energy in the process. This process is the opposite of nuclear fusion, where two or more nuclei combine to form a single, heavier nucleus. The most commonly used fissile materials are uranium-235 (U-235) and plutonium-239 (Pu-239). 
These isotopes can undergo a chain reaction, where the neutrons released by one fission event cause subsequent f.

Please continue 7 messages in context for 462 tokens.

_call() from LlamaCppChatCompletion call from Llama2ChatCompletionWrapper class messages within llama2CCW: [{'role': 'system', 'content': "Assistant is Llama 2, the world's most advanced open source Large Language Model and was created by Meta researchers. Assistant always answers extremely concisely."}, {'role': 'user', 'content': 'What can you tell me about downtown Santa Cruz, CA?'}, {'role': 'assistant', 'content': 'Downtown Santa Cruz, CA is a vibrant and eclectic area known for its unique blend of art, culture, and natural beauty. Here are some key points:\n\n1. Beaches: Downtown Santa Cruz is located just a few blocks from the beach, offering easy access to sun, sand, and surf.\n2. Boardwalk: The iconic Santa Cruz Beach Boardwalk is a must-visit attraction, featuring classic amusement park rides, games, and delicious seafood.\n3. Shopping: Downtown Santa Cruz offers a variety of'}, {'role': 'user', 'content': 'Please continue'}, {'role': 'assistant', 'content': 'Sure! Here are some more points about downtown Santa Cruz:\n4. Local boutiques: The area is home to many unique local boutiques, offering everything from clothing and accessories to gifts and souvenirs.\n5. Art scene: Downtown Santa Cruz has a thriving arts scene, with numerous galleries and studios showcasing the work of local artists.\n6. Cultural events: Throughout the year, downtown Santa Cruz hosts a variety of cultural events, including festivals, concerts, and street fairs.\n7. Dining: The area offers'}, {'role': 'user', 'content': 'Explain how nuclear fission works'}, {'role': 'assistant', 'content': 'Nuclear fission is a process in which an atomic nucleus splits into two or more smaller nuclei, releasing a large amount of energy in the process. This process is the opposite of nuclear fusion, where two or more nuclei combine to form a single, heavier nucleus.\nThe most commonly used fissile materials are uranium-235 (U-235) and plutonium-239 (Pu-239). 
These isotopes can undergo a chain reaction, where the neutrons released by one fission event cause subsequent f'}, {'role': 'user', 'content': 'Please continue'}] Llama.generate: prefix-match hit tokens_in_reply: 25 Sure! Here's the rest of the explanation: In a nuclear fission.

How did Oppenheimer contribute to the science of nuclear physics specifically? What did he do personally re: science. 9 messages in context for 518 tokens.

_call() from LlamaCppChatCompletion call from Llama2ChatCompletionWrapper class messages within llama2CCW: [{'role': 'system', 'content': "Assistant is Llama 2, the world's most advanced open source Large Language Model and was created by Meta researchers. Assistant always answers extremely concisely."}, {'role': 'user', 'content': 'What can you tell me about downtown Santa Cruz, CA?'}, {'role': 'assistant', 'content': 'Downtown Santa Cruz, CA is a vibrant and eclectic area known for its unique blend of art, culture, and natural beauty. Here are some key points:\n\n1. Beaches: Downtown Santa Cruz is located just a few blocks from the beach, offering easy access to sun, sand, and surf.\n2. Boardwalk: The iconic Santa Cruz Beach Boardwalk is a must-visit attraction, featuring classic amusement park rides, games, and delicious seafood.\n3. Shopping: Downtown Santa Cruz offers a variety of'}, {'role': 'user', 'content': 'Please continue'}, {'role': 'assistant', 'content': 'Sure! Here are some more points about downtown Santa Cruz:\n4. Local boutiques: The area is home to many unique local boutiques, offering everything from clothing and accessories to gifts and souvenirs.\n5. Art scene: Downtown Santa Cruz has a thriving arts scene, with numerous galleries and studios showcasing the work of local artists.\n6. Cultural events: Throughout the year, downtown Santa Cruz hosts a variety of cultural events, including festivals, concerts, and street fairs.\n7. Dining: The area offers'}, {'role': 'user', 'content': 'Explain how nuclear fission works'}, {'role': 'assistant', 'content': 'Nuclear fission is a process in which an atomic nucleus splits into two or more smaller nuclei, releasing a large amount of energy in the process. This process is the opposite of nuclear fusion, where two or more nuclei combine to form a single, heavier nucleus.\nThe most commonly used fissile materials are uranium-235 (U-235) and plutonium-239 (Pu-239). 
These isotopes can undergo a chain reaction, where the neutrons released by one fission event cause subsequent f'}, {'role': 'user', 'content': 'Please continue'}, {'role': 'assistant', 'content': "Sure! Here's the rest of the explanation:\nIn a nuclear fission"}, {'role': 'user', 'content': 'How did Oppenheimer contribute to the science of nuclear physics specifically? What did he do personally re: science.'}]
Llama.generate: prefix-match hit
Traceback (most recent call last):
  File "C:\Users\mlgmyudy\convo_chain\gpt-4-api69.py", line 741, in <module>
    run_terminal_interface("", "llama", chat_history_file)
  File "C:\Users\mlgmyudy\convo_chain\gpt-4-api69.py", line 642, in run_terminal_interface
    response_text = chat._call(context[-1].content)
  File "C:\Users\mlgmyudy\convo_chain\gpt-4-api69.py", line 434, in _call
    return self.chat_wrapper(message, params=params, **kwargs)
  File "C:\Users\mlgmyudy\llama-cpp-chat-completion-wrapper\llama_cpp_chat_completion_wrapper.py", line 121, in __call__
    for i, token in enumerate(completion):
  File "C:\Users\mlgmyudy\Anaconda3\envs\llamacpppy\lib\site-packages\llama_cpp\llama.py", line 713, in generate
    self.eval(tokens)
  File "C:\Users\mlgmyudy\Anaconda3\envs\llamacpppy\lib\site-packages\llama_cpp\llama.py", line 463, in eval
    self.input_ids[self.n_tokens : self.n_tokens + n_tokens] = batch
ValueError: could not broadcast input array from shape (33,) into shape (0,)

(llamacpppy) C:\Users\mlgmyudy\convo_chain>

Yud07 commented 1 year ago

Llama.generate() doesn't take max_tokens as a parameter...

  File "C:\Users\mlgmyudy\llama-cpp-chat-completion-wrapper\llama_cpp_chat_completion_wrapper.py", line 116, in __call__
    completion = self.llm.generate(messages_tokens, **params)
TypeError: Llama.generate() got an unexpected keyword argument 'max_tokens'
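Since the wrapper iterates over the result of `generate()` token by token, and the TypeError above shows that `generate()` accepts no `max_tokens` argument, any length cap has to be applied by the calling loop. A minimal sketch of that idea follows. It uses a hypothetical stand-in generator, `fake_generate`, instead of a real model, and the helper name `take_tokens` is also made up for illustration; neither is part of llama-cpp-python or of this wrapper.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def fake_generate(start: int = 0) -> Iterator[int]:
    """Hypothetical stand-in for a token generator: yields token ids forever."""
    token = start
    while True:
        yield token
        token += 1


def take_tokens(
    stream: Iterable[int],
    max_tokens: int = -1,
    eos_token: Optional[int] = None,
) -> List[int]:
    """Collect tokens until the cap or the end-of-sequence token is reached.

    A negative max_tokens means "no cap"; in that case the stream must
    eventually yield eos_token (or end by itself), or the loop never stops.
    """
    capped = stream if max_tokens < 0 else islice(stream, max_tokens)
    collected: List[int] = []
    for token in capped:
        if eos_token is not None and token == eos_token:
            break
        collected.append(token)
    return collected


# Cap at 5 tokens, then run uncapped until the (assumed) end token 3.
print(take_tokens(fake_generate(), max_tokens=5))
print(take_tokens(fake_generate(), max_tokens=-1, eos_token=3))
```

With a cap like this in the caller, a default of 128 versus -1 decides whether replies are cut short by the loop itself or only by the end-of-sequence token and the context window.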

viniciusarruda commented 1 year ago

You probably need to increase the n_ctx.
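The load log in the first post reports n_ctx = 512, and the last prompt before the crash was counted at 518 tokens, so the prompt alone no longer fit in the context window. A rough sketch of that arithmetic is below. The helper name `remaining_tokens` is hypothetical, and the prompt sizes are the counts printed by the poster's script, which may differ slightly from what the model actually receives once the system prompt and chat formatting are added.

```python
def remaining_tokens(n_ctx: int, prompt_tokens: int) -> int:
    """Tokens left for the reply once the prompt occupies the context window."""
    return max(n_ctx - prompt_tokens, 0)


# Prompt sizes reported in the log, against the n_ctx = 512 shown at load time.
for prompt_tokens in (22, 167, 317, 462, 518):
    print(prompt_tokens, remaining_tokens(512, prompt_tokens))
```

By this estimate the 462-token prompt leaves at most 50 tokens, which is consistent with that reply stopping after only 25 tokens, and the 518-token prompt leaves none. With n_ctx=4096 the same 518-token prompt would leave 3578 tokens.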

Yud07 commented 1 year ago

Yeah

In llama_cpp_chat_completion_wrapper.py:

class Llama2ChatCompletionWrapper:
    def __init__(self, model_path: str, callback: Callable[[Message], None] = None, tokenizer_encoder: Callable = None) -> None:
        print("Llama2ChatCompletionWrapper.init()")
        # Pass n_ctx explicitly; the load log above showed n_ctx = 512 without it.
        self.llm = Llama(model_path=model_path, n_ctx=4096)
    ...
    ...
    def __call__(
        self,
        message: str,
        post_process: Callable[[str], str] | None = None,
        max_tokens: int = -1,
        params: dict = {}
        #max_tokens: int = 128
    ) -> str:

viniciusarruda commented 1 year ago

Did that solve it?