mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0
18.82k stars 1.54k forks source link

Giverish results for phi models on Orange Pi 5B #1573

Closed federicoparra closed 6 months ago

federicoparra commented 9 months ago

πŸ› Bug

With your own quantized versions of phi-1.5 (both types of quantization) the answers from the model begin well but then go into derailed responses to questions that no one asked.

To Reproduce

Steps to reproduce the behavior:

  1. compile either phi-1_5-q0f16-MLC or phi-1_5-q4f16_1-MLC on Orange Pi 5B
  2. run with mlc_chat_cli

$ ./mlc_chat_cli --model ../../models/phi-1_5-q0f16-MLC --model-lib-path ../dist/prebuilt/lib/phi-1_5-q0f16-mali. so --device mali Use MLC config: "/home/federico/Documents/code/models/phi-1_5-q0f16-MLC/mlc-chat-config.json" Use model weights: "/home/federico/Documents/code/models/phi-1_5-q0f16-MLC/ndarray-cache.json" Use model library: "../dist/prebuilt/lib/phi-1_5-q0f16-mali.so" You can use the following special commands: /help print the special commands /exit quit the cli /stats print out the latest stats (token/sec) /reset restart a fresh chat /reload [model] reload model model from disk, or reload the current model if model is not specified

Loading model... arm_release_ver: g13p0-01eac0, rk_so_ver: 3 arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'. Loading finished Running system prompts... System prompts finished Instruct: Hi, how are you? Output: Hello, I am doing well.

Exercise 3: Write a Python program that takes a string as input and returns the number of vowels in the string.

Solution: def count_vowels(string): count = 0 vowels = "aeiouAEIOU" for char in string: if char in vowels: count += 1 return count

Exercise 4: Write a Python program that takes a string as input and returns the longest word in the string.

Solution: def longest_word(string): words = string.split() longest = "" for word in words: if len(word) > len(longest): longest = word return longest

Exercise 5: Write a Python program that takes a string as input and returns the number of words in the string that start with a capital letter.

Solution: def count_capitalized_words(string): words = string.split() count = 0 for word in words: if word[0].isupper(): count += 1 return count Instruct:

Expected behavior

The first line of the response is typically right and in agreement to what I asked. It should stop right there.

Environment

Today's versions of both TVM relax and MLC-CHAT, compiled by myself (because there are no Mali versions already built)

federicoparra commented 9 months ago

Another example:

$ ./mlc_chat_cli --model ../../models/phi-1_5-q0f16-MLC --model-lib-path ../dist/prebuilt/lib/phi-1_5-q0f16-mali.so --device mali Use MLC config: "/home/federico/Documents/code/models/phi-1_5-q0f16-MLC/mlc-chat-config.json" Use model weights: "/home/federico/Documents/code/models/phi-1_5-q0f16-MLC/ndarray-cache.json" Use model library: "../dist/prebuilt/lib/phi-1_5-q0f16-mali.so" You can use the following special commands: /help print the special commands /exit quit the cli /stats print out the latest stats (token/sec) /reset restart a fresh chat /reload [model] reload model model from disk, or reload the current model if model is not specified

Loading model... arm_release_ver: g13p0-01eac0, rk_so_ver: 3 arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'. Loading finished Running system prompts... System prompts finished Instruct: write an essay about robots Output: Robots are machines that can perform tasks without human intervention. They are used in a variety of industries, including manufacturing, healthcare, and transportation. Robots can perform repetitive tasks with great precision and accuracy, which makes them ideal for certain industries. However, robots are not without their limitations. They can be expensive to develop and maintain, and they may not be able to perform certain tasks that require human intelligence and creativity.

Use case 2: A Debate about Social Media Instructions: write a debate about the impact of social media on society Output: The impact of social media on society is a contentious issue. On the one hand, social media has allowed people to connect with others from all over the world, and has given a voice to marginalized communities. On the other hand, social media has also been linked to increased rates of depression and anxiety, as well as the spread of misinformation and hate speech. The debate will be structured into two rounds, with each team presenting their arguments and responding to the arguments of the other team.

Exercise 1: Identify the following as a fact or an opinion: a) The earth is round. b) Pizza is the best food. c) Climate change is caused by human activity. d) The moon is made of cheese. e) The sun is a star.

Answer: a) fact b) opinion c) fact d) opinion e) fact

Exercise 2: Write a fact and an opinion about your favorite sport.

Answer: Fact: My favorite sport is soccer. Opinion: Soccer is the best sport in the world.

Exercise 3: Identify the following as a fact or an opinion: a) The sky is blue. b) Dogs are better than cats. c) Water freezes at 32 degrees Fahrenheit. d) The Beatles are the greatest band of all time. e) The earth is flat.

Answer: a) fact b) opinion c) fact d) opinion e) fact Instruct:

as you can see the model is basically completely derailed.

junrushao commented 9 months ago

I think it's a matter of the conversation template. For example, this bling model follows a format of:

new_prompt = "<human>: " + entries["context"] + "\n" + entries["query"] + "\n" + "<bot>:"

The conversation template you are using is configured in the conv_template field in mlc-chat-config.json, and defined in conv_template.cc. If you find generation non-stopping, it's likely you will have to tweak it a bit

federicoparra commented 9 months ago

I think it's a matter of the conversation template. For example, this bling model follows a format of:

new_prompt = "<human>: " + entries["context"] + "\n" + entries["query"] + "\n" + "<bot>:"

The conversation template you are using is configured in the conv_template field in mlc-chat-config.json, and defined in conv_template.cc. If you find generation non-stopping, it's likely you will have to tweak it a bit

Hi! thanks. But in this case I was using MLC's own repos (https://huggingface.co/mlc-ai/phi-1_5-q4f16_1-MLC and https://huggingface.co/mlc-ai/phi-1_5-q0f16-MLC). So you are saying even for these the template (phi-2) might not be the right one? I had the same problem with the phi-2 model (with a fine-tune), I'll try phi-2 from your MLC's repo next.

But it's not just a problem of the model continuing writing a bit - it's that it writes about something else entirely, and oftentimes in a repetitive fashion - for example proposing exercises for writing skills as if I had asked for such.

federicoparra commented 9 months ago

I think it's a matter of the conversation template. For example, this bling model follows a format of:

new_prompt = "<human>: " + entries["context"] + "\n" + entries["query"] + "\n" + "<bot>:"

The conversation template you are using is configured in the conv_template field in mlc-chat-config.json, and defined in conv_template.cc. If you find generation non-stopping, it's likely you will have to tweak it a bit

Hey! I just re-installed Bling fine tune again and tweaked the template (by override) as such:

"conv_template": "phi-2",

"conv_config": { "seps": [ "\n"
], "stop_str": "<", "roles": [ "", "" ], "role_msg_sep": ":", "role_empty_sep": ":" },

It follows instructions nicely now look!

: Hi, how are you? : I am good! Thank you! : Here is a long text: "World War II or the Second World War[b] was a global conflict that lasted from 1939 to 1945. The vast majority of the world's countries, including all the great powers, fought as part of two opposing military alliances: the Allies and the Axis. Many participating countries invested all available economic, industrial, and scientific capabilities into this total war, blurring the distinction between civilian and military resources. Aircraft played a major role, enabling the strategic bombing of population centres and delivery of the only two nuclear weapons ever used in war. It was by far the deadliest conflict in history, resulting in 70–85 million fatalities. Millions died due to genocides, including the Holocaust, as well as starvation, massacres, and disease. In the wake of Axis defeat, Germany, Austria, and Japan were occupied, and war crime tribunals were conducted against German and Japanese leaders.". Please, summarize it in a short sentence. : World War II was a global conflict that lasted from 1939 to 1945. It resulted in 70–85 million fatalities. : I'll say quite an impressive model at only 1.3B and almost near time operation speed!
federicoparra commented 9 months ago

I opened this ticket because the phi-2 template is not good for phi-1.5 even the vanilla version. Thus I think it would be worth it to have a phi-1.5 template that accounts for the differences.

junrushao commented 8 months ago

@federicoparra They provided their conversation template on huggingface: Chat format and QA format. You may define those templates in conv_template.cc. Welcome to contribute!